Bifrost adds a value for the C factor equaling 2*src. This does not
correspond directly to API blend modes so it is not too useful in
general. However, it's required for src*dest + dest*src blending to be
done in hardware instead of a blend shader. GFXbench uses that blend
mode, so it must be important ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12152>
Add unit tests for the fixed-function blending helpers in pan_blend.c.
Each test consists of a Porter-Duff blend mode and the associated
hardware state. In this commit, we add tests for the most common modes.
For motivation, this code has NOT been properly tested in CI. True,
functional correctness of the blend module as a whole is tested by
dEQP-GLES3.functional.fragment_ops.blend.* among other integration
tests. However, this testing is insufficient to check for regressions.
Crucially, the following broken patch would clear CI:
bool pan_can_fixed_function(...) {
return false;
}
In that case, blend shaders are used 100% of the time, which will
regress performance horribly but still pass dEQP. The only clue
something went wrong would be some traces changing checksum due to the
fixed-function blender producing slightly different output than
equivalent blend shaders. By unit testing the fixed blend path, we
ensure we always use the fixed-function path when we expect it to.
Similarly, using incorrect values for the blend metadata may not affect
functional correctness but will increase power consumption. Let's check
all the data we export to drivers.
Note: due to additive commutativity, there are many pairs of equivalent
Mali blend modes. Unfortunately, the vendor is... inconsistent about how
to resolve ambiguous modes. Our algorithm for computing modes is
correct; the "preferred" values are left in comments since otherwise our
tests fail despite correct code. I want to blame Bifrost for this, but
Midgard was patient zero.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12152>
This is required to disable dithering on a per-draw basis when OPAQUE
output is used (bypassing the blender which normally uses the
round_to_framebuffer_precision flag to do the same).
This functionally reverts:
ebc07f4b2f ("panfrost: Remove padded unorm blendable formats")
fae90a7940 ("panfrost: Always pick dithered tb formats")
while adding the functionality to make them useful.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12152>
It's an optimization pass -- omitting it should not cause MMU faults
(!). Make sure the UBO push mask is set regardless of whether the pass
is called, and just call the pass when required.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12328>
Changes:
- nir_metadata_preserve(..., nir_metadata_dominance)
is called only when pass makes progress
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>
Changes:
- nir_metadata_preserve(..., nir_metadata_dominance)
is called only when pass makes progress
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>
Changes:
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
make progress
v2: fix build
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>
Changes:
- removal of attr_wa_state (it's passed directly)
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>
gl_FragDepth default value is gl_FragCoord.z so if a shader does:
gl_FragDepth = gl_FragCoord.z
we can drop this assignment.
v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
- handle var = NULL (Rhys)
v4: refactoring (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>
This is the virtual memory address of the buffer object. Calling it the
BO's address is a lot more obvious than calling it an offset in one of
the now many graphics translation tables.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12206>
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false. Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.
All other hardware mentioned in that thread seems to behave the same
way.
The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.
This reverts commit ad05920258.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
clang in llvm 12 no longer accepts "-cl-denorms-are-zero" as a cc1
options which is how this code uses it.
For now just pick the correct cc1 equivalent.
This fixes a crash with llvm master and CL conversions tests
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12286>
We updated the key correctly for whether we wanted to use a
safe_constlen binning pass variant, but then passed the wrong
key to ir3_shader_variant().
Fixes: 1dd24bf27b ("freedreno: Share constlen between different stages properly")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12314>
Otherwise drivers that don't use 16-bit slots for varyings will get
confused and have their driver_locations scribbled over. This has caused
multiple problems for both Panfrost and Asahi this week. Given the only
other user of the pass for varyings is radeonsi, which needs both
together, I think this is the least controversial fix.
Fixes: fb29cef8dd ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>
A non-static inline function body is only actually emitted by GCC during optimization passes,
so running -O0 ends up never emitting the body, producing linker errors.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12158>
It can happen that the user reads an input attachment as the first use
of that attachment. In that case there are no subpass dependencies
required at all, because there could be a pipeline barrier before the
renderpass instead, and in any case we assume that dependencies with the
first subpass as a destination can be executed only once outside the
renderpass. The result is that we only do a CACHE_INVALIDATE once
before the entire renderpass, but it's actually required after each GMEM
load, because input attachments read GMEM through UCHE and those writes
to GMEM invalidate UCHE.
While we could add the missing CACHE_INVALIDATE "by hand" somehow, it
turns out it's actually just as easy to do an optimization the blob
does, where it simply doesn't patch those input attachments and reads
them directly instead. This means we can skip allocating memory in GMEM
for them entirely in some circumstances.
This fixes e.g.
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit
with TU_DEBUG=forcebin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12213>
Setting it to GPU can cause an L3$ flush in certain cases. That's not
what we want as we really only care about coherency within the GPU.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12291>