The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
(cherry picked from commit 548b383310)
Consider the following sequence in a shader:
b = p_extract a
c = v_mad_u32_u16 b, X, 0
The optimizer applies extract, resulting in:
c = v_mad_u32_u16 a, X, 0 (correct)
Then it mistakenly turns that into:
c = v_mul_u32_u24 a, X, 0 (incorrect)
In this case, the p_extract is applied to v_mad_u32_u16 by
apply_extract. After this, we can no longer be sure that
the operands are still 16 or 24-bit, so we have to remove
this flag.
No Fossil DB changes.
Fixes: 54292e99c7
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
(cherry picked from commit 76b9dd6266)
Observed in a shader from Resident Evil Village.
This also helps prevent emitting invalid IR.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12599>
(cherry picked from commit cfb0d931f2)
Fixes assert when ReadPixels() called to read from FBO to
GL_PIXEL_PACK_BUFFER, on mip-level > 0, since num_layers
wasn't properly calculated with mip-level.
v2: patched 'iris_create_sampler_view' function instead of
'resolve_sampler_views'. Just like it was suggested in this
function's comment.
The logic of fix is similar to one in 'update_image_surface' function
of i965 driver, which is introduced in commit
f9fd0cf479.
With a slight change: setting array_len=1, like it was done in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5808 ,
since minifying depth fails KHR-GLES2.texture_3d.filtering tests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4145
Fixes: 3c979b0e ('iris: add some draw resolve hooks')
Signed-off-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9253>
(cherry picked from commit b945262773)
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.
This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.
v2: Move prev block ending check earlier in the function (Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
(cherry picked from commit a13e79843e)
We were treating each field as if it took up a single slot. However
that's not the case. And with strict matching (GLSL 4.20+ / ES 3.1+) we
would end up not matching identical interfaces.
Fixes: c4545676d7 ("glsl/linker: fix location aliasing checks for interface variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12479>
(cherry picked from commit 18962b94d3)
Stencil on Gfx7 has a vertical alignment element of 8, but the largest
its surface state can express is 4. Apply the Gfx6 solution of changing
the alignment in blorp_surf_retile_w_to_y.
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12132>
(cherry picked from commit c7bcbc950c)
The hardware can handle much larger textures than we allowed. The game
"Cathedral" requires larger textures for some bizarre reason. Raise the
limit so the game runs, syncing MAX_MIP_LEVELS, the comments, and the CAPs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #5203
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12312>
(cherry picked from commit d051b06a48)
Don't attempt to transform uniform boolean instructions when
their operands are unsuitable. This can happen eg. due to other
optimizations that combine SALU instructions which clear out
the uniform instruction labels.
Cc: mesa-stable
Fixes: 8a32f57fff
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>
(cherry picked from commit abcc83e713)
this was apparently always broken, but in a very, very subtle way
where the hash table would compare the current pipeline state against
itself instead of using the cache entry's state
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12535>
(cherry picked from commit 560dc4f790)
This ensures each context can have a separate batch writing a resource
and we don't race trying to flush each other's batches. Unfortunately
the extra hash table operations regress draw-overhead numbers by about
8% but I'd rather eat the overhead and have an obviously correct
implementation than leave known buggy code in tree.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12528>
(cherry picked from commit 8503cab2e0)
Conflicts:
src/gallium/drivers/panfrost/pan_job.c
This expresses the semantic of the flush only applying to batches within
the context, not globally, in line with OpenGL's multithreading rules.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12528>
(cherry picked from commit e98aa55413)
Conflicts:
src/gallium/drivers/panfrost/pan_job.c
This is on the context, so no concurrency issues. This will allow us to
efficiently iterate active batches.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12528>
(cherry picked from commit 79dd1a4e63)
This can be tracked efficiently with atomics, and reduces the places we
use the rsrc->track.users bitmap which has concurrency issues.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12528>
(cherry picked from commit b8da5b1b7f)
This will help us reduce shared state and simplify multithreading, at
the expense of additional CPU overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12528>
(cherry picked from commit 2f63ccd080)
We already took the lock, we just unlocked too early. Since the label is
reset in the BO cache, this is racy. Minimal impact in practice but is
still /wrong/ and caught by helgrind.
Fixes: 3fa1f93dac ("panfrost: Label all BOs in userspace")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12525>
(cherry picked from commit bd15e5e6af)
ralloc is not thread safe, so we cannot use a pipe_screen as a ralloc
context unless we lock the screen. The allocation patterns for resources
are trivial, so just use malloc/calloc/free directly instead of ralloc.
This fixes a segfault in:
dEQP-EGL.functional.sharing.gles2.multithread.random.images.copytexsubimage2d.1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12525>
(cherry picked from commit e6924be737)
Without a lock, two threads may bind the same shader CSO simultaneously,
allocate the same variant simultaneously, and then race each other in
the compiler. This manifests in various ways, most commonly failing the
assertion that UBO pushing has only run once. The simple_mtx_t solution
is used in Iris. Fixes the crash in:
dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12525>
(cherry picked from commit 40edc87956)
The current code overwrote the original view which meant if we
had to reemit a surface the second emit would be wrong.
This fixes cubemaps on gm45 and maybe some issues with 3D textures
elsewhere.
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12514>
(cherry picked from commit 63138c42c5)
Conflicts:
src/gallium/drivers/crocus/crocus_state.c
If a user attempts to run Panfrost on an unsupported GPU (e.g. Mali
T604), Panfrost will refuse to load and will destroy the screen
immediately, allowing for a graceful fallback to a software rasterizer.
However, the screen destroy code calls a screen_destroy function in the
GenXML vtbl -- and this function is still NULL when the allowlist is
checked. This manifests as crashes on unsuported GPUs.
Issue tracked down with Icecream95's mad Ghidra skills.
Closes: #5269
Fixes: 88dc4db6be ("panfrost: Init/destroy blitter from per-gen file")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12512>
(cherry picked from commit 2d31d469f7)
Fixes an invalid read caught by valgrind when there is a hole in the
valid render target mask:
==6749== Conditional jump or move depends on uninitialised value(s)
==6749== at 0x5E88EC0: panfrost_prepare_fs_state (pan_cmdstream.c:417)
==6749== by 0x5E88EC0: panfrost_emit_frag_shader (pan_cmdstream.c:501)
==6749== by 0x5E88EC0: panfrost_emit_frag_shader_meta (pan_cmdstream.c:573)
==6749== by 0x5E88EC0: panfrost_update_state_fs (pan_cmdstream.c:2593)
==6749== by 0x5E8B0BF: panfrost_direct_draw (pan_cmdstream.c:2839)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Fixes: a124c47b9f ("panfrost: Fix NULL derefs in pan_cmdstream.c")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11383>
(cherry picked from commit 3113dbd837)
Otherwise we end up accessing overwritten registers. Fixes
dEQP-GLES31.functional.draw_buffers_indexed.overwrite_common.common_enable_buffer_enable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11383>
(cherry picked from commit f45ceb8182)
This seems to have fixed a few additional tests on 21.2:
- dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.const_expression_fragment
- dEQP-GLES31.functional.draw_buffers_indexed.overwrite_indexed.common_separate_blend_func_buffer_separate_blend_func
- dEQP-GLES31.functional.draw_buffers_indexed.overwrite_common.common_blend_func_buffer_separate_blend_func
These are the same! Either you're blendable and can use f32/f16
conversion, or you're raw and you can only get raw. It's that simple!
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11383>
(cherry picked from commit 2cf581b195)
So the hash function doesn't end up hashing uninitialized values.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reported-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: bbff09b952 ("panfrost: Move the blend shader cache at the device level")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11383>
(cherry picked from commit 6b7b8eb046)
Fixes framebuffer_fetch and blend_equation_advanced dEQP tests on v6.
v2: Use clause dependencies rather than comparing the message type
v3: Shift the BIFROST_SLOT_* constants before using them as a mask
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12375>
(cherry picked from commit 295807e666)
Apparently, CLPER_V7 is missing from Mali G31, but CLPER_V6 works. Fixes
INSTR_INVALID_ENC faults and failures in
dEQP-GLES3.functional.shaders.derivate.* on Dvalin.
Technically not an errata but an implementation difference. I suspect
Mali G51 will need this as well, should we ever allowlist it.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12478>
(cherry picked from commit 61c8e39649)
Conflicts:
src/panfrost/bifrost/bi_quirks.h
Use the explicit sample mode and set the sample ID in the pixel indices
structure to the current sample ID. This fixes tilebuffer loads in blend
shaders on multisampled framebuffers.
Make sure the new routine is broken out to a helper for use with ST_TILE
in the next commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12478>
(cherry picked from commit 16394dc71a)
This breaks screen-space derivatives in a shader that uses multiple
render targets, if the derivative calculation is scheduled after a BLEND
instruction calling into a blend shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12478>
(cherry picked from commit 710498e424)
Although it is passing all of dEQP-GLES31, it is failing a few
KHR-GLES31.* tests. It also has performance issues at the moment. Invert
the existing noindirect debug flag to become a indirect debug flag. Set
this flag for dEQP-GLES31 CI on G52, to make sure the code doesn't bit
rot on the hope someone will pick this up later on.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12478>
(cherry picked from commit a7f7d74137)
Conflicts:
src/gallium/drivers/panfrost/pan_screen.c
src/panfrost/lib/pan_util.h
In b9c095cc2c ("panfrost: Rewrite the clear colour packing code"),
packing of clear colours was corrected to use the tilebuffer's
fractional bits, fixing dithering of the clear colour with formats like
RGB565. Unfortunately, that commit did so unconditionally. If the
framebuffer is dithered, but dithering is disabled at the time of
the clear, we would incorrectly dither the clear.
This is a regression, as the old (broken) code passed the relevant CTS
test. What's the catch? Depending on dither state, there are two
formulas to pack tilebuffer colours. We need to handle both. Fixes
KHR-GLES31.core.draw_buffers_indexed.color_masks.
Fixes: b9c095cc2c ("panfrost: Rewrite the clear colour packing code")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12460>
(cherry picked from commit 22538b89b3)
Conflicts:
src/panfrost/lib/tests/test-clear.c
src/panfrost/vulkan/panvk_cmd_buffer.c
We have no reason to report a subpixel precision of 4 for lines; in fact
LLVMpipe uses 8 subpixel bits for lines, similar to other primitives.
But let's use the pipe-cap for this instead of hard-coding it.
Fixes: 9fbf6b2abf ("lavapipe: implement VK_EXT_line_rasterization")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12277>
(cherry picked from commit a16f3963d3)
Commit 3a772be026 ("freedreno: Add perfetto renderpass support")
uses C++17 init-statement feature.
GCC
../src/gallium/drivers/freedreno/freedreno_perfetto.cc: In lambda function:
../src/gallium/drivers/freedreno/freedreno_perfetto.cc:148:11: warning: init-statement in selection statements only available with ‘-std=c++17’ or ‘-std=gnu++17’
148 | if (auto state = tctx.GetIncrementalState(); state->was_cleared) {
| ^~~~
Clang
../src/gallium/drivers/freedreno/freedreno_perfetto.cc:148:11: warning: 'if' initialization statements are a C++17 extension [-Wc++17-extensions]
if (auto state = tctx.GetIncrementalState(); state->was_cleared) {
^
Intel C++ Compiler
../src/gallium/drivers/freedreno/freedreno_perfetto.cc(148): error: expected a ")"
if (auto state = tctx.GetIncrementalState(); state->was_cleared) {
^
Fixes: 3a772be026 ("freedreno: Add perfetto renderpass support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5193
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12293>
(cherry picked from commit 4fc2a6cbdb)
With hw devices, when you submit a present, implicit sync will
make sure the work submitted to the gpu on the client will end
up happening before the present work submitted on the server.
However with sw paths there is no real GPU, the lavapipe fake
GPU thread is client side only and presenting is done directly
from the pixmap (or later shared pixmap). In order for this to
make sense the wsi common code should wait for the fence on the
image before queueing the submit to the server so that all
client works has been flushed to the pixmap before the copy or
present operation is submitted.
Fixes: 8004fa9c95 ("vulkan/wsi: add sw support. (v2)")
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12502>
(cherry picked from commit 0cddfba328)
Using separate aspects is required.
Fixes few CTS failures (dEQP-VK.api.copy_and_blit.*) when the compute
path is forced in the driver. Note that CTS coverage of compute queue
is rather limited.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12287>
(cherry picked from commit be6bdb0918)
It can happen that we create an enormous merge set, even larger than the
entire register file, in which case find_best_gap() would loop
infinitely. This seems to be triggered more often with
IR3_SHADER_DEBUG=spillall, since it actually happened with a CTS test.
Just bail out in that case.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
(cherry picked from commit efb34d6ee6)