Since the variable length dot must stay in its slot configuration
do not try to merge the group with the previous group when an op may be
moved to the t slot, because this may lead to breaking the multi-slot
operation.
Fixes: 357e5fac99
r600/sfn: Use variable length DOT on Evergreen and Cayman
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20451>
In the rare case that after register allocation the highest directly
accessed register index is below the highest value used for an
indirectly accessed array we have to ensure that the shader allocates
enough registers to account for these indices that are not seen by the
assembler.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7966
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20451>
76 out of 86 piglits pass.
Some fail because SSBOs are only supported for FS and CS on r600, but
the piglits try to use SSBOs with VS, and there are piglits that try to
bind SSBO at 8, and only 0-7 is supported as binding point.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20451>
On drivers that do not set PIPE_CAP_SHAREABLE_SHADERS,
st_destroy_program_variants() may reach st_save_zombie_shader()
which accesses st->zombie_shaders.mutex.
Destroying st->zombie_shaders.mutex before destroying program variants
may result in an invalid access in a multiple context scenario for
those drivers.
Move the mutex destroy call to after program variants destroy so that
it doesn't hit a deadlock on context destroy.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20458>
radv_before_taskmesh_draw will no longer call radv_before_draw and
instead implement the necessary functionality on its own.
radv_before_draw will no longer have to emit mesh shader descriptors.
As a result, both functions should have a lower CPU overhead now.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18829>
Any unused split_vector definition can always use the same register
as the operand. This avoids creating unnecessary copies.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 3904 (2.89% of 134906) affected shaders:
CodeSize: 18326692 -> 18271688 (-0.30%)
Instrs: 3386632 -> 3372888 (-0.41%)
Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00%
Copies: 224301 -> 210559 (-6.13%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
Eliminate unnecessary copies when the operand registers of a
p_split_vector instruction are not clobbered between the p_split_vector
and the user of its definitions.
This happens when p_split_vector doesn't kill its operand and its
definitions have a shorter lifespan that the operand. It affects every
NGG culling shader among other things.
This optimization exists because it's too difficult to solve it
in RA, and should be removed after we solved this in RA.
v2 by Daniel Schürmann:
- Rearrange and simplify conditions for the new optimization
- Fix a few bugs
v3 by Daniel Schürmann:
- Check number of encoded ALU operands
Fossil DB stats on Rembrandt (RDNA2):
Totals from 64896 (48.10% of 134906) affected shaders:
CodeSize: 175693348 -> 175434944 (-0.15%)
Instrs: 33333912 -> 33269388 (-0.19%)
Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00%
Copies: 2806550 -> 2742038 (-2.30%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This allows is_overwritten_since to return false when the last
writer instruction of a register can't be tracked but we know it wasn't
written in the current block.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 1163 (0.86% of 134906) affected shaders:
CodeSize: 9815920 -> 9805016 (-0.11%)
Instrs: 1843688 -> 1840962 (-0.15%)
Latency: 19219153 -> 19209171 (-0.05%)
InvThroughput: 3354375 -> 3353852 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This works because of SSA and should be safer than just setting 'not_written_yet'.
No Fossil DB changes on Rembrandt (RDNA2).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
Without this change a fieldname like '{DST::align=12}' was not used for
encoding. Change the regex to include such fieldnames and remove the fieldname
property in a later step.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20462>
This was causing excessive flushes, because requesting an out-fence fd
triggers the drm layer to flush deferred submits instead of continued
merging.
Fixes: 48b5164356 ("freedreno/drm: Return fence from submit flush")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20465>
There is only one special case (a3xx/a4xx queries) were a pipe_resource
is allocated without a backing buffer (because we don't know the needed
size until we know the # of bins). But those will never end up as an
a6xx render target!
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20465>
It was not meant to be used(Iris have assert for it) and it was
removed from Pipe_Control instruction in gen12.5 and newer.
BSpec: 47112
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20444>
To avoid the replication of code to properly emit PIPELINE_SELECT.
init_compute_queue_state() had a different emit of PIPELINE_SELECT but
as there is no compute engine in GFX VER 11 we are safe with the
differences.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20444>
This change copies the string once.
Direct leak of 196 byte(s) in 14 object(s) allocated from:
#0 0x7f71598ec7a7 in strdup (/usr/lib64/libasan.so.6+0x5c7a7)
#1 0x7f70a56ff942 in driParseOptionInfo ../src/util/xmlconfig.c:357
#2 0x7f70a56f0565 in pipe_loader_load_options ../src/gallium/auxiliary/pipe-loader/pipe_loader.c:126
#3 0x7f70a56f0565 in pipe_loader_create_screen_vk ../src/gallium/auxiliary/pipe-loader/pipe_loader.c:167
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20026>
See 0afd691f29 ("panfrost: clang-format the tree") for why I'm doing this.
Asahi already mostly follows Mesa style so this doesn't do much. But this means
we can all stop thinking about formatting and trust the robot poets to do that
for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20434>
There are no outstanding commits to these files in any branch, so they don't
need to be considered for the rebasing script. That said, they are massive and
bottleneck the rebasing script, so we'll want to split them out to keep rebasing
efficient.
(Nominally I should make the rebasing script less stupid but with these files
ignored it works pretty well.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20434>