These got split out from gl4[3456]-master a while back, so we were missing
coverage in CI.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17876>
uadd_carry returns 1 or 0, so ANDing with 1 is unnecessary. Probably
this was implemented thinking that it was returning a boolean value.
shader-db results for V3D:
total instructions in shared programs: 12463571 -> 12462964 (<.01%)
instructions in affected programs: 28994 -> 28387 (-2.09%)
helped: 110
HURT: 1
total uniforms in shared programs: 3704591 -> 3704588 (<.01%)
uniforms in affected programs: 247 -> 244 (-1.21%)
helped: 3
HURT: 0
total max-temps in shared programs: 2148138 -> 2148117 (<.01%)
max-temps in affected programs: 729 -> 708 (-2.88%)
helped: 23
HURT: 2
total sfu-stalls in shared programs: 21230 -> 21232 (<.01%)
sfu-stalls in affected programs: 0 -> 2
helped: 0
HURT: 2
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17903>
cull_small_primitive will now allow the caller to pass an
SSA def that it will use to determine if the primitive was
initially rejected.
This allows ACO to remove an s_branch instruction from every
NGG culling shader.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160086644 -> 159355824 (-0.46%); split: -0.46%, +0.00%
Instrs: 30477916 -> 30356092 (-0.40%); split: -0.40%, +0.00%
Latency: 139587915 -> 139611487 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 21184261 -> 21184346 (+0.00%)
Copies: 2762930 -> 2702024 (-2.20%); split: -2.20%, +0.00%
Branches: 1236970 -> 1176052 (-4.92%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17919>
OpenGL API calls like glClearBufferData() result in mapping/unmapping
of a given buffer by Mesa and unmapping of a host blob fails in
virglrenderer because VirGL driver uses command that is intended for
unmapping of a guest buffer. In particular this causes problem for the
"Total War: Warhammer" game that gets GL_OUT_OF_MEMORY error due to the
failed unmapping command. Fix this by setting the mapping usage flag in
accordance to the resource flags, allowing virgl_buffer_transfer_unmap()
to differentiate host buffer from guest.
Fixes: 3b54e5837a ("virgl: support PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17914>
Now they all return whether the primitive was rejected.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
For primitives which are rejected based on only W and face, this
will reduce the number of executed branches.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160330564 -> 160086644 (-0.15%)
Instrs: 30477385 -> 30477916 (+0.00%); split: -0.00%, +0.00%
Latency: 139802763 -> 139587915 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 21198444 -> 21184261 (-0.07%); split: -0.07%, +0.00%
SClause: 749811 -> 749810 (-0.00%)
Copies: 2701482 -> 2762930 (+2.27%); split: -0.00%, +2.28%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
The previous code checked all_w_positive in the if condition.
Instead, always execute the bbox culling code and include
all_w_positive at the end.
We assume checking in the if is not beneficial because it's
very unlikely that there is no primitive in a wave whose W are
not all positive.
This allows moving other things to the condition
in the next commit.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160574204 -> 160330564 (-0.15%); split: -0.15%, +0.00%
Instrs: 30538297 -> 30477385 (-0.20%); split: -0.20%, +0.00%
Latency: 139810902 -> 139802763 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 21198449 -> 21198444 (-0.00%); split: -0.00%, +0.00%
SClause: 749810 -> 749811 (+0.00%)
Copies: 2701474 -> 2701482 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
This extension introduces a new layout which allows applications
to both render and sample from the same image inside the same draw
(aka. feedback loops).
Previously, the GENERAL layout was used and this introduced some
rendering artifacts because the hw can't read&write DCC/HTILE for
the same image, and we try to keep it compressed on GFX10+.
This helps fixing corruption with D3D9 and RPCS3 games which
are candidate for feedback loops.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4411
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17883>
Fix typo in calculation of position of start of a row of tiles. This
could otherwise cause an out-of-bounds access in the next patch.
Fixes: 81d85be9a5 freedreno/gmem: Reverse order of alternative tile rows
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17888>
The old version worked fine for SIMD16 instructions but SIMD8
instructions where the destination spans two registers have the same
problem.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>
That way we can look at the SBT entries for debug purposes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>
This is known to produce bogus results for certain combinations of
operands, so don't use it. See this issue for details:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6555
With this change, the idiv lowering will produce mul_high instructions,
so we need to instruct the compiler to lower those with the ALU lowering
right after the idiv lowering by adding the lower_mul_high option (we
only need to add this to V3D, since V3DV already had it set). This will
cause injection of uadd_carry instructions, for which we have backend
implementations that produce better code for us than the NIR lowering.
total instructions in shared programs: 12457692 -> 12463625 (0.05%)
instructions in affected programs: 23115 -> 29048 (25.67%)
helped: 0
HURT: 111
total threads in shared programs: 416372 -> 416368 (<.01%)
threads in affected programs: 8 -> 4 (-50.00%)
helped: 0
HURT: 2
total uniforms in shared programs: 3704067 -> 3704589 (0.01%)
uniforms in affected programs: 5804 -> 6326 (8.99%)
helped: 2
HURT: 109
total max-temps in shared programs: 2147845 -> 2148088 (0.01%)
max-temps in affected programs: 2456 -> 2699 (9.89%)
helped: 6
HURT: 91
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>
Just got into a midnight discussion with a hw guy.
TC_ACTION_ENA apparently doesn't invalidate L1, so don't clear
the INV_VCACHE flag.
Fixes: 4056e953fe - radeonsi: move emit_cache_flush functions into si_gfx_cs.c
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17902>
Also while switching to GPGPU pipeline, make sure to flush the untyped
dataport cache. HDC pipeline flush bit must be set if we are flushing
untyped dataport L1 data cache.
v2: Add utrace support (Lionel)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
v2: Drop ANV_PIPE_UNTYPED_DATAPORT_CACHE_FLUSH_BIT from invalidate bits (Lionel)
Add utrace support
Expand on comment about PIPE_CONTROL::UntypedDataPortCache
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
Set write back L1 cache policy in STATE_BASE_ADDRESS instruction for A64
messages.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
Set write back L1 cache policy in STATE_BASE_ADDRESS instruction for A64
messages.
v2: Also set the value in genX_state.c (Lionel)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
For a RW L1 cache, both reads and writes are cached in the L1, at high
priority (MRU position). For a RO L1 cache, reads are cached at higher
priority and writes bypass the cache.
v1: (Ken)
- Set caching policy for buffer surfaces too
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>