We can't detect color attachments without exports when compiling a PS
epilog, so we can't compact MRTs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18514>
This was used to distinguish definitions fixed before and during RA, but
it seems it isn't used anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18547>
Corner cases like implicit gl_PrimitiveID are currently broken and
will be fixed later, but the general case should work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18516>
With GPL, it's possible to build the pre-rasterization stages separately
from the fragment stage. Implicit IO (like gl_PrimitiveID) between the
last pre-rast stage and the FS will be addressed later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18516>
fossils-db (NAVI21):
Totals from 158 (0.12% of 134913) affected shaders:
CodeSize: 569456 -> 568824 (-0.11%)
Only Control seems affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615>
Otherwise, there might be redundant barycentric_at_sample intrinsics
that will be lowered and this will increase code size.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615>
../mesa/src/amd/common/ac_surface.c:2324:48: warning: implicit conversion from enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') to different enumeration type 'enum gfx9_resource_type' [-Wenum-conversion]
surf->u.gfx9.resource_type = AddrSurfInfoIn.resourceType;
~ ~~~~~~~~~~~~~~~^~~~~~~~~~~~
../mesa/src/amd/common/ac_surface.c:3046:38: warning: implicit conversion from enumeration type 'const enum gfx9_resource_type' to different enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') [-Wenum-conversion]
input.resourceType = surf->u.gfx9.resource_type;
~ ~~~~~~~~~~~~~^~~~~~~~~~~~~
../mesa/src/amd/common/ac_surface.c:3069:38: warning: implicit conversion from enumeration type 'const enum gfx9_resource_type' to different enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') [-Wenum-conversion]
input.resourceType = surf->u.gfx9.resource_type;
The enums are compatible, so let's just add some casts.
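As a rough illustration of the kind of cast meant here, the sketch below uses two stand-in enums with matching values. The type and enumerator names are hypothetical, not the real addrlib or ac_surface.h definitions; the compile-time checks are an extra safety net assumed for this example, not something the commit is known to add.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two enum types from the warnings. */
typedef enum _DemoAddrResourceType {
   DEMO_ADDR_TEX_1D = 0,
   DEMO_ADDR_TEX_2D = 1,
   DEMO_ADDR_TEX_3D = 2,
} DemoAddrResourceType;

enum demo_gfx9_resource_type {
   DEMO_GFX9_RESOURCE_1D = 0,
   DEMO_GFX9_RESOURCE_2D = 1,
   DEMO_GFX9_RESOURCE_3D = 2,
};

/* The casts are only safe while the numeric values stay in sync. */
_Static_assert((int)DEMO_ADDR_TEX_1D == (int)DEMO_GFX9_RESOURCE_1D, "1D mismatch");
_Static_assert((int)DEMO_ADDR_TEX_2D == (int)DEMO_GFX9_RESOURCE_2D, "2D mismatch");
_Static_assert((int)DEMO_ADDR_TEX_3D == (int)DEMO_GFX9_RESOURCE_3D, "3D mismatch");

/* An explicit cast silences -Wenum-conversion in each direction. */
enum demo_gfx9_resource_type demo_to_gfx9(DemoAddrResourceType t)
{
   return (enum demo_gfx9_resource_type)t;
}

DemoAddrResourceType demo_to_addr(enum demo_gfx9_resource_type t)
{
   return (DemoAddrResourceType)t;
}
```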
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18694>
Fix defect reported by Coverity Scan.
Useless call (USELESS_CALL)
side_effect_free: Calling count_tes_user_sgprs(key) is only useful for
its return value, which is ignored.
Fixes: 8253ec3855 ("radv: add shader arguments for dynamic patch control points")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18659>
The current stack size is a significant limiter for occupancy, and
hence we need smaller stacks in LDS.
Rhys earlier had a patch that just put the N entries closest to the
root in LDS and the rest in scratch. However, this is not ideal for
performance as most of the activity is happening away from the root,
near the leaves. Of course we can't just switch it around, as the
leaf activity likely isn't happening all the way at the end of the
stack.
So what we do is make the LDS stack behave like a ring buffer by always
accessing it using the stack index modulo the buffer size (always
a power of two, so we can efficiently mask). If there is no free space
left in this buffer, we evict the entries closest to the root
to scratch, and if we hit the "bottom" of the LDS space, we load from
scratch.
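A minimal CPU-side sketch of this scheme, assuming plain arrays in place of LDS and scratch memory. All names (demo_stack, lds_base, the counters) are hypothetical. For brevity it evicts and reloads one entry at a time, whereas the text above talks about evicting batches.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LDS_ENTRIES 16u     /* must be a power of two */
#define DEMO_SCRATCH_ENTRIES 64u /* hypothetical maximum stack depth */

struct demo_stack {
   uint32_t lds[DEMO_LDS_ENTRIES];         /* fast ring buffer */
   uint32_t scratch[DEMO_SCRATCH_ENTRIES]; /* slow backing store */
   uint32_t top;      /* number of entries on the stack */
   uint32_t lds_base; /* lowest stack index still resident in LDS */
   uint32_t scratch_stores;
   uint32_t scratch_loads;
};

void demo_stack_push(struct demo_stack *s, uint32_t value)
{
   assert(s->top < DEMO_SCRATCH_ENTRIES);
   if (s->top - s->lds_base == DEMO_LDS_ENTRIES) {
      /* Ring is full: evict the entry closest to the root. */
      s->scratch[s->lds_base] = s->lds[s->lds_base & (DEMO_LDS_ENTRIES - 1)];
      s->lds_base++;
      s->scratch_stores++;
   }
   /* Stack index modulo the buffer size, done with a mask. */
   s->lds[s->top & (DEMO_LDS_ENTRIES - 1)] = value;
   s->top++;
}

uint32_t demo_stack_pop(struct demo_stack *s)
{
   assert(s->top > 0);
   s->top--;
   if (s->top < s->lds_base) {
      /* Hit the "bottom" of the LDS window: the entry lives in scratch. */
      s->lds_base = s->top;
      s->scratch_loads++;
      return s->scratch[s->top];
   }
   return s->lds[s->top & (DEMO_LDS_ENTRIES - 1)];
}
```

In this sketch, entries with index below lds_base live in scratch and everything from lds_base up to top lives in the ring, so shallow push/pop activity near the top of the stack never touches scratch.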
Some rough perf numbers for indication with Q2RTX:
| evicting | LDS entries | perf |
|----------|-------------|------|
| no | 76 | 55% |
| no | 32 | 100% |
| no | 24 | 105% |
| yes | 32 | 95% |
| yes | 16 | 100% |
| yes | 8 | 90% |
| yes | 4 | 75% |
(For the case with 4 entries we need to do some extra accounting as
a full batch may not be available to evict)
So an obvious choice is to use a stack of 16 entries.
One might wonder if Q2RTX perf is mainly good due to BVHs with very
little geometry and hence low depth, so I also did some profiling
with Control. This is done with RGP instruction timing, so these are
counts of executed instructions, not weighted by enabled masks, i.e.
divergence effects are included.
| game | LDS entries | scratch action | fraction of iterations |
|---------|-------------|----------------|------------------------|
| Control | 8 | store | 10.3% |
| Control | 8 | load | 34.8% |
| Control | 16 | store | 0.58% |
| Control | 16 | load | 2.62% |
| Q2RTX | 16 | store | 1.00% |
| Q2RTX | 16 | load | 3.07% |
So Q2RTX doesn't seem like an unreasonably good case for this
algorithm.
On the implementation side, we can always place the scratch stack at
address 0 by just reserving the scratch space, and in the case of fixed
callstack size moving that up. In the dynamic case the dynamic stack
base already takes any reserved scratch space into account.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18541>
This reverts commit 3750663c72.
Doing things with derefs adds extra instructions for multiplying the
index with the element size, e.g.
BBF0_13:
s_waitcnt vmcnt(0)
v_mov_b32_e32 v27, v55
s_mov_b32 s23, exec_lo
v_cmpx_ne_i32_e32 -1, v27
s_cbranch_execz _L14
BBF0_14:
v_lshlrev_b32_e32 v48, 2, v46 <--
ds_write_b32 v48, v27
v_add_nc_u32_e32 v46, 32, v46
_L14:
s_mov_b32 exec_lo, s23
v_mov_b32_e32 v27, v54
s_mov_b32 s23, exec_lo
v_cmpx_ne_i32_e32 -1, v27
s_cbranch_execz _L15
BBF0_15:
v_lshlrev_b32_e32 v48, 2, v46 <--
ds_write_b32 v48, v27
v_add_nc_u32_e32 v46, 32, v46
On Q2RTX indirect lighting this saves about 2.3 VALU instructions
per loop iteration, which is ~4% of VALU instructions (we're at
58 per iteration now according to RGP).
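The difference can be sketched on the CPU as follows, reading the listing above as a shift by 2 (4-byte entries) before each store and an increment of 32 entries per push. The function names and the byte-array model of LDS are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ELEM_SIZE 4u /* bytes per stack entry (the shift by 2) */
#define DEMO_STRIDE 32u   /* entries between pushes (the add of 32) */

/* Index-based: every store first converts the entry index to bytes. */
void demo_push_indexed(uint8_t *lds, uint32_t *index, uint32_t value)
{
   uint32_t byte_addr = *index << 2; /* the extra instruction per store */
   memcpy(lds + byte_addr, &value, sizeof(value));
   *index += DEMO_STRIDE;
}

/* Byte-based: the stack pointer already is a byte address. */
void demo_push_bytes(uint8_t *lds, uint32_t *byte_offset, uint32_t value)
{
   memcpy(lds + *byte_offset, &value, sizeof(value));
   *byte_offset += DEMO_STRIDE * DEMO_ELEM_SIZE;
}
```

Both variants write the same bytes; the second one folds the element size into the increment, so no per-store shift is needed.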
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18538>
Saves a bunch of processing and a lot of LDS traffic.
Improves perf of the indirect lighting RT pass in Q2RTX by ~3%. This
is mostly due to the -5% VALU instructions and -25% LDS instructions.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18538>
This simplifies the DGC path and removes some untested code. The only user
of the partial DGC implementation (vkd3d-proton) doesn't use
EXT_vertex_input_dynamic_state.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18465>
This is completely broken because the PS epilog has a refcount and
radv_upload_shaders() updates its VA.
This reverts commit 7c34b31db2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18632>
Port from radeonsi.
Works on both GFX11 and GFX10. Although GFX10 can do an atomic
GDS add on all threads, NGG streamout is currently disabled
for GFX10, so that's OK.
The GFX11 implementation differs from radeonsi in that we do
the info calculation for all 4 buffers/streams on a single thread.
This is simply because it is simple, we need to update GDS on a
single thread anyway, and streamout is not so performance
critical that losing a small number of instructions matters. We may
change to a better implementation when using register-based streamout.
When streamout is enabled, ES threads need to save all vertex
attributes to LDS in addition to the position. This is because we
don't know where in the streamout buffer to export the attributes to
and whether there is space in the streamout buffer.
Streamout is done per primitive, so we need to check whether there
is space and where the current primitive should be written to
using a GDS atomic add, then do the streamout in GS threads.
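The space check can be sketched roughly as below. This is a single-threaded CPU model, not the actual NIR lowering: a plain read-modify-write of a counter stands in for the GDS atomic add, and every name and parameter is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_so_alloc {
   uint32_t first_offset;  /* byte offset of the first primitive written */
   uint32_t prims_written; /* how many of the requested primitives fit */
};

/* Reserve room for up to prim_count primitives in one streamout buffer.
 * *filled_bytes models the shared counter that the real code keeps in GDS.
 */
struct demo_so_alloc demo_so_reserve(uint32_t *filled_bytes, uint32_t buffer_size,
                                     uint32_t stride, uint32_t verts_per_prim,
                                     uint32_t prim_count)
{
   uint32_t prim_bytes = stride * verts_per_prim;
   assert(prim_bytes != 0);

   uint32_t old = *filled_bytes; /* value an atomic add would return */
   uint32_t space = buffer_size > old ? buffer_size - old : 0;
   uint32_t fit = space / prim_bytes; /* only whole primitives are written */
   uint32_t n = prim_count < fit ? prim_count : fit;

   *filled_bytes = old + n * prim_bytes;
   return (struct demo_so_alloc){.first_offset = old, .prims_written = n};
}
```

Only after a reservation like this are the write offset and the number of surviving primitives known, which is why the attributes have to be parked in LDS first.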
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17654>
Streamout also needs a barrier after culling, so move the
prim id barrier up to after culling.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17654>
When compiling only the pre-rast stages in a library, the input
assembly state might not be present and the topology would be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18519>
If we have a VS that needs a prolog without using the dynamic state,
that means that it comes from a library, so we can overwrite the
cmdbuf VS input state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18519>
With GPL it's possible to create VS prologs without this dynamic state,
so it seems better to rename.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18519>
This mirrors the change we made for vega10 (6bbe3c6d3) in August...
Seems like the chances of a PASS are indeed slim, but possible.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18590>
When tests are already in the flakes list, it's useless to mark them
as expected failures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18592>
It's invalid to bind NULL pipelines, but make sure to reset it to
its previous NULL state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18567>
It's more costly to submit individual sparse buffer binds than to
merge them and submit bigger binds, so try to pre-compare and flatten
out the bind array as much as possible to reduce ioctl counts.
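One way such flattening could look is sketched below. The demo_bind struct and the contiguity rule (same BO, adjacent in both the resource and the backing memory) are assumptions for illustration, not the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one sparse bind. */
struct demo_bind {
   uint32_t bo;              /* backing buffer object id */
   uint64_t resource_offset; /* offset inside the sparse resource */
   uint64_t memory_offset;   /* offset inside the backing BO */
   uint64_t size;
};

/* Merge neighbouring binds in place and return the new bind count.
 * The array is expected to be sorted by resource_offset already.
 */
size_t demo_merge_binds(struct demo_bind *binds, size_t count)
{
   if (count == 0)
      return 0;

   size_t out = 0;
   for (size_t i = 1; i < count; i++) {
      struct demo_bind *prev = &binds[out];
      const struct demo_bind *cur = &binds[i];
      bool contiguous = cur->bo == prev->bo &&
                        cur->resource_offset == prev->resource_offset + prev->size &&
                        cur->memory_offset == prev->memory_offset + prev->size;
      if (contiguous)
         prev->size += cur->size; /* grow the previous bind */
      else
         binds[++out] = *cur;     /* start a new merged bind */
   }
   return out + 1;
}
```

Each merged entry then corresponds to one larger bind submission instead of several small ones.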
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18507>
Gets rid of a bit of code and fixes the RRA accel_struct_vas table if
the BO is freed before vkDestroyAccelerationStructureKHR is called.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18530>