descriptor_stride includes multiple plane size, this new field tracks
just the data of one plane.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
To make BTI indexing simpler with ycbcr samplers, stop doing packing
calculations in the apply_layout. We'll insert NULL bindings for the
few ycbcr cases where it's needed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
When in direct descriptor mode, the descriptor pool buffers will hold
surface states directly. We won't allocate surface states in image &
buffer views.
Instead views will hold a packed RENDER_SURFACE_STATE ready to copied
into the descriptor buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Now that descriptor sets are located a in a 1Gb area, we can avoid
storing the whole address to the descriptor and add the base address
of the area to a 32bit offset.
Replay a bunch of fossils with this and changes not really significant
one way or another :
Totals:
Instrs: 9278246 -> 9277148 (-0.01%); split: -0.01%, +0.00%
Cycles: 3547598421 -> 3547579435 (-0.00%); split: -0.00%, +0.00%
Totals from 353 (1.14% of 31021) affected shaders:
Instrs: 581546 -> 580448 (-0.19%); split: -0.23%, +0.04%
Cycles: 25885422 -> 25866436 (-0.07%); split: -0.31%, +0.24%
No difference on send messages or spills/fills.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
We'll use the fact that the pool is aligned to 4Gb to limit the amount
of address computations to build the address in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This is the default for now. It needs to be part the pipeline hashing
as we will allow this to be tweaked per application.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
We bump the max surfaces to ~16 million instead of ~1 million on
Gfx9-12. We could do more but that'll come later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
At the beginning of the buffer is located the driver identifier for
error states.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This is a leftover from a previous fix attempt. We don't need this.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
global load/store (or A64 messages) need the NIR bound checking which
is enabled by "robust" behavior even when robust behavior is disabled.
Many thanks to Christopher Snowhill for pointing out the pushed
constant related issue with the initial version of this patch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kind of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).
There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.
Here are some stats (all titles seem to have similar gains) :
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
red_dead_redemption2 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 4716 -37.29% -5.67% +0.95% +0.07% -81.26% -79.16% -70.62% -9.15% -8.47%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
rise_of_the_tomb_raider_g2 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 11732 -37.27% -22.14% +0.01% +0.00% -99.01% -99.14% -98.65% -7.67% -5.11%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
total_war_warhammer2 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
-----------------------------------------------------------------------------------------------------------------------------------
All affected 335 -28.31% -12.77% -82.35% -88.46% -66.67% -6.25% -7.24%
-----------------------------------------------------------------------------------------------------------------------------------
Total 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
witcher_3_dxvk_g2 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 693 -41.93% -58.45% +0.09% +0.01% -98.52% -97.29% -98.10% -10.25% -1.33%
------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Using the information coming from surface_index_intel, we can tell
whether we should use the BTI or bindless heap for a particular SSBO
access.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Non uniform lower can insert read_first_invocation on the result of
resource_intel. We want to keep that intrinsic directly in front of
the user (load_ubo/load_ssbo/load_image/etc...)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
The resouce_intel intrinsic doesn't not result in an actual
instruction, it's just a wrapper around another value, usually a
load_const.
Allowing this intrinsic to be moved anywhere means it's going to be
closer to the value it wraps, enabling opt_gcm to move a load_ubo
using this resource_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Intel HW has multiple ways to access resources like UBO/SSBO/images :
- binding tables : a small ~240 heap of surfaces
- bindless surfaces : a 64Mb heap of surfaces up to Gfx12+, 4Gb on Gfx12.5+
- surfaces : a 4Gb heap on Gfx12.5+ (mostly unused at the moment,
only available through the LSC)
For samplers, we have 2 options since Gfx11+ :
- samplers indexed from the Dynamic State Heap (4Gb)
- samplers indexed from the Bindless Sampler Heap (4Gb)
Additionally our whole push constant promotion mechanism is based
around binding table indices. This is problematic if you want to also
promote to push constants things that would be accessed through the
bindless heap.
To solve this issue, we introduce a new intrinsic that will cary a
block index that is not based off the binding table index nor the
bindless table offset.
We will also use this intrinsic to identify whether the buffer/surface
index in load_ubo/load_ssbo/store_ssbo/etc... is relative to the
binding table or the bindless heap.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Some instruction we would like to keep around because they carry
additional information in their indices.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Due to how the decoders work, they will write garbage data into
the padding, and later using the image for sampling with linear
images will use the garbage to create broken results. Let the
user specify the image size and align it up in the driver, so
sampling of the image later has the correct w/h.
cc: mesa-stable
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23227>