Only found 3 shaders affected in Red Dead Redemption :
Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%
This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
It is now set by all relevant drivers and not checked anywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Delete unnecessary virtual functions, we need just two. Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.
Rename the virtuals so overrides don't hide the common convenience
functions. Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
In ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
we added a new source, we need to fixup the source index for the
generator.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>
This required updating the expected results in a number of test. The
vast majority of these are cases where Gfx12.5 platforms don't allow
mixing F and HF sources.
In all honesty... I just updated the half_float_conversion expected
results until the test passed.
The next commit will add changes specific to Gfx12.5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This is what other tests do. The next commit will add a third set of
possible results (for Gfx12.5+), and the multiple macro method does not
scale.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.
For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This only impacts ADD3, so at this point it should not have any
affect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.
The worst part is, the ADD3 instrutions that are broken by the old code
aren't even "progress" on this pass.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This is a prepare step to remove depends on p_defines.h in src/util/*
This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX with MESA_PRIM_COUNT
replace SHADER_PRIM_ with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_
This patch only replace code only
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
In commit 284f0c9a57 I refactored the
handling of the data source to just call a helper rather than special
casing opcodes with 0 or 2 sources. Unfortunately, I also dropped the
"else return 1", creating a fallthrough for all sources other than
SURFACE_LOGICAL_SRC_ADDRESS and SURFACE_LOGICAL_SRC_DATA.
The case below happened to return the correct value for all cases except
SURFACE_LOGICAL_SRC_SURFACE, which has been returning 2 instead of 1
since that commit.
Restore the else case. Thanks to Marcin Ślusarz for catching this.
Fixes: 284f0c9a57 ("intel/compiler: Add an lsc_op_num_data_values() helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23347>
I found those issues while testing DOOM eternal and Ian also ran into
it with other shaders.
We write the desc register in SIMD1 exec_all, so all the data is in
the first component. We need to make sure to pass that component in
the lower SEND instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23354>
CPS is dynamically turned off when per sample interpolation is active.
Update the asserts to reflect this.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5644011f06 ("intel/compiler: Convert wm_prog_key::persample_interp to a tri-state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23103>
Now that descriptor sets are located a in a 1Gb area, we can avoid
storing the whole address to the descriptor and add the base address
of the area to a 32bit offset.
Replay a bunch of fossils with this and changes not really significant
one way or another :
Totals:
Instrs: 9278246 -> 9277148 (-0.01%); split: -0.01%, +0.00%
Cycles: 3547598421 -> 3547579435 (-0.00%); split: -0.00%, +0.00%
Totals from 353 (1.14% of 31021) affected shaders:
Instrs: 581546 -> 580448 (-0.19%); split: -0.23%, +0.04%
Cycles: 25885422 -> 25866436 (-0.07%); split: -0.31%, +0.24%
No difference on send messages or spills/fills.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kind of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).
There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.
Here are some stats (all titles seem to have similar gains) :
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
red_dead_redemption2 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 4716 -37.29% -5.67% +0.95% +0.07% -81.26% -79.16% -70.62% -9.15% -8.47%
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
rise_of_the_tomb_raider_g2 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 11732 -37.27% -22.14% +0.01% +0.00% -99.01% -99.14% -98.65% -7.67% -5.11%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96%
PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
total_war_warhammer2 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
-----------------------------------------------------------------------------------------------------------------------------------
All affected 335 -28.31% -12.77% -82.35% -88.46% -66.67% -6.25% -7.24%
-----------------------------------------------------------------------------------------------------------------------------------
Total 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62%
PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
witcher_3_dxvk_g2 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected 693 -41.93% -58.45% +0.09% +0.01% -98.52% -97.29% -98.10% -10.25% -1.33%
------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Using the information coming from surface_index_intel, we can tell
whether we should use the BTI or bindless heap for a particular SSBO
access.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Non uniform lower can insert read_first_invocation on the result of
resource_intel. We want to keep that intrinsic directly in front of
the user (load_ubo/load_ssbo/load_image/etc...)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This simplifies things a bit. Note that in some cases, the arguments are
swapped, because multiplications are commutative, and nir_fmul_imm only
allows the second operand to be an immediate.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
We need to do 3 things to accomplish this :
1. make all the register access consider the maximal case when
unknown at compile time
2. move the clamping of load_per_vertex_input prior to lowering
nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the
clamping will use the nir_intrinsic_load_patch_vertices_in to
clamp), meaning clamping using derefs rather than lowered
nir_intrinsic_load_per_vertex_input
3. in the known cases, lower nir_intrinsic_load_patch_vertices_in
in NIR (so that the clamped elements still be vectorized to the
smallest number of URB read messages)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>
With Anv/Zink, the piglit test :
arb_shader_storage_buffer_object-max-ssbo-size -auto -fbo fsexceed
is failing validation after copy propagation :
load_payload(8) vgrf15:F, vgrf1+0.12<0>:F, vgrf1+0.0<0>:F, vgrf1+0.4<0>:F, vgrf1+0.8<0>:F, vgrf1+0.12<0>:F
../src/intel/compiler/brw_fs_validate.cpp:191: A <= B failed
A = inst->src[i].offset / REG_SIZE + regs_read(inst, i) = 2
B = alloc.sizes[inst->src[i].nr] = 1
In most cases it works because src[0] would be at offset 0 and so
reading a full reg passes validation, but Anv/Zink started emitting
slightly different code adding an offset maybe the size read 2 GRFs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23126>
We only support 32-bit versions of ufind_msb, find_lsb, and bit_count,
so we need to lower them via nir_lower_int64.
Previously, we were failing to do so on platforms older than Icelake
and let those operations fall through to nir_lower_bit_size, which
used a callback to determine it should lower them for bit_size != 32.
However, that pass only emulates small bit-size operations by promoting
them to supported, larger bit-sizes (i.e. 16-bit using 32-bit). It
doesn't support emulating larger operations (i.e. 64-bit using 32-bit).
So nir_lower_bit_size would just u2u32 the 64-bit source, causing us to
flat ignore half of the bits.
Commit 78a195f252 (intel/compiler: Postpone most int64 lowering to
brw_postprocess_nir) provoked this bug on Icelake and later as well,
by moving the nir_lower_int64 handling for ufind_msb until late in
compilation, allowing it to reach nir_lower_bit_size which broke it.
To fix this, we always set int64 lowering for these opcodes, and also
correct the nir_lower_bit_size callback to ignore 64-bit operations.
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
C++17 is the project-wide default since f9057cea51 ("fix(FTBFS):
meson: raise C++ standard to C++17"), so let's drop these local
overrides.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23048>