v2: Add some comments explaining some of the nuance of the shift
optimizations. Fix a bug in the shift count calculation of the upper
32-bits. Move the @64 from the variable to the opcode. All suggested
by Jordan.
No shader-db changes on any Intel platform.
fossil-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154507026 -> 154506576 (-0.00%)
Cycle count: 17436298868 -> 17436295016 (-0.00%)
Max live registers: 32635309 -> 32635297 (-0.00%)
Totals from 42 (0.01% of 632575) affected shaders:
Instrs: 5616 -> 5166 (-8.01%)
Cycle count: 133680 -> 129828 (-2.88%)
Max live registers: 1158 -> 1146 (-1.04%)
No fossil-db changes on any other Intel platform.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.
Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.
shader-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2
total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356
LOST: 10
GAINED: 12
fossil-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)
Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
This allows us to not generate 64-bit iadd3 on Intel but continue
generating it for NVIDIA.
No shader-db or fossil-db changes.
v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit
iadd3 on NVIDIA platforms.
v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of
the optimizations from ever occuring.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>
For drivers that don't lower mediump shader inputs / outputs
to 16-bit, it's better to ignore the mediump flag completely,
letting mediump inputs / outputs work like normal 32-bit IO.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29435>
qsort is not guaranteed to produce a stable sort, and indeed
in MSVC CRT does not. The xfb varying sort functions were
relying on undefined behavior (that qsort would be stable).
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29178>
In the sort functions used to sort varyings in gl_nir_link_varyings,
we were only checking the first input for whether or not it is xfb.
Check both inputs, and also provide a definite order for the xfb vs.
non-xfb varyings (the xfb come last, as the initial sort established).
This fixes a problem encountered on panfrost, where qsort could
mix xfb and non-xfb varyings which started out separate.
Note that the sort is still not stable. We probably should make it
stable, but that is a more extensive change that's handled in a later
commit.
Cc: mesa-stable
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29178>
Dumps the value associated with each SPIR-V ID after parsing the module.
This will show the intermediate vtn_* values that spirv_to_nir uses.
Only a subset of detailed information is printed at the moment (focus on
pointers and pointer types), but it is easy to add for other value types
later.
Example output when running crucible with the debug option enabled.
```
crucible: start : func.compute.num-workgroups.basic.q0
=== SPIR-V values
1 = extension
2 = type void glsl_type=void
3 = type function
4 = function
5 = block
6 = type scalar glsl_type=uint
7 = type vector glsl_type=uvec3
8 = type array glsl_type=uvec3[]
9 = type struct glsl_type=Storage
10 = type pointer deref=9 SpvStorageClassUniform glsl_type=uvec4
11 = pointer ptr_type=10 (pointed-)type=9
12 = type scalar glsl_type=int
13 = constant type=12
14 = type pointer deref=7 SpvStorageClassInput glsl_type=uint
15 = pointer ptr_type=14 (pointed-)type=7
16 = constant type=6
17 = type pointer deref=6 SpvStorageClassInput glsl_type=uint
18 = pointer ptr_type=17 (pointed-)type=6
NIR: 32 %2 = deref_array &(*%0)[0] (system uint) // &gl_LocalInvocationID[0]
19 = ssa glsl_type=uint
20 = pointer ptr_type=14 (pointed-)type=7
21 = ssa glsl_type=uvec3
22 = type pointer deref=7 SpvStorageClassUniform glsl_type=uint
23 = pointer ptr_type=22 (pointed-)type=7
NIR: 32x4 %12 = deref_array &(*%11)[%4] (ssbo uvec3) // &((Storage *)%9)->uv3a[%4]
24 = constant type=6
25 = constant type=6
26 = constant type=7
===
crucible: pass : func.compute.num-workgroups.basic.q0
```
When the environment variable is set, this dump will also be printed
during vtn_fail and its helpers.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29295>
Promoting flat inputs should only happen while assigning FS input
slot groups. Otherwise we risk adding extra input slots, which
is undesireable.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29208>
New pass that shares code with nir_opt_load_store_vectorize but
it only updates the alignment of load/store instructions.
It is useful before running other passes which may
potentially destroy that information (eg. by removing some
instructions from which the alignment may be deduced).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29210>
We'll use the delta for an upcoming internal printf mechanism, where
the PARAM_IDX will be the base printf reloc identifier and the BASE
will be the string id.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
Uses the same memory layout as the print intrinsic lowering. This one
just let's you do the emission without having to deal with variables.
This useful for debug traces.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
This will allow a driver to use a single table of printf strings
across all shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
It can't be reordered globally, since its value is control-flow dependent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
There seems to be a hardware issue where fragment shaders with side effects get
skipped if depth testing with NEVER. Add a workaround for this case where we
discard programmatically instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
AGX has limited border colour hardware. To support full
customBorderColorWithoutFormat semantics, we're forced to emulate in shaders at
a substantial performance penalty. Actually, that's needed just to pass CTS
because of other hardware issues stacking on top of each others... Hooray!
Add the texops we need to facilitate efficient custom border colour lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
needed for correct flatshading with fans, without falling back on software input
assembly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
As this is intended to be used also by VC4, change the suffix to
something more convenient, like tlb_color_brcm.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29119>
vk/wsi: Remove unused struct 'wsi_headless_format'
'wsi_headless_format' appears unused, and seems
to have been since initial commit.
radv: Remove unused struct 'blit_region'
'blit_region' appears unused, I think since initial commit.
r600: Remove unused structs
'eg_interp' and 'r600_shader_src' are unused.
I think they are just leftovers from the cleanup
in 20e6c31ba6.
i915: Remove unused struct 'i915_tracked_hw_state'
'i915_tracked_hw_state' appears unused. I think it's just
a leftover from 179cb58795.
llvmpipe: Remove unused struct 'linear_interp'
'linear_interp' doesn't ever seem to have been used.
radeonsi: Remove unused struct 'texture_orig_info'
'texture_orig_info' seems unused, I think since 46b2b3bda8.
svga: Remove unused struct 'svga_3d_invalidate_gb_image'
'svga_3d_invalidate_gb_image' appears unused since 1942c06f9c.
Remove it.
nir: Remove unused struct 'split_struct_state'
'split_struct_state' looks unused since the original commit.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29105>
we can generate 32-bit scalar inverse_ballots from the boolean reduce lowering
which will blow up when trying to lower the resulting inverse_ballot with the
common lowering. but the assert can be quieted just fine.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28993>
this is the lowering from NAK, fixed up for common code. the existing code is
used for boolean scan/reduce. I make no guarantee that this works for subgroup
sizes other than 32.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28993>
this will be useful for AGX, which has many reductions (but not all) in
hardware with the logic too backend-specific to encode with bitflags.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28993>
v2: Save and restore fp_fast_math. Suggested by Georg and Ivan.
v3: Add a message to the static_assert.
Fixes: 750bd9757e ("spirv: gather some float controls bits per instruction")
Reviewed-by: Ivan Briano <ivan.briano@intel.com> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29091>
Replace shader_info::source_sha1 with shader_info::source_blake3 in compiler, mesa and radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28156>
Also, re-organize a bit to match the spec better. There are now
capabilities which need to be set to constant true which we didn't have
to se in the old caps struct and this makes it all more obvious.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28905>