This introduces enums for SHADER_OPCODE_SEND[_GATHER] sources,
similar to what we've done for most of the newer logical opcodes. This
allows us to use actual names for sources rather than remembering their
order, or leaving ourselves comments like /* ex_desc */ all over. It
will also make it easier to add or reorder sources in the future.
While we're at it, we also standardize on the number of sources.
Previously, we allowed SHADER_OPCODE_SEND to have either 3 (monosend) or
4 (split send) sources, but this is mostly for haphazard historical
reasons. We now specify all sources every time, eliminating the need
for careful inst->source checks before accessing the last source.
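
As a rough illustration of the idea (not the actual definitions from
this change), named source indices might look like the sketch below.
Every identifier with an "example" prefix, the source order, and the
use of -1 as an "unused" placeholder are assumptions made for this
sketch only.

```c
#include <assert.h>

/* Hypothetical names and ordering; the real enum may differ. */
enum example_send_src {
   EXAMPLE_SEND_SRC_DESC,     /* message descriptor */
   EXAMPLE_SEND_SRC_EX_DESC,  /* extended descriptor */
   EXAMPLE_SEND_SRC_PAYLOAD1, /* first payload */
   EXAMPLE_SEND_SRC_PAYLOAD2, /* second payload, unused for monosend */
   EXAMPLE_SEND_NUM_SRCS
};

/* Toy stand-in for an instruction; sources are plain ints here. */
struct example_inst {
   unsigned sources;
   int src[EXAMPLE_SEND_NUM_SRCS];
};

/* Always specify every source, even when one is unused (-1 here). */
static void
example_init_send(struct example_inst *inst, int desc, int ex_desc,
                  int payload1, int payload2)
{
   inst->sources = EXAMPLE_SEND_NUM_SRCS;
   inst->src[EXAMPLE_SEND_SRC_DESC] = desc;
   inst->src[EXAMPLE_SEND_SRC_EX_DESC] = ex_desc;
   inst->src[EXAMPLE_SEND_SRC_PAYLOAD1] = payload1;
   inst->src[EXAMPLE_SEND_SRC_PAYLOAD2] = payload2;
}

/* No source-count check is needed before reading any source. */
static int
example_send_ex_desc(const struct example_inst *inst)
{
   assert(inst->sources == EXAMPLE_SEND_NUM_SRCS);
   return inst->src[EXAMPLE_SEND_SRC_EX_DESC];
}
```

With a fixed source count, a reader of example_send_ex_desc() does not
have to remember whether index 1 or index 3 holds the value, and adding
a new source only means adding an enumerator.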
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
Every case but SHADER_OPCODE_SEND and SHADER_OPCODE_BARRIER will be
lowered to SEND before register allocation happens. And the barrier
send has a null destination, so the restriction doesn't apply.
Note that this hack is for Gfx9 only, so we don't need to worry about
Xe3's SHADER_OPCODE_SEND_GATHER feature.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
We used to have other opcodes as well, but we've since transitioned
entirely to logical send lowering prior to register allocation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
NIR is going to use exec_node/list without the C++ code, and may switch to
a different linked list implementation in the future.
GLSL is going to use ir_exec_node/list, which we want to keep private
for GLSL, so that we can change it easily.
Thus, it's better to fork the C++ version of list.h for Intel.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36425>
In the C23 standard, unreachable() is now a predefined function-like
macro in <stddef.h>.
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str) \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
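
To illustrate the difference in signature, here is a minimal sketch of
a one-argument macro of this kind. It is not copied from
src/util/macros.h; it assumes a GCC/Clang-style __builtin_unreachable(),
and the example_kind enum and example_size() function are hypothetical.

```c
#include <assert.h>

/* Sketch only: asserts with a message in debug builds, then tells the
 * compiler the point cannot be reached. Unlike the C23 unreachable(),
 * it takes one string argument. */
#define UNREACHABLE(str)       \
   do {                        \
      assert(!"" str);         \
      __builtin_unreachable(); \
   } while (0)

/* Hypothetical enum used only to show a call site. */
enum example_kind { EXAMPLE_A, EXAMPLE_B };

static int
example_size(enum example_kind k)
{
   switch (k) {
   case EXAMPLE_A: return 4;
   case EXAMPLE_B: return 8;
   }
   UNREACHABLE("invalid example_kind");
}
```

Because the uppercase name never collides with the standard macro, a
file using it can include <stddef.h> in C23 mode without a
redefinition warning.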
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
Replace uses of brw_builder::at() with various more descriptive
variants. Use block pointer from instruction when possible.
A couple of special cases remained and will be handled in separate patches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
In debug builds, the assertion should be preferred as it will highlight
the actual problem. In non-debug builds, it is possible to fail register
allocation more gracefully. If the problem only occurs in, for example,
a SIMD32 version of a shader, the application may even continue to
function.
Closes: #13239
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36202>
Broadcast selects one lane from the source to write to all the lanes
of the destination. This makes it possible for the first half to
overwrite the source used by the second half.
No shader-db changes on any Intel platform.
fossil-db:
Lunar Lake
Totals:
Instrs: 208705405 -> 208705374 (-0.00%); split: -0.00%, +0.00%
Cycle count: 31274597098 -> 31273711544 (-0.00%); split: -0.00%, +0.00%
Totals from 77 (0.01% of 707133) affected shaders:
Instrs: 220177 -> 220146 (-0.01%); split: -0.02%, +0.00%
Cycle count: 461694212 -> 460808658 (-0.19%); split: -0.33%, +0.14%
No fossil-db changes on any other Intel platforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
Add support for disabling the VRT (Variable Register Thread) feature.
The strategy here is to force the old BRW_MAX_GRF limit for the
register allocator (locks the upper limit) and make sure
ptl_register_blocks() always returns that number of blocks (locks
the lower limit).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35781>
The intention here is to build a SIMD8 value, that will be expanded
as needed -- just like the SHL/ADD case, but with a single instruction.
Found when this was triggering an invalid MAD with SIMD32 (that gets
compressed) *and* with overlapping destination and source *and* which
would cause a conflict when divided into two SIMD16 instructions.
Fixes: 338273dedd ("brw/reg_allocate: Optimize spill offset calculation using integer MAD")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35302>
Gfx12.5 and later allow the use of two 16-bit immediate values in
integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD,
but that is not helpful here.
v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the
check around using integer MAD is unnecessary.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17119962 -> 17118441 (<.01%)
instructions in affected programs: 65398 -> 63877 (-2.33%)
helped: 32 / HURT: 0
total cycles in shared programs: 895433316 -> 895425578 (<.01%)
cycles in affected programs: 13437376 -> 13429638 (-0.06%)
helped: 30 / HURT: 2
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210052706 -> 209550074 (-0.24%)
Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00%
Totals from 7081 (1.00% of 707082) affected shaders:
Instrs: 16864614 -> 16361982 (-2.98%)
Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
Re-associate the calculation. The current calculation is
((lane + zero_or_8) << 2) + offset
The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to
((lane << 2) + offset) + zero_or_32
all operations are SIMD8.
The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.
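
The two forms compute the same value because the shift distributes
over the first addition: (zero_or_8 << 2) is zero_or_32. A small scalar
check of that identity, with hypothetical function names and ignoring
SIMD widths and register types entirely:

```c
#include <assert.h>
#include <stdint.h>

/* Original association: ((lane + zero_or_8) << 2) + offset */
static uint32_t
example_old_offset(uint32_t lane, uint32_t zero_or_8, uint32_t offset)
{
   return ((lane + zero_or_8) << 2) + offset;
}

/* Re-associated form: ((lane << 2) + offset) + zero_or_32 */
static uint32_t
example_new_offset(uint32_t lane, uint32_t zero_or_8, uint32_t offset)
{
   const uint32_t zero_or_32 = zero_or_8 << 2;
   return ((lane << 2) + offset) + zero_or_32;
}

/* Returns 1 if both forms agree for lanes 0..7 in both halves. */
static int
example_forms_agree(uint32_t offset)
{
   for (uint32_t half = 0; half <= 8; half += 8) {
      for (uint32_t lane = 0; lane < 8; lane++) {
         if (example_old_offset(lane, half, offset) !=
             example_new_offset(lane, half, offset))
            return 0;
      }
   }
   return 1;
}
```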
v2: Switch to alternate method. Update for SIMD32 on Xe2.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.
total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%
Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
Make the intention of some comparisons clearer by using the named
helper functions. Add commentary when the straightforward range is not
the one used, e.g. VGRF interference.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
The problem occurs with a series of instructions that build the
subgroup invocation value:
mov(8) g23<1>UW 0x76543210V
add(8) g23.8<1>UW g23<8,8,1>UW 0x0008UW
add(16) g23.16<1>UW g23<16,16,1>UW 0x0010UW
Our register spilling code operates on physical registers (64B on
Xe2+), but the brw_inst::is_partial_write() helper only considers
32B registers. So the spiller doesn't see that the add(16) instruction
is doing a partial write and ends up discarding the previous value.
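
The mismatch can be seen with a simplified model of the check. The
helper below is hypothetical and only looks at the destination offset
and the number of bytes written; the real helper considers more than
that. The add(16) above writes 16 UW lanes (32 bytes) starting at
byte 32 of g23.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model: a write is partial with respect to
 * a register of reg_size bytes if it does not start on a register
 * boundary or does not cover a whole number of registers. */
static bool
example_is_partial_write(unsigned offset_bytes, unsigned size_written,
                         unsigned reg_size)
{
   return (offset_bytes % reg_size) != 0 ||
          (size_written % reg_size) != 0;
}
```

Under this model, example_is_partial_write(32, 32, 32) is false while
example_is_partial_write(32, 32, 64) is true, which matches the
description: the write looks complete at 32B granularity but only
covers half of a 64B physical register.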
You can reproduce the issue by running a test like:
INTEL_DEBUG=spill_fs ./deqp-vk -n dEQP-VK.compute.pipeline.cooperative_matrix.khr_a.subgroupscope.constant.uint8_uint8.buffer.rowmajor.linear
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: aa494cbacf ("brw: align spilling offsets to physical register sizes")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33642>
Since brw_inst now has the block it belongs to and the block can
reach the shader, the only necessary information to create a
builder is the brw_inst itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID. Call it brw_sfid to make it easier to find.
Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.
GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages. The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).
Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache. To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.
We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms. Cite modern docs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
Xe3+ benefits from packing register allocations tightly in order to
make optimal use of the GRF space. The round-robin heuristic
previously in use often causes the whole GRF space to be used even if
register pressure is substantially lower, which would severely
decrease thread-level parallelism on Xe3+.
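
A toy model of the effect, not the allocator's actual code: values are
one register wide, and each value dies as soon as the next one is
defined, so at most two are live at once. The function name, the
128-register file size, and the liveness pattern are all assumptions
made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EXAMPLE_NUM_REGS 128

/* Returns the number of registers touched (highest index used + 1)
 * after allocating num_values short-lived values. */
static unsigned
example_high_water(bool round_robin, unsigned num_values)
{
   bool busy[EXAMPLE_NUM_REGS];
   unsigned start = 0, high = 0;
   int prev = -1;

   memset(busy, 0, sizeof(busy));

   for (unsigned v = 0; v < num_values; v++) {
      /* Take the first free register at or after 'start', wrapping. */
      unsigned reg = start;
      while (busy[reg])
         reg = (reg + 1) % EXAMPLE_NUM_REGS;
      busy[reg] = true;
      if (reg + 1 > high)
         high = reg + 1;

      /* The previous value dies once the new one is defined. */
      if (prev >= 0)
         busy[prev] = false;
      prev = (int)reg;

      /* Round-robin resumes the search after the last allocation;
       * the packed variant always restarts from register 0. */
      if (round_robin)
         start = (reg + 1) % EXAMPLE_NUM_REGS;
   }
   return high;
}
```

In this model, 200 values touch only 2 registers when the search
always restarts from 0, but all 128 with round-robin, even though the
register pressure is the same in both cases.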
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>