The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."
Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11118
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
Gfx11-12 can support SLM block loads via OWord Block Load messages
(notably, the aligned version, not the unaligned version).
A while back we deleted the SHADER_OPCODE_OWORD_BLOCK_READ opcode.
Rather than bring it back, we continue using UNALIGNED_OWORD_BLOCK_READ
for SLM block access (like we do for SSBOs) but switch it over to the
aligned variant when lowering logical sends. We do ensure the alignment
is at least 16B, however. This is ugly, but it's probably not worth
bringing back a whole extra opcode for a legacy HDC block load quirk.
References: BSpec 47652 and 1689
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29429>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
We already have a logical opcode and lower to what is basically a send
instruction. We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
Rework:
* Francisco Jerez: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Move SHADER_OPCODE_DWORD_SCATTERED_*_LOGICAL from previous
patch, as it seems to make more sense here.
* Jordan: Change `devinfo->has_lsc` ?: to if/else as suggested by idr
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Reworks:
* Francisco: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Rebase on 952a523abb ("intel: switch over to unified
atomics")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
src/intel/compiler/brw_lower_logical_sends.cpp: In member function ‘bool fs_visitor::lower_logical_sends()’:
src/intel/compiler/brw_lower_logical_sends.cpp:3170:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
3170 | if (devinfo->has_lsc) {
| ^~
src/intel/compiler/brw_lower_logical_sends.cpp:3174:7: note: here
3174 | case SHADER_OPCODE_DWORD_SCATTERED_READ_LOGICAL:
| ^~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
If you look at the sampler message header on Gfx9+, you'll see that we
mostly only use 2 dwords (dw2 & dw3). DW2 has a bunch of sampler
parameters, DW3 is the sampler handle.
On Gfx9 we can micro optimize by copying r0 into the header because
the HW mostly doesn't care about other DWs. We just have to clear dw2
on non VS/FS stages.
On Gfx11+, we always have to do a careful copy of the r0.3 bits to
mask out the bottom unrelated bits. So there, just clearing the entire
header makes more sense.
On Xe2+, the dw4 of the header references the sampler feedback surface
handle and bit0 is a boolean to know whether to use that surface or
not. So it *REALLY* matters to have that as 0. If we copy r0, we'll
get random bits in dw4, leading to enable that surface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28082>
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND. In the past, we lowered to a non-logical variant
and handled that in the generator. These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.
This patch eliminates them by using the original logical opcodes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate on the byte offset granularity.
This should fix the geometry shader test cases where we have more than
32 vertices since previously we were failing to write the correct
control data bits because of incorrect write mask.
Shader-db results for Xe2:
total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.
total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.
total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
On Xe2+, we need to pack LOD with array index for cube array surfaces,
with that mlod parameter gets adjusted to different indices based on the
layout.
So track if we are packing LOD with array index in fs_inst and propogate
that to sampler lowering code to adjust param location.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
Since SIMD8 no longer exists, the SIMD modes enums have different names
and different values.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Update brw_disasm.c with SIMD descriptions.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Note: a future commit will expand the sampler message type to the 6 bits
used on Xe2.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Drop XE2_SAMPLER_MESSAGE_SAMPLE_BIAS_MLOD as it does not actually
exist. This resulted in some bigger changes in brw_disasm.c. Noticed
by Sagar.
v4: Now that XE2_SAMPLER_MESSAGE_SAMPLE_MLODc conflicts with
GFX7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C, the determination of
min_lod_is_first must include devinfo->ver or previous platforms will
break.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
This small refactor simplifies a later commit that will optionally emit
some opcodes before the switch (as is already done with the shadow
comparitor).
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3 (Jordan): SHADER_OPCODE_TXL => SHADER_OPCODE_TXL_LZ (was
SHADER_OPCODE_TXF_LZ).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Per Ian suggestion. Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Once NIR code is lowered and a few optimization passes have run, there
might be flag register interactions between instructions quite far
away from one another.
In the following case :
f0 = and r0, r1
...
fs_interpolate r2, r3
...
if f0
...
endif
If we lower fs_inteporlate while using the f0 register, we completely
garble the value meant for the if block.
To fix this, emit the predication for fs_interpolate in brw_fs_nir.cpp
when doing the NIR translation to the backend IR. This will guarantee
that the flag register interactions are visible to the optimization
passes, avoiding the problem above.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9757
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26306>
We can end up in situation where we are dispatched with a multisample
framebuffer but not at per-sample. In this case we would request the
at_sample value with the wrong message configuration.
Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag superseeds
BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH.
Fixes piglit tests :
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample*
With Zink on Anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>
This is what most logical SEND messages do when they take a variable
number of components. 'inst->mlen' is expected to be zero for logical
SEND opcodes, which are expected to behave like plain arithmetic
operations, so certain automated transformations (like SIMD lowering)
can manipulate them without opcode-specific special-casing.
Guessing the number of components from 'inst->mlen' has other
disadvantages, because it requires duplicating the logic that infers
the message payload size in every use of the instruction -- Instead we
can just do the computation once during logical send lowering. In
addition on LNL platform this causes the 'inst->mlen' field of URB
writes to have units inconsistent with every other SEND instruction,
which is likely to lead to confusion and bugs down the road.
Rework:
* Marcin: update emit_urb_indirect_vec4_write
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>