Source: BSpec "Instruction Fields" page (56701), SWSB field.
Credits to Caio Oliveira here, since he was helping me while we found
this issue together.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35395>
This would cause Mesa to print this message even if an Intel GPU is just
being enumerated by a Vulkan application. For example, `vulkaninfo
--summary`.
Fixes: 52f73db5b7 ("brw: implement read without format lowering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35396>
The intention here is to build a SIMD8 value, that will be expanded
as needed -- just like the SHL/ADD case, but with a single instruction.
Found when the was triggering invalid MAD with SIMD32 (that gets compressed)
*and* with overlapping destination and source *and* which would cause
conflict when divided into two SIMD16.
Fixes: 338273dedd ("brw/reg_allocate: Optimize spill offset calculation using integer MAD")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35302>
Load the format enum and then just go through a series of :
if format == R16G16B16A16_UNORM
color = lower_r32g32_uint_tor_r16g16b16a16_unorm(color)
else if format == R16G16B16A16_SNORM
...
For Gfx12.5, there is no in-shader conversion.
For Gfx12/11, the in-shader conversion covers the following formats :
- ISL_FORMAT_R10G10B10A2_UNORM
- ISL_FORMAT_R10G10B10A2_UINT
- ISL_FORMAT_R11G11B10_FLOAT
For Gfx9, the following formats :
- ISL_FORMAT_R16G16B16A16_UNORM
- ISL_FORMAT_R16G16B16A16_SNORM
- ISL_FORMAT_R10G10B10A2_UNORM
- ISL_FORMAT_R10G10B10A2_UINT
- ISL_FORMAT_R8G8B8A8_UNORM
- ISL_FORMAT_R8G8B8A8_SNORM
- ISL_FORMAT_R16G16_UNORM
- ISL_FORMAT_R16G16_SNORM
- ISL_FORMAT_R11G11B10_FLOAT
- ISL_FORMAT_R8G8_UNORM
- ISL_FORMAT_R8G8_SNORM
- ISL_FORMAT_R16_UNORM
- ISL_FORMAT_R16_SNORM
- ISL_FORMAT_R8_UNORM
- ISL_FORMAT_R8_SNORM
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22524>
0e3e5146cf ("intel/brw: Use correct instruction for value change check
when coalescing") enabled some new cases that exposed a pre-existing
bug that would turn something like this :
mul.sat(16) %789:F, %787:F, %788:F
mov.g.f0.0(16) %790:F, %789:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
into this :
mul.sat(16) %790:F, %787:F, %788:F
mov.g.f0.0(16) null:F, null<8,8,1>:F
(+f0.0) sel(16) %800:UD, %790:UD, 0u
The mov[] array can contain the same instruction because it's repeated
for each REG_SIZE writes and a SIMD16 instruction will write 2
REG_SIZE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0e3e5146cf ("intel/brw: Use correct instruction for value change check when coalescing")
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35276>
There's always only the ARF scalar register source, so don't
bother printing other information that won't be used. Matches
the assembler code.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35297>
When we have partial VGRF MOVs with offsets, we will reach
`channels_remaining == 0` with `inst` that is not writing the whole VGRF.
Currently, even though we check `can_coalesce_vars()` for each offset
separately, it will always check if the dst value is not changed only
for the offset from the instruction that satisfied the
`channels_remaining == 0` condition.
Instead, we should remember and use the correct instruction for each
written offset separately.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>
This fprintf() was added in commit cce3bea2a7 ("i965/disasm: Align send
instruction meta-information with dst.")) to align the human-readable
send message info (e.g. "render MsgDesc: RT write ...") with the
destination register on the previous line.
Two months later we disabled printing the instruction offset in commit
662f1ccc24 ("i965: Disable hex offset printing in disassembly."),
thereby unaligning the human-readable send message info for the next 11
years.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35077>
Previously, brw_try_override_assembly was only called when a debug flag was
enabled. However, during investigations involving workloads such as Steam
games, enabling the debug flag results in excessive NIR and ISA output to
stderr, making debugging more difficult.
This change ensures that brw_try_override_assembly is called when the
INTEL_SHADER_ASM_READ_PATH is set, regardless of the debug flag. This
improves usability in scenarios where minimal debug output is desired.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35115>
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.
So just handle load_push_constants in the backend and stop changing
things in Hasvk.
Fixes a bunch of tests in
dEQP-VK.descriptor_indexing.*
dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.
So just handle load_push_constants in the backend and stop changing
things in Anv.
Fixes a bunch of tests in
dEQP-VK.descriptor_indexing.*
dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
This patch improves code readability by centralizing the type stomping
logic for Gen12.5 region restrictions in `brw_lower_alu_restrictions`.
It removes redundant comments and ensures type consistency assertions
in `brw_broadcast`, `generate_mov_indirect`, and `generate_shuffle`.
Thank you Ken for guiding me on this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35006>
This will be necessary to select the right value for flat inputs in
fragment shaders when fragment shader barycentrics are in use.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
Xe driver will be disabling the HW functionality for null any-hit
shaders, drivers need to take care of it instead. This commit brings
back parts of older workaround (see b0624e414f) we used to have to
handle the null any-hit case.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
18bbcf9a triggered the following building error in Android,
simple fix is to use ffsll() as it was done before 18bbcf9a
to process uint64_t generics argument.
Fixes the following building error:
FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o
...
../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaratio
n]
const int first_generic_output = ffsl(generics) - 1;
^
../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'?
/home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here
int ffs(int __i) __INTRODUCED_IN_X86(18);
^
1 error generated.
Fixes: 18bbcf9a ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>
Gfx12.5 and later allow the use of two 16-bit immediate values in
integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD,
but that is not helpful where.
v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the
check around using integer MAD is unnecessary.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17119962 -> 17118441 (<.01%)
instructions in affected programs: 65398 -> 63877 (-2.33%)
helped: 32 / HURT: 0
total cycles in shared programs: 895433316 -> 895425578 (<.01%)
cycles in affected programs: 13437376 -> 13429638 (-0.06%)
helped: 30 / HURT: 2
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210052706 -> 209550074 (-0.24%)
Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00%
Totals from 7081 (1.00% of 707082) affected shaders:
Instrs: 16864614 -> 16361982 (-2.98%)
Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
Re-associate the calculation. The current calcuation is
((lane + zero_or_8) << 2) + offset
The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to
((lane << 2) + offset) + zero_or_32
All operations are SIMD8.
The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.
v2: Switch to alternate method. Update for SIMD32 on Xe2.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.
total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%
Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
I always thought there was a massive issue with pipeline libraries &
mesh shaders. Indeed recent CTS tests have exposed a number of issues.
Some values delivered to the fragment shader are coming from different
places depending on whether the preceding shader is Mesh or not. For
example PrimitiveID is delivered in the per-primitive block in Mesh
pipelines whereas for other pipelines it's coming as a VUE slot (which
is per-vertex). Those are 2 different locations in the payload.
We have to find a layout for fragment shaders that is compatible with
everything. Leaving gaps here and there in the thread payload.
Fixes the following test pattern :
dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Mesh shaders have per vertex block in URB pretty much identical to the
VUE format. Let's just reuse that concept to do all of our layout in
the payload attribute registers. This will ensure that we have
consistent VUE layout between Mesh & non-Mesh pipelines.
We need a new way of laying out the VUE though as we have to
accomodate a HW constraint of maximum (per-primitive + per-vertex) of
32 varying. This means we cannot have 2 locations in the payload for
things like PrimitiveID which can come from either the per-primitive
or the per-vertex block. The new layout places the PrimitiveID at the
end of the per-vertex attributes and shrinks the delivery dynamically
if the mesh stage is active. The shader is compiled with a
MOV_INDIRECT to read the PrimitiveID from the right location in the
attributes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
In a case like this :
block_0:
%5 = ...
%6 = ...
block_1:
%7 = load_interpolated_input %5, %6
The current logic would move load_interpolated_input to block_0 before
%5 but not move %5 & %6 which are sources of that instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Will allow the driver to know if the vertices count is dynamic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Take the opportunity to reuse the backend pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Some intrinsics are implemented by reading memory location that could
be rewritten by a further tracing calls. So we need to move those
reads prior to tracing operations in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
Or current render target cache setting is to key on the binding table
index, meaning the HW associates a number in the range [0, 7] to a
RENDER_SURFACE_STATE description. If you want change the render target
0 between 2 draw calls, you need to insert a PIPE_CONTROL in between
the 2 draw calls with pb-stall + rt-flush in order to flush an writes
to a previous RENDER_SURFACE_STATE that has now becomed disassociated
with the [0, 7] number.
This PIPE_CONTROL taking care of the flush is dealt with in
cmd_buffer_maybe_flush_rt_writes(). This function diffs the current
BTI setup for render targets (first 0 to 7 BTIs) with what the next
fragment shader wants.
The issue here is we might have a render pass with 0 color attachments
and yet in 98cdb9349a we added one pointing to the render target 0,
but in the emit_binding_table() when we finally program the BTI, we
check the render pass color count and program a null surface state
instead of an actual surface state. And this leads to hangs because
the render target cache will end up with inconsistent state data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 98cdb9349a ("anv: ensure null-rt bit in compiler isn't used when there is ds attachment")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12955
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34603>
This brings back c9e33e5cbf ("intel/fs/cse: Make HALT instruction act
as CSE barrier."), from the old CSE pass into the new one.
Fixes new CTS test: dEQP-VK.subgroups.shader_quad_control.terminated_invocation
Fixes: 9690bd369d ("intel/brw: Delete old local common subexpression elimination pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34643>
For Xe2+, from Bspec 64643, bit field "StackID": The maximum number of
StackIDs can be 2^12- 1.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34709>
Re-organize the configuration lists to make easier to include BFloat16
only for the Gfx125+ that support it, while keeping MTL supporting the
"lowered" configurations from pre-Gfx125.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>