Commit graph

13455 commits

Author SHA1 Message Date
Caio Oliveira
390317a99e brw: Fix size in assembler when compacting
Calculation was wrongly walking uncompacted instructions, even if we had
some compacted in the middle, generating invalid size.  Since we are
here just drop the instruction count, since in practice the caller will
have to walk the instruction stream anyway.

Fixes: 6267585778 ("intel/brw: Also return the size of the assembled shader")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33532>
(cherry picked from commit dd1ca1588d)
2025-03-04 20:24:05 +01:00
Hyunjun Ko
0ea91330c3 anv: Do not support the tiling of DRM modifier if DECODE_DST
Fixes: 04709e4f ("anv: fix video profile lists");

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33784>
(cherry picked from commit f7ff9b240d)
2025-03-03 17:25:22 +01:00
Kevin Chuang
f912436dc9 anv/bvh: Fix copy shader handling sparse buffer
Fixes: 692b5fa9f2 ("anv: Add shader to copy acceleration structures")

This commit fixes the future test "sparse_binding_structures" for
"header_bottom_address" for ray tracing pipeline.

Even on 48-bit ray tracing (Xe1/2), the software-defined part
instance_leaf_part1.bvh_ptr has to be in canonical form for copy.comp
to deference a bvh, which means we have to preserve the upper 16bits.
This is especially relevant in cases where the acceleration structure buffer
is located high, such as sparse buffer.

Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
(cherry picked from commit 87ff7b061f)
2025-03-03 17:25:16 +01:00
Kevin Chuang
614dd4999c anv/bvh: Fix encoder handling sparse buffer
Fixes: 2fe57947e3 ("anv: Implement encode shader to fit in ANV BVH")

This commit resolves the failures in the future tests
"sparse_binding_structures" for rayquery. Sparse buffers' heaps are
located high, and since it's in canonical form, the higher 16bits are
all set to 1. However, the existing encoder did not expect any non-zero
values at the higher 16bits. As a result, the instance flags got
corrupted, causing most triangle tests to fail.

Thanks for Paulo providing insights about sparse buffer properties.

Co-developed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
(cherry picked from commit b9a980ea73)
2025-03-03 17:25:14 +01:00
Paulo Zanoni
bac3b56d51 brw: extend the NOP+WHILE workaround
It turns out that we need to add a NOP not only in between two
consecutive WHILE instructions, but also after every control flow
instruction that immediately precedes a WHILE.

v2: Rebase after the renames.

Fixes: 5ca883505e ("brw: add a NOP in between WHILE instructions on LNL")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
(cherry picked from commit fd10764cff)
2025-02-28 22:17:35 +01:00
Karol Herbst
62747d6bdd intel/brw, lp: enable lower_pack_64_4x16
The compiler won't be able to emit pack_64_4x16, so we should prevent
nir_opt_algebraic to optimize to it. This fixes an infinite optimization
loop inside brw_nir_optimize:

nir_copy_prop
    16x4     %77 = @load_global (%80)
    32    %61995 = pack_32_2x16_split %77.x, %77.y
    32    %61998 = pack_32_2x16_split %77.z, %77.w
    64    %61999 = pack_64_2x32_split %61995, %61998
    64       %76 = iadd %100, %79
                   @store_global (%61999, %76)

nir_opt_algebraic
    16x4     %77 = @load_global (%80)
    32    %61995 = pack_32_2x16_split %77.x, %77.y
    32    %61998 = pack_32_2x16_split %77.z, %77.w
    16x4  %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    64    %62001 = pack_64_4x16 %62000
    64       %76 = iadd %100, %79
                   @store_global (%62001, %76)

nir_lower_pack
    16x4     %77 = @load_global (%80)
    16x4  %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    16    %62002 = mov %62000.y
    16    %62003 = mov %62000.x
    32    %62004 = pack_32_2x16_split %62003, %62002
    16    %62005 = mov %62000.w
    16    %62006 = mov %62000.z
    32    %62007 = pack_32_2x16_split %62006, %62005
    64    %62008 = pack_64_2x32_split %62004, %62007
    64       %76 = iadd %100, %79
                   @store_global (%62008, %76)

// brw_nir_optimize loops here

nir_copy_prop
    16x4     %77 = @load_global (%80)
    32    %62004 = pack_32_2x16_split %77.x, %77.y
    32    %62007 = pack_32_2x16_split %77.z, %77.w
    64    %62008 = pack_64_2x32_split %62004, %62007
    64       %76 = iadd %100, %79
                   @store_global (%62008, %76)

llvmpipe has a similar issue inside lp_build_opt_nir

Fixes: b1bc691b0f ("nir/algebraic: add and improve pack/unpack patterns")
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33347>
(cherry picked from commit dad5ee1039)
2025-02-28 22:17:35 +01:00
Lionel Landwerlin
3630721dc8 anv: fix missing 3DSTATE_PS:Kernel0MaximumPolysperThread programming
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 815d2e3e8b ("anv: move 3DSTATE_PS to partial packing")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33712>
(cherry picked from commit 91f36ba5b6)
2025-02-28 22:17:35 +01:00
Dylan Baker
db51d8f8ac iris: fix handling of GL_*_VERTEX_CONVENTION
By actually setting the state packets according to the program data.
Also ensure that we correctly flag that the program may be dirty when
the geometry shader state changes

Fixes piglit tests: `spec@!opengl 3.2@gl-3.2-adj-prims * pv-first`

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33658>
(cherry picked from commit c33ebf09f5)
2025-02-28 22:17:35 +01:00
Paulo Zanoni
d8ffce96d2 brw: increase brw_reg::subnr size to 6 bits
Since Xe2, the registers are bigger and even the instruction
structures got updated to have 6 bits.

The way I detected this issue was when I tried to use
src/intel/executor to add the following instruction:

    add(8)          g6.8<1>UD      g4<8,8,1>UD    0x00000008UD    { align1 WE_all 1Q I@1 };

Executor would read this and end up emitting an add with dst being
g6<1>UD instead of what we wanted. It turns out that inside
brw_gram.y, at dstoperand and dstoperandex we do:

    $$.subnr = $$.subnr * brw_type_size_bytes($4);

which would overflow subnr back to 0.

The overflow doesn't seem to be a problem with code we emit directly
(unlike the code we parse, like above) due to the fact that we seem to
treat Xe2 registers as smaller all the way until we call phys_nr() and
phys_subnr() during code generation. The phys_subnr() function can
generate a value that would overflow reg.subnr, but this value is
never written back to reg.subnr, it's just returned as an unsigned
int.

Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33539>
(cherry picked from commit 927d7b322b)
2025-02-18 22:46:08 +01:00
Tapani Pälli
3194cae6d0 anv: apply cache flushes on pipeline select with gfx20
This fixes rendering artifacts seen with Hogwarts Legacy and Black
Myth Wukong. Assumption is that we can get rid of these flushes once
RESOURCE_BARRIER work lands but until then we need them.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12540
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12489
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33397>
(cherry picked from commit 765f3b78d5)
2025-02-18 22:46:07 +01:00
Tapani Pälli
961a3fc760 anv: tighten condition for changing barrier layouts
Assertion (or attempting the layout change) is causing crash when
launching Steel Rats. Tighten the condition for change so that it should
affect only when runtime has made changes.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12602
Fixes: eed788213b ("anv: ensure consistent layout transitions in render passes")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33523>
(cherry picked from commit d8381415a6)
2025-02-18 22:46:01 +01:00
Lionel Landwerlin
e2232c0be4 anv: ensure Wa_16012775297 interacts correctly with Wa_18020335297
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dddd765553 ("anv: implement VF_STATISTICS emit for Wa_16012775297")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
(cherry picked from commit 6b99bf76ca)
2025-02-15 00:02:54 +01:00
Lionel Landwerlin
399de9dd00 anv: disable VF statistics for memcpy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
(cherry picked from commit 462d8e3fab)
2025-02-15 00:02:53 +01:00
Ian Romanick
2ea6b340ac brw/copy: Fix handling of offset in extract_imm
The offset is measured in bytes. Some of the code here acted as though
it were measured in src.type units. Also modify the assertion to check
that all extracted bits come from data in the immediate value.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Fixes: da395e6985 ("intel/brw: Fix extract_imm for subregion reads of 64-bit immediates")

Yes, I missed this error *twice* in code review.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33049>
(cherry picked from commit ac4b93571c)
2025-02-11 18:05:27 +01:00
Lionel Landwerlin
cb0d551424 brw: fixup scoreboarding for find_live_channels
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
(cherry picked from commit c08b437db7)
2025-02-05 16:08:29 +01:00
Ernst Persson
26ad2f9149 intel/vulkan: Add bvh build dependency
Fixes: 41baeb3810 ("anv: Implement acceleration structure API")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12558
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33333>
(cherry picked from commit c64871accc)
2025-02-04 20:47:26 +01:00
Hyunjun Ko
cd4ffc319f anv: Fix to set CDEF flter flag correctly for AV1 decoding
and relevant tiny clean-up.

Fixes: 8432b8b282 ("anv: add initial support for AV1 decoding")

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33316>
(cherry picked from commit 52d9edbf05)
2025-02-04 20:47:26 +01:00
Caio Oliveira
f18dee3618 intel/brw: Fallback to SEND from SEND_GATHER if possible
After optimization happen, if the sources are still in one or two
contigous spans for some reason (e.g. some data read from memory
now being written), it is beneficial to just use regular SEND
and avoid having to set the ARF scalar instruction.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
b6b32933ad intel/brw: Use SHADER_OPCODE_SEND_GATHER in Xe3
Add an optimization pass to turn regular SENDs into SEND_GATHERs.
This allows the payload to be "broken" into smaller pieces that
can be further optimized, which _may_ result in

- less register pressure (no need to contiguous space), and
- less instructions (no need to MOV to such space).

For debugging, the INTEL_DEBUG=no-send-gather option skips this
optimization, and reporting how many opportunities were missed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
26d4d04d63 intel/brw: Add lowering for SHADER_OPCODE_SEND_GATHER
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
650ec7169d intel/brw: Add SHADER_OPCODE_SEND_GATHER
Starting in Xe3, there's a variant of SEND that take the
register numbers from the ARF scalar register, and don't
require them to be contiguous.  The new opcode added here
represents that kind of SEND.

To make the original sources still reachable, we keep them
around during the IR, just ignoring them at generator time.
This allow software scoreboard to properly reason the
dependencies without trying to decode the contents of ARF
scalar register being used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
2fca22347c intel/brw: Plumb through generator whether SEND is gather variant
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
00fac79f99 intel/brw: Add scoreboard support for scalar register
Xe3 adds a new pipe that handles *only* MOVs from immediate into the
scalar register.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:57 +00:00
Caio Oliveira
fbacf3761f intel: Add meson option -Dintel-elk
Defaults to true.  When set to false Iris and various tools can be
built without ELK support.  In both cases this means supporting
only Gfx9+.  This option must be true to build Crocus or Hasvk.

This allows skipping re-building ELK when developing for newer platforms
with tools/tests enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
31e5d909e7 intel/tools: Merge libaub into libintel_tools
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
ec2d20a70d intel/tools: Add helpers for decoder_init/disasm
Isolate the BRW/ELK differences in a single place.  The way is done now,
we are not reusing the isa_info between calls.  For the tools here this
is probably fine, if its someday this gets in the way, we can add an
opaque pointer to store the right data.

This intentionally is not used in Iris, since there the driver need more
detailed view into BRW/ELK and we don't want to create an all
encompassing abstraction for that.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
aa2bd16dec intel/tools: Use idep_libintel_common in meson
Since the internal dependency object exists and is already
used in some cases, let's be consistent.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Francisco Jerez
d455d5d86c anv/xe3+: Enable VRT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
dd1712515b anv/xe3+: Set RegistersPerThread for bindless shader dispatch.
v2: Use MOV and wrap in conditional during BTD spawn header setup
    (Lionel).  Remove references to SIMD8 (Tapani).

v3: Update brw_bsr() to specify number of registers per thread, don't
    initialize Registers Per Thread on BTD spawn header (Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
b25d0f899b anv/xe3+: Set RegistersPerThread during shader state setup based on prog_data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
7537f8edee intel/blorp/xe3+: Set RegistersPerThread during shader state setup based on prog_data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
f6a1c51de7 intel/genxml/xe3+: Update definitions for shader state setup.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
fb40b449cd intel/brw: Define ptl_register_blocks() helper.
Since this calculation will be needed in many places to set up the
state of each shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
70fecb1483 intel/brw: Report number of GRF registers used in brw_stage_prog_data.
This is similar to what we used to do on pre-SNB platforms, the number
of GRF registers used by the shader will be used on Xe3+ to adjust the
trade-off between thread-level parallelism and size of the GRF file.
Plumb the value through prog_data so the driver can set up the
hardware state accordingly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
6513bf65c3 intel/brw/xe3+: Optimize CS/TASK/MESH compile time optimistically assuming SIMD32.
This is similar in principle to the previous commit "intel/brw/xe3+:
brw_compile_fs() implementation for Xe3+." but applied to compute-like
shader stages.  It changes the implementation of brw_compile_cs/task/mesh()
to reduce compile time and take advantage of wider dispatch modes more
aggressively than the original logic, since as of Xe3 SIMD32 builds
succeed without spills in most cases thanks to VRT.

The new "optimistic" SIMD selection logic starts with the SIMD width
that is potentially highest performance and only compiles additional
narrower variants if that fails (typically due to spilling), while the
old "pessimistic" logic did the opposite: It started with the
narrowest SIMD width and compiled additional variants with increasing
register pressure until one of them failed to compile.

In typical non-spilling cases where we formerly compiled SIMD16 and
SIMD32 variants of the same compute shader, this change will halve the
number of backend compilations required to build it.

XXX - Possibly don't do this in cases with variable workgroup size
      until effect on runtime performance can be measured directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

v2: Don't do this for now in cases with variable workgroup size, still
    compile every possible variant in such cases.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Sagar Ghuge
7e1362e9c0 intel/brw/xe3+: Don't compile SIMD32 if there is ray queries
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
5b6906076e intel/brw/xe3+: brw_compile_fs() implementation for Xe3+.
This reworks the implementation of brw_compile_fs() to reduce compile
time and take advantage of wider dispatch modes more aggressively than
the original logic.

The new "optimistic" PS compilation logic starts with the SIMD width
that is potentially highest performance and only compiles additional
narrower variants if that fails (typically due to spilling or hardware
restrictions), while the old "pessimistic" logic did the opposite: It
started with the narrowest SIMD width and compiled additional variants
with increasing register pressure until one of them failed to compile.

The main disadvantage of this is that selectively throwing away some
of the compiled variants based on the static analysis of their
performance behavior will no longer be possible, however this is
expected to be less useful on Xe3+ since the GRF space allocated to a
thread can be scaled up or down, which leads to less dramatic
differences in scheduling between SIMD variants.

In typical non-spilling cases where we formerly compiled SIMD16 and
SIMD32 variants of the same fragment shader, this change will halve
the number of backend compilations required to build a shader.  With
multi-polygon PS dispatch enabled (which is disabled by default right
now) this has an even more dramatic effect since the number of
compiler iterations can be reduced down to a fifth in the best case
scenario.

Even though in most cases we will only attempt to return a single
binary from the pixel shader compilation, the hardware allows a pair
of PS kernels to be specified, and we'll still take advantage of this
when the multi-polygon PS kernel has the potential to have worse
performance than the single-polygon shader because only the latter
register-allocates successfully at SIMD32 -- Only in such case
(SIMD2x8 multi-polygon, SIMD32 single-polygon) we'll continue
programming both so the hardware will chose one or the other at
runtime depending on the SIMD fullness and number of polygons it can
buffer at runtime.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
1b2bd1fcb8 intel/brw: Exit early from run_fs() if compilation failed before optimization loop.
This avoids running the optimizer uselessly if compilation of the
current kernel failed due to some hardware (e.g. SIMD-width)
restriction.  This isn't only inefficient but it can break assumptions
throughout the compiler which would lead to crashes on Xe3 when this
arises during translation from NIR.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
afff3eb95e intel/brw: Indent conditional block from brw_compile_fs() not applicable to Xe2+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d7d08ec2e2 intel/brw: Indent body of brw_compile_fs() not applicable to xe3+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d03eac3133 intel/brw/xe3+: Disable round-robin allocation heuristic on Xe3+.
Xe3+ benefits from packing register allocations tightly in order to
make optimal use of the GRF space.  The round-robin heuristic
previously in use often causes the whole GRF space to be used even if
register pressure is substantially lower, which would severely
decrease thread-level parallelism on Xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
a67ff3e7e3 intel/brw/xe3+: Bump number of SBID tokens for Xe3.
Xe3 supports 32 SBID tokens per thread regardless of the number of
register blocks allocated per thread.  Take advantage of the increased
number of SBIDs in the scoreboard pass to reduce the frequency of
false dependencies on Xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
8d2331fe4b intel/brw/xe3: Extend regalloc sets to maximum Xe3 GRF size.
Extend our regalloc sets to 256 registers to match the maximum
capacity of the GRF file on Gfx30.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
ca1636d457 intel/brw/xe3: Define XE3_MAX_GRF.
Gfx30 supports up to 256 (512b) GRFs which requires a max GRF define
of 512 in REG_SIZE units.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
67cb23a4b1 intel/common/xe2+: Allow SIMD32 PS for all multisample cases.
These don't seem to be disallowed by recent hardware anymore.  Stop
disabling SIMD32 due to hardware restrictions of multisample
rasterization, since it should have better performance, and on Xe3+
there may be no shader variant available other than SIMD32.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
935f60c13c intel/blorp: Specify a subgroup size requirement of 16 for fast clear or repclear shaders.
Request a fixed subgroup size for pixel shaders that require it due to
the hardware restrictions of fast clears and repeated data clears.
This requires plumbing the "is_fast_clear" boolean across several
callers since blorp_compile_fs_brw() currently has no information
regarding whether the kernel is intended for a fast clear operation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
80b2355b39 intel/brw: Allow specifying a required subgroup size for fragment shaders.
On older hardware the "use_rep_send" compile parameter was being
implicitly used to request the compilation of the SIMD16 variant of
clear pixel shaders that require it due to hardware restrictions.

However starting on Gfx12+ this flag is never set since replicated
data clears are no longer supported, but BLORP still implicitly relies
on the SIMD16 variant being generated even though there's no way for
BLORP to explicitly request it.  This doesn't cause much of a problem
right now since brw_compile_fs() typically generates a SIMD16 kernel
unless the SIMD8 kernel spills or SIMD debugging flags are enabled,
but it won't work reliably on Xe3+ since we'll start using SIMD32 more
aggressively.

In order to avoid these issues use the standard required subgroup_size
parameter from shader_info to signal that the SIMD16 variant of the
shader is needed by the caller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
a736757275 anv/gfx12.5: Request subgroup size 8 for RT trampoline shader.
The 16-wide variant of the trampoline shader doesn't appear to work
and would be inadvertently enabled by this series on Gfx12.5.  Set the
required subgroup size to avoid changing current behavior.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
8102500b95 intel/brw/xe3+: Mask subgroup shuffle index to be within valid range to avoid VRT hangs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d2af77aa6b intel/brw: Use urb_read_length instead of nr_attribute_slots to calculate VS first_non_payload_grf.
Makes sure the number of registers reserved for the payload matches
the size of the URB read, which prevents the VS shared function from
writing past the end of the register file on Xe3 with VRT enabled.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00