Commit graph

13451 commits

Author SHA1 Message Date
Paulo Zanoni
bac3b56d51 brw: extend the NOP+WHILE workaround
It turns out that we need to add a NOP not only in between two
consecutive WHILE instructions, but also after every control flow
instruction that immediately precedes a WHILE.

v2: Rebase after the renames.

Fixes: 5ca883505e ("brw: add a NOP in between WHILE instructions on LNL")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
(cherry picked from commit fd10764cff)
2025-02-28 22:17:35 +01:00
Karol Herbst
62747d6bdd intel/brw, lp: enable lower_pack_64_4x16
The compiler won't be able to emit pack_64_4x16, so we should prevent
nir_opt_algebraic to optimize to it. This fixes an infinite optimization
loop inside brw_nir_optimize:

nir_copy_prop
    16x4     %77 = @load_global (%80)
    32    %61995 = pack_32_2x16_split %77.x, %77.y
    32    %61998 = pack_32_2x16_split %77.z, %77.w
    64    %61999 = pack_64_2x32_split %61995, %61998
    64       %76 = iadd %100, %79
                   @store_global (%61999, %76)

nir_opt_algebraic
    16x4     %77 = @load_global (%80)
    32    %61995 = pack_32_2x16_split %77.x, %77.y
    32    %61998 = pack_32_2x16_split %77.z, %77.w
    16x4  %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    64    %62001 = pack_64_4x16 %62000
    64       %76 = iadd %100, %79
                   @store_global (%62001, %76)

nir_lower_pack
    16x4     %77 = @load_global (%80)
    16x4  %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    16    %62002 = mov %62000.y
    16    %62003 = mov %62000.x
    32    %62004 = pack_32_2x16_split %62003, %62002
    16    %62005 = mov %62000.w
    16    %62006 = mov %62000.z
    32    %62007 = pack_32_2x16_split %62006, %62005
    64    %62008 = pack_64_2x32_split %62004, %62007
    64       %76 = iadd %100, %79
                   @store_global (%62008, %76)

// brw_nir_optimize loops here

nir_copy_prop
    16x4     %77 = @load_global (%80)
    32    %62004 = pack_32_2x16_split %77.x, %77.y
    32    %62007 = pack_32_2x16_split %77.z, %77.w
    64    %62008 = pack_64_2x32_split %62004, %62007
    64       %76 = iadd %100, %79
                   @store_global (%62008, %76)

llvmpipe has a similar issue inside lp_build_opt_nir

Fixes: b1bc691b0f ("nir/algebraic: add and improve pack/unpack patterns")
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33347>
(cherry picked from commit dad5ee1039)
2025-02-28 22:17:35 +01:00
Lionel Landwerlin
3630721dc8 anv: fix missing 3DSTATE_PS:Kernel0MaximumPolysperThread programming
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 815d2e3e8b ("anv: move 3DSTATE_PS to partial packing")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33712>
(cherry picked from commit 91f36ba5b6)
2025-02-28 22:17:35 +01:00
Dylan Baker
db51d8f8ac iris: fix handling of GL_*_VERTEX_CONVENTION
By actually setting the state packets according to the program data.
Also ensure that we correctly flag that the program may be dirty when
the geometry shader state changes

Fixes piglit tests: `spec@!opengl 3.2@gl-3.2-adj-prims * pv-first`

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33658>
(cherry picked from commit c33ebf09f5)
2025-02-28 22:17:35 +01:00
Paulo Zanoni
d8ffce96d2 brw: increase brw_reg::subnr size to 6 bits
Since Xe2, the registers are bigger and even the instruction
structures got updated to have 6 bits.

The way I detected this issue was when I tried to use
src/intel/executor to add the following instruction:

    add(8)          g6.8<1>UD      g4<8,8,1>UD    0x00000008UD    { align1 WE_all 1Q I@1 };

Executor would read this and end up emitting an add with dst being
g6<1>UD instead of what we wanted. It turns out that inside
brw_gram.y, at dstoperand and dstoperandex we do:

    $$.subnr = $$.subnr * brw_type_size_bytes($4);

which would overflow subnr back to 0.

The overflow doesn't seem to be a problem with code we emit directly
(unlike the code we parse, like above) due to the fact that we seem to
treat Xe2 registers as smaller all the way until we call phys_nr() and
phys_subnr() during code generation. The phys_subnr() function can
generate a value that would overflow reg.subnr, but this value is
never written back to reg.subnr, it's just returned as an unsigned
int.

Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33539>
(cherry picked from commit 927d7b322b)
2025-02-18 22:46:08 +01:00
Tapani Pälli
3194cae6d0 anv: apply cache flushes on pipeline select with gfx20
This fixes rendering artifacts seen with Hogwarts Legacy and Black
Myth Wukong. Assumption is that we can get rid of these flushes once
RESOURCE_BARRIER work lands but until then we need them.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12540
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12489
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33397>
(cherry picked from commit 765f3b78d5)
2025-02-18 22:46:07 +01:00
Tapani Pälli
961a3fc760 anv: tighten condition for changing barrier layouts
Assertion (or attempting the layout change) is causing crash when
launching Steel Rats. Tighten the condition for change so that it should
affect only when runtime has made changes.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12602
Fixes: eed788213b ("anv: ensure consistent layout transitions in render passes")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33523>
(cherry picked from commit d8381415a6)
2025-02-18 22:46:01 +01:00
Lionel Landwerlin
e2232c0be4 anv: ensure Wa_16012775297 interacts correctly with Wa_18020335297
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dddd765553 ("anv: implement VF_STATISTICS emit for Wa_16012775297")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
(cherry picked from commit 6b99bf76ca)
2025-02-15 00:02:54 +01:00
Lionel Landwerlin
399de9dd00 anv: disable VF statistics for memcpy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
(cherry picked from commit 462d8e3fab)
2025-02-15 00:02:53 +01:00
Ian Romanick
2ea6b340ac brw/copy: Fix handling of offset in extract_imm
The offset is measured in bytes. Some of the code here acted as though
it were measured in src.type units. Also modify the assertion to check
that all extracted bits come from data in the immediate value.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Fixes: da395e6985 ("intel/brw: Fix extract_imm for subregion reads of 64-bit immediates")

Yes, I missed this error *twice* in code review.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33049>
(cherry picked from commit ac4b93571c)
2025-02-11 18:05:27 +01:00
Lionel Landwerlin
cb0d551424 brw: fixup scoreboarding for find_live_channels
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
(cherry picked from commit c08b437db7)
2025-02-05 16:08:29 +01:00
Ernst Persson
26ad2f9149 intel/vulkan: Add bvh build dependency
Fixes: 41baeb3810 ("anv: Implement acceleration structure API")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12558
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33333>
(cherry picked from commit c64871accc)
2025-02-04 20:47:26 +01:00
Hyunjun Ko
cd4ffc319f anv: Fix to set CDEF flter flag correctly for AV1 decoding
and relevant tiny clean-up.

Fixes: 8432b8b282 ("anv: add initial support for AV1 decoding")

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33316>
(cherry picked from commit 52d9edbf05)
2025-02-04 20:47:26 +01:00
Caio Oliveira
f18dee3618 intel/brw: Fallback to SEND from SEND_GATHER if possible
After optimization happen, if the sources are still in one or two
contigous spans for some reason (e.g. some data read from memory
now being written), it is beneficial to just use regular SEND
and avoid having to set the ARF scalar instruction.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
b6b32933ad intel/brw: Use SHADER_OPCODE_SEND_GATHER in Xe3
Add an optimization pass to turn regular SENDs into SEND_GATHERs.
This allows the payload to be "broken" into smaller pieces that
can be further optimized, which _may_ result in

- less register pressure (no need to contiguous space), and
- less instructions (no need to MOV to such space).

For debugging, the INTEL_DEBUG=no-send-gather option skips this
optimization, and reporting how many opportunities were missed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
26d4d04d63 intel/brw: Add lowering for SHADER_OPCODE_SEND_GATHER
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
650ec7169d intel/brw: Add SHADER_OPCODE_SEND_GATHER
Starting in Xe3, there's a variant of SEND that take the
register numbers from the ARF scalar register, and don't
require them to be contiguous.  The new opcode added here
represents that kind of SEND.

To make the original sources still reachable, we keep them
around during the IR, just ignoring them at generator time.
This allow software scoreboard to properly reason the
dependencies without trying to decode the contents of ARF
scalar register being used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
2fca22347c intel/brw: Plumb through generator whether SEND is gather variant
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:58 +00:00
Caio Oliveira
00fac79f99 intel/brw: Add scoreboard support for scalar register
Xe3 adds a new pipe that handles *only* MOVs from immediate into the
scalar register.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
2025-01-30 04:43:57 +00:00
Caio Oliveira
fbacf3761f intel: Add meson option -Dintel-elk
Defaults to true.  When set to false Iris and various tools can be
built without ELK support.  In both cases this means supporting
only Gfx9+.  This option must be true to build Crocus or Hasvk.

This allows skipping re-building ELK when developing for newer platforms
with tools/tests enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
31e5d909e7 intel/tools: Merge libaub into libintel_tools
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
ec2d20a70d intel/tools: Add helpers for decoder_init/disasm
Isolate the BRW/ELK differences in a single place.  The way is done now,
we are not reusing the isa_info between calls.  For the tools here this
is probably fine, if its someday this gets in the way, we can add an
opaque pointer to store the right data.

This intentionally is not used in Iris, since there the driver need more
detailed view into BRW/ELK and we don't want to create an all
encompassing abstraction for that.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Caio Oliveira
aa2bd16dec intel/tools: Use idep_libintel_common in meson
Since the internal dependency object exists and is already
used in some cases, let's be consistent.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Francisco Jerez
d455d5d86c anv/xe3+: Enable VRT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
dd1712515b anv/xe3+: Set RegistersPerThread for bindless shader dispatch.
v2: Use MOV and wrap in conditional during BTD spawn header setup
    (Lionel).  Remove references to SIMD8 (Tapani).

v3: Update brw_bsr() to specify number of registers per thread, don't
    initialize Registers Per Thread on BTD spawn header (Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
b25d0f899b anv/xe3+: Set RegistersPerThread during shader state setup based on prog_data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
7537f8edee intel/blorp/xe3+: Set RegistersPerThread during shader state setup based on prog_data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
f6a1c51de7 intel/genxml/xe3+: Update definitions for shader state setup.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
fb40b449cd intel/brw: Define ptl_register_blocks() helper.
Since this calculation will be needed in many places to set up the
state of each shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
70fecb1483 intel/brw: Report number of GRF registers used in brw_stage_prog_data.
This is similar to what we used to do on pre-SNB platforms, the number
of GRF registers used by the shader will be used on Xe3+ to adjust the
trade-off between thread-level parallelism and size of the GRF file.
Plumb the value through prog_data so the driver can set up the
hardware state accordingly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
6513bf65c3 intel/brw/xe3+: Optimize CS/TASK/MESH compile time optimistically assuming SIMD32.
This is similar in principle to the previous commit "intel/brw/xe3+:
brw_compile_fs() implementation for Xe3+." but applied to compute-like
shader stages.  It changes the implementation of brw_compile_cs/task/mesh()
to reduce compile time and take advantage of wider dispatch modes more
aggressively than the original logic, since as of Xe3 SIMD32 builds
succeed without spills in most cases thanks to VRT.

The new "optimistic" SIMD selection logic starts with the SIMD width
that is potentially highest performance and only compiles additional
narrower variants if that fails (typically due to spilling), while the
old "pessimistic" logic did the opposite: It started with the
narrowest SIMD width and compiled additional variants with increasing
register pressure until one of them failed to compile.

In typical non-spilling cases where we formerly compiled SIMD16 and
SIMD32 variants of the same compute shader, this change will halve the
number of backend compilations required to build it.

XXX - Possibly don't do this in cases with variable workgroup size
      until effect on runtime performance can be measured directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

v2: Don't do this for now in cases with variable workgroup size, still
    compile every possible variant in such cases.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Sagar Ghuge
7e1362e9c0 intel/brw/xe3+: Don't compile SIMD32 if there is ray queries
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
5b6906076e intel/brw/xe3+: brw_compile_fs() implementation for Xe3+.
This reworks the implementation of brw_compile_fs() to reduce compile
time and take advantage of wider dispatch modes more aggressively than
the original logic.

The new "optimistic" PS compilation logic starts with the SIMD width
that is potentially highest performance and only compiles additional
narrower variants if that fails (typically due to spilling or hardware
restrictions), while the old "pessimistic" logic did the opposite: It
started with the narrowest SIMD width and compiled additional variants
with increasing register pressure until one of them failed to compile.

The main disadvantage of this is that selectively throwing away some
of the compiled variants based on the static analysis of their
performance behavior will no longer be possible, however this is
expected to be less useful on Xe3+ since the GRF space allocated to a
thread can be scaled up or down, which leads to less dramatic
differences in scheduling between SIMD variants.

In typical non-spilling cases where we formerly compiled SIMD16 and
SIMD32 variants of the same fragment shader, this change will halve
the number of backend compilations required to build a shader.  With
multi-polygon PS dispatch enabled (which is disabled by default right
now) this has an even more dramatic effect since the number of
compiler iterations can be reduced down to a fifth in the best case
scenario.

Even though in most cases we will only attempt to return a single
binary from the pixel shader compilation, the hardware allows a pair
of PS kernels to be specified, and we'll still take advantage of this
when the multi-polygon PS kernel has the potential to have worse
performance than the single-polygon shader because only the latter
register-allocates successfully at SIMD32 -- Only in such case
(SIMD2x8 multi-polygon, SIMD32 single-polygon) we'll continue
programming both so the hardware will chose one or the other at
runtime depending on the SIMD fullness and number of polygons it can
buffer at runtime.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
1b2bd1fcb8 intel/brw: Exit early from run_fs() if compilation failed before optimization loop.
This avoids running the optimizer uselessly if compilation of the
current kernel failed due to some hardware (e.g. SIMD-width)
restriction.  This isn't only inefficient but it can break assumptions
throughout the compiler which would lead to crashes on Xe3 when this
arises during translation from NIR.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
afff3eb95e intel/brw: Indent conditional block from brw_compile_fs() not applicable to Xe2+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d7d08ec2e2 intel/brw: Indent body of brw_compile_fs() not applicable to xe3+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d03eac3133 intel/brw/xe3+: Disable round-robin allocation heuristic on Xe3+.
Xe3+ benefits from packing register allocations tightly in order to
make optimal use of the GRF space.  The round-robin heuristic
previously in use often causes the whole GRF space to be used even if
register pressure is substantially lower, which would severely
decrease thread-level parallelism on Xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
a67ff3e7e3 intel/brw/xe3+: Bump number of SBID tokens for Xe3.
Xe3 supports 32 SBID tokens per thread regardless of the number of
register blocks allocated per thread.  Take advantage of the increased
number of SBIDs in the scoreboard pass to reduce the frequency of
false dependencies on Xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
8d2331fe4b intel/brw/xe3: Extend regalloc sets to maximum Xe3 GRF size.
Extend our regalloc sets to 256 registers to match the maximum
capacity of the GRF file on Gfx30.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
ca1636d457 intel/brw/xe3: Define XE3_MAX_GRF.
Gfx30 supports up to 256 (512b) GRFs which requires a max GRF define
of 512 in REG_SIZE units.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
67cb23a4b1 intel/common/xe2+: Allow SIMD32 PS for all multisample cases.
These don't seem to be disallowed by recent hardware anymore.  Stop
disabling SIMD32 due to hardware restrictions of multisample
rasterization, since it should have better performance, and on Xe3+
there may be no shader variant available other than SIMD32.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
935f60c13c intel/blorp: Specify a subgroup size requirement of 16 for fast clear or repclear shaders.
Request a fixed subgroup size for pixel shaders that require it due to
the hardware restrictions of fast clears and repeated data clears.
This requires plumbing the "is_fast_clear" boolean across several
callers since blorp_compile_fs_brw() currently has no information
regarding whether the kernel is intended for a fast clear operation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
80b2355b39 intel/brw: Allow specifying a required subgroup size for fragment shaders.
On older hardware the "use_rep_send" compile parameter was being
implicitly used to request the compilation of the SIMD16 variant of
clear pixel shaders that require it due to hardware restrictions.

However starting on Gfx12+ this flag is never set since replicated
data clears are no longer supported, but BLORP still implicitly relies
on the SIMD16 variant being generated even though there's no way for
BLORP to explicitly request it.  This doesn't cause much of a problem
right now since brw_compile_fs() typically generates a SIMD16 kernel
unless the SIMD8 kernel spills or SIMD debugging flags are enabled,
but it won't work reliably on Xe3+ since we'll start using SIMD32 more
aggressively.

In order to avoid these issues use the standard required subgroup_size
parameter from shader_info to signal that the SIMD16 variant of the
shader is needed by the caller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
a736757275 anv/gfx12.5: Request subgroup size 8 for RT trampoline shader.
The 16-wide variant of the trampoline shader doesn't appear to work
and would be inadvertently enabled by this series on Gfx12.5.  Set the
required subgroup size to avoid changing current behavior.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
8102500b95 intel/brw/xe3+: Mask subgroup shuffle index to be within valid range to avoid VRT hangs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
d2af77aa6b intel/brw: Use urb_read_length instead of nr_attribute_slots to calculate VS first_non_payload_grf.
Makes sure the number of registers reserved for the payload matches
the size of the URB read, which prevents the VS shared function from
writing past the end of the register file on Xe3 with VRT enabled.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Francisco Jerez
7f59708422 intel/brw: Saturate shifted subgroup index to avoid reading past the end of register file.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Valentine Burley
e75e9baff8 anv/ci: Decrease anv-jsl-angle parallelism
One of the DUTs had to be retired in LAVA a while ago, and the pending
times were high on the dashboard. Decrease the parallelism and increase
the fractions to address this.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33082>
2025-01-29 20:34:15 +00:00
Valentine Burley
562e9c5302 iris/ci: Decrease iris-glk-deqp paralellism
There are only 4 DUTs available in LAVA, and their pending durations
were reaching 3–4 minutes. To address this, reduce the parallelism
for iris-glk-deqp and adjust the fractions to maintain the 10-minute
time limit.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33082>
2025-01-29 20:34:13 +00:00
Lionel Landwerlin
ff9cf7a222 anv: reduce alignment for small heaps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33240>
2025-01-29 17:33:13 +00:00