Commit graph

4403 commits

Author SHA1 Message Date
Caio Oliveira
46e9fe6981 intel/brw: Add TGL_PIPE_SCALAR value
Add the enum value for the (in-order) scalar pipe.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Caio Oliveira
7acd84da51 intel/brw: Consider if SEND is gather variant when setting ex_desc
SEND instructions of gather variant will use the upcoming ARF scalar
register.  They use only Src0 and reuse the bits of Src1.Length (part of
ex_desc).  Src1.Length is (implicitly) defined as 0.

Adapt the helper functions to take the new variant into account when
manipulating ex_desc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Ian Romanick
1b1003ca6f brw/algebraic: Pull brw_constant_fold_instruction out of the switch statement
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
f0bf68dd25 brw/const: Remove TODO that isn't allowed by the hardware
There are a lot of restrictions for bfloat16. The one that prevents this
very useful optimization from being possible is, "Broadcast of bfloat16
scalar is not supported."

Part of the reason this MR exists is to build up to implementing BF
support, and there are a couple more commits that implement
this. However, it fails on both real hardware and simulation:

    Instruction is: mad (8|M0) r6.0<1>:f 0xBF80:bf r2.0<8;1>:f r64.0<0>:f

    In bfloat/float mixed mode, bfloat src must be packed.

Alas.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
99d3755bdd brw/const: Allow HF constants in MAD on Gfx11
These can't mix with F values, but if the non-constant sources are
already HF, this is allowed in src0.

No shader-db changes on any Intel platform.

fossil-db:

Ice Lake
Totals:
Instrs: 236027458 -> 236027442 (-0.00%)
Cycle count: 24515944704 -> 24515945379 (+0.00%)

Totals from 8 (0.00% of 798454) affected shaders:
Instrs: 10226 -> 10210 (-0.16%)
Cycle count: 58567 -> 59242 (+1.15%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
4c462b6b32 brw/const: Allow constants in integer MAD
Nothing can generate this currently, but a future commit will.

The Bspec and experimentation support the following limitations:

- Gfx11: Either src0 or src2 can be W or UW.
- Gfx12: Either src0 or src2 can be W or UW.
- Gfx12.5: Both src0 and src2 can be W or UW.
- Gfx20: Both src0 and src2 can be W or UW.

v2: Add missing break statement.

v3: Leave the MAD handling in the case with the other 3 source
instructions. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
9fa6b68f9e brw/const: Refactor checking whether an immediate source is allowed
Should be no functional change here. This simplifies some later changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
69d74739fd brw/algebraic: Don't restrict MAD(a, b, 1) optimization to float32
This is very unlikely for floating point MAD. At some point I intend
to add internal integer MAD uses, and this could occur there.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
b605f76b2a brw/algebraic: Constant fold multiplicands of MAD
v2: Move the full constant folding part to
brw_constant_fold_instruction. Suggested by Caio. I did this by
extracting the core part of the folding to a helper function.

v3: Delete stale comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 18090847 -> 18090843 (<.01%)
instructions in affected programs: 150 -> 146 (-2.67%)
helped: 1 / HURT: 0

total cycles in shared programs: 919664648 -> 919663210 (<.01%)
cycles in affected programs: 3426 -> 1988 (-41.97%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 220496486 -> 220496403 (-0.00%)
Cycle count: 31610880908 -> 31610879044 (-0.00%); split: -0.00%, +0.00%

Totals from 70 (0.01% of 702439) affected shaders:
Instrs: 47018 -> 46935 (-0.18%)
Cycle count: 6335504 -> 6333640 (-0.03%); split: -0.11%, +0.09%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
3a16ad71b7 brw/copy: Commute immediates for MAD multiplicands
This enables constant combining to do its job.

v2: Restore accidentally deleted line from a comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 919668392 -> 919669310 (<.01%)
cycles in affected programs: 10125264 -> 10126182 (<.01%)
helped: 348 / HURT: 194

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 31610720660 -> 31610692748 (-0.00%); split: -0.00%, +0.00%

Totals from 9066 (1.29% of 702433) affected shaders:
Cycle count: 810411934 -> 810384022 (-0.00%); split: -0.01%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
e3e58d6f48 brw: Emit immediate value for MAD in canonical position
No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Cycle count: 25096109024 -> 25096108722 (-0.00%); split: -0.00%, +0.00%

Totals from 4106 (0.51% of 797610) affected shaders:
Cycle count: 63266176 -> 63265874 (-0.00%); split: -0.01%, +0.01%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
d9b019b683 brw/copy: Don't try to be clever about ADD3 constant propagation
Always propagate into any source. Let commute_immedates and constant
combining sort out the mess. It's literally their job.

No shader-db changes on any Intel platform. The fossil-db changes just
appear to be subtle changes in register allocation if the immediate
source changes from src0 to src2.

v2: Update the comment in commute_immediates. Suggested by Caio.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Cycle count: 31610720510 -> 31610720660 (+0.00%); split: -0.00%, +0.00%

Totals from 8 (0.00% of 702433) affected shaders:
Cycle count: 5522382 -> 5522532 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
a84e3a0f55 brw/const: Allow mixing signed and unsigned immediate sources
No shader-db or fossil-db changes on any Intel platform. This commit
just prevents issues with a later commit, "brw/copy: Don't try to be
clever about ADD3 constant propagation."

v2: Use 'can_promote = true; break;' instead of 'return
true;'. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
a738c55d7b brw/algebraic: Partial constant folding of ADD3
Fold the cases where one of the sources is zero or two of the sources
are constants. Both case will result in a regular ADD.

No shader-db or fossil-db changes on any Intel platform. This commit
just prevents issues with a later commit, "brw/copy: Don't try to be
clever about ADD3 constant propagation."

v2: Move the full constant folding part to
brw_constant_fold_instruction. Suggested by Caio.

v3: Eliminate the impossible src.file == BAD_FILE case in
brw_fs_opt_algebraic. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
c52ce6157f brw/emit: Fix typo in recently added ADD3 assertion
The current assertion fails as soon as a MAD with src0 and src2 being
immediate is detected.

The assertion was supposted to catch, "If it's ADD3, only one of src0
and src2 can be immediate." The detect this, the opcode test should have
been !=.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c1c09e3c4a ("brw/emit: Add correct 3-source instruction assertions for each platform")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
25de9dcd76 brw/algebraic: Fix MUL constant folding
Some callers of brw_constant_fold_instruction depend on the result being
a MOV of immediate when progress is made. Previously `MUL dst:D src0:D
1:D` would be converted to `MOV dst:D src0:D`. There was also no
handling for `MUL dst:D imm0:D imm1:D`.

This could cause problems if one of the immedate values was -1. The
existing code would convert this to a `MOV dst:D imm0:D` and set the
negate flag on src0. That is not correct.

v2: Fix the is_negative_one case handling of the non-negative-one
source. Add a comment explaining the assertion. Both suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2cc1575a31 ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
086e83ccd9 brw/algebraic: Fix ADD constant folding
Some callers of brw_constant_fold_instruction depend on the result being
a MOV of immediate when progress is made. Previously `ADD dst:D src0:D
0:D` would be converted to `MOV dst:D src0:D`. There was also no
handling for `ADD dst:D imm0:D imm1:D`.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2cc1575a31 ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Caio Oliveira
c8f6d8154f intel/brw: Remove overloads for brw_print_instruction/s functions
Almost all cases now handled with default arguments.  The only real
extra work that was being done was pushed to the client code in
debug_optimizer().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>
2024-12-12 22:01:48 +00:00
Paulo Zanoni
d4a54d4f92 brw: don't read past the end of old_src buffer in resize_sources()
In this case, num_sources is bigger than this->sources, so if we loop
up to num_sources (instead of this->sources) we'll end up reading past
the end of old_src[]. Only copy up to what we originally had.

This was found by code inspection, I'm not aware of any applications
failing due to the lack of this patch.

Fixes: d9e737212d ("intel/brw: Add a src array for the common case in fs_inst")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32600>
2024-12-12 20:33:13 +00:00
Kenneth Graunke
6341b3cd87 brw: Combine convergent texture buffer fetches into fewer loads
Borderlands 3 (both DX11 and DX12 renderers) have a common pattern
across many shaders:

  con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
  ...
  con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates.  In most cases, only the .x channel
of the result is read.  So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value.  Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.

The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently.  This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture.  It constructs a
new divergent coordinate where each channel is one of the constants
(i.e <10, 11, 12, ..., 26> in the above example).  It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width.  (These get copy propagated away.)

We can pick the SIMD size of the load independently of the native shader
width as well.  On Xe2, those 28 convergent loads become a single SIMD32
ld message.  On earlier hardware, we use 2 SIMD16 messages.  Or we can
use a smaller size when there aren't many to combine.

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers.  On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
2024-12-12 00:05:42 +00:00
Caio Oliveira
abe41b1d2c intel/compiler: Use #pragma once instead of header guards
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>
2024-12-11 19:47:44 +00:00
Caio Oliveira
638802d68f intel/brw: Dump errors when brw_assemble() fails EU validation
This will allow executor to show proper inline errors.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32490>
2024-12-10 20:23:25 +00:00
Benjamin Lee
74ccf6cbdc nir: add option to use compact view indices
In panvk we pass absolute view indices to the hardware, so we need to do
the conversion from compacted to absolute at some point. Emitting
absolute indices from nir_lower_multiview initially looks like the
simplest option, but nir_lower_io_to_temporaries will emit a write for
every element of array varyings. This results in unnecessary writes to
disabled views.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
becb014d27 nir: treat per-view outputs as arrayed IO
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.

The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.

ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added on option to configure that
behavior rather than removing it.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Paulo Zanoni
0dc2a5808e brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM
The last argument seems to be used as brw_shader_reloc::delta (from
brw_add_reloc), and we're unconditionally setting it to 0 here, while
the other place where we handle nir_intrinsic_load_reloc_const_intel
seems to be setting the base appropriately.

I found this by inspection while debugging a bug related to this code,
so I'm not aware of any workloads that get improved by this patch.

Related patches:
 - ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
 - 99047451c9 ("intel/fs: add plumbing for embedded samplers")

Fixes: ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>
2024-12-09 15:45:49 +00:00
Dylan Baker
33a1acb0da clc: Tell clang to track imported dependencies
Clang is capable of tacking what headers it imports, as long as we set
it up to do that. While that isn't important for rusticl, it would be
useful for the various `_clc` tools, as they can then tell Ninja which
headers they read to make rebuilds more reliable.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>
2024-12-06 13:48:26 -05:00
Ian Romanick
0754a18621 brw/copy: Allow copy prop into src1 of broadcast
This is the selector, and it must always be a uniform UD, so there's no
reason to not propagate into it.

No shader-db change on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 220507131 -> 220507127 (-0.00%)
Cycle count: 31607052398 -> 31607053364 (+0.00%); split: -0.00%, +0.00%

Totals from 5 (0.00% of 702410) affected shaders:
Instrs: 995 -> 991 (-0.40%)
Cycle count: 86392 -> 87358 (+1.12%); split: -0.07%, +1.19%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
662339a2ff brw/build: Use SIMD8 temporaries in emit_uniformize
The fossil-db results are very different from v1. This is now mostly
helpful on older platforms.

v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV,
adjust the exec_size to match the size allocated for the destination
register. Fixes EU validation failures in some piglit OpenCL tests
(e.g., atomic_add-global-return.cl).

v3: Use component_size() in emit_uniformize and BROADCAST to properly
account for UQ vs UD destination. This doesn't matter for
emit_uniformize because the type is always UD, but it is technically
more correct.

v4: Update trace checksums. Now amly expects the same checksum as
several other platforms.

v5: Use xbld.dispatch_width() in the builder for when scalar_group()
eventually becomes SIMD1. Suggested by Lionel.

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18091701 -> 18091586 (<.01%)
instructions in affected programs: 29616 -> 29501 (-0.39%)
helped: 28 / HURT: 18

total cycles in shared programs: 919250494 -> 919123828 (-0.01%)
cycles in affected programs: 12201102 -> 12074436 (-1.04%)
helped: 124 / HURT: 108

LOST:   0
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20480808 -> 20480624 (<.01%)
instructions in affected programs: 58465 -> 58281 (-0.31%)
helped: 61 / HURT: 20

total cycles in shared programs: 874860168 -> 874960312 (0.01%)
cycles in affected programs: 18240986 -> 18341130 (0.55%)
helped: 113 / HURT: 158

total spills in shared programs: 4557 -> 4555 (-0.04%)
spills in affected programs: 93 -> 91 (-2.15%)
helped: 1 / HURT: 0

total fills in shared programs: 5247 -> 5243 (-0.08%)
fills in affected programs: 224 -> 220 (-1.79%)
helped: 1 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 14102592 -> 14102624 (+0.00%)
Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02%
Max live registers: 65371025 -> 65355084 (-0.02%)

Totals from 12130 (1.73% of 702392) affected shaders:
Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08%
Subgroup size: 388128 -> 388160 (+0.01%)
Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81%
Max live registers: 1538550 -> 1522609 (-1.04%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 9631168 -> 9631216 (+0.00%)
Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01%
Max live registers: 41540611 -> 41514296 (-0.06%)
Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05%

Totals from 16852 (2.11% of 796880) affected shaders:
Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07%
Subgroup size: 323592 -> 323640 (+0.01%)
Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58%
Max live registers: 1072491 -> 1046176 (-2.45%)
Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30%

Tiger Lake
Totals:
Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00%
Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01%
Max live registers: 41644106 -> 41620052 (-0.06%)
Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02%

Totals from 15102 (1.90% of 793371) affected shaders:
Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11%
Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52%
Max live registers: 989858 -> 965804 (-2.43%)
Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98%

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00%
Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00%
Spill count: 567071 -> 567049 (-0.00%)
Fill count: 701323 -> 701273 (-0.01%)
Max live registers: 41914047 -> 41913281 (-0.00%)
Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00%

Totals from 3904 (0.49% of 798473) affected shaders:
Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03%
Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25%
Spill count: 1696 -> 1674 (-1.30%)
Fill count: 2523 -> 2473 (-1.98%)
Max live registers: 341695 -> 340929 (-0.22%)
Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
d2b266187d brw: Use resize_sources several more places
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
12d1886b87 brw/lower: Don't "fix" regioning of broadcast
The next two commits modify the destination regioning in a way that,
which still correct, trigger assertion failures if we try to fix the
regioning here.

Broadcast gets lowered in brw_eu_emit. For the purposes of region
restrictions, let's assume that the final code emission will do the
right thing. Doing a bunch of shuffling here is only going to make a
mess of things.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Caio Oliveira
cbc45ac99e intel/brw: Enable EU validation and compaction tests for PTL
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32195>
2024-12-04 23:03:11 +00:00
Sagar Ghuge
9afb0480c4 intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>
2024-12-04 19:20:51 +00:00
Lionel Landwerlin
69edf4144a brw: use transpose unspill messages when possible
This simplifies the unspill messages quite a bit.

A/B testing on DG2 :

BlackOps3 : +0.96%
TotalWarPharaoh: +0.31%

DG2 shader changes :

  Assassin's Creed Valhalla:
  Totals from 19 (0.89% of 2131) affected shaders:
  Instrs: 70542 -> 64369 (-8.75%)
  Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06%

  Black Ops 3:
  Totals from 55 (3.41% of 1612) affected shaders:
  Instrs: 389549 -> 350646 (-9.99%)
  Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15%

  Control:
  Totals from 1 (0.11% of 878) affected shaders:
  Instrs: 3409 -> 3212 (-5.78%)
  Cycle count: 255991 -> 250411 (-2.18%)

  Cyberpunk 2077:
  Totals from 1 (0.08% of 1264) affected shaders:
  Instrs: 2363 -> 2337 (-1.10%)
  Cycle count: 69283 -> 69186 (-0.14%)

  Fallout 4:
  Totals from 1 (0.06% of 1601) affected shaders:
  Instrs: 27946 -> 20056 (-28.23%)
  Cycle count: 2391398 -> 2153658 (-9.94%)

  Fortnite:
  Totals from 273 (3.65% of 7470) affected shaders:
  Instrs: 634377 -> 601519 (-5.18%)
  Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01%

  Hogwarts Legacy:
  Totals from 50 (3.02% of 1656) affected shaders:
  Instrs: 110455 -> 103339 (-6.44%)
  Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03%

  Metro Exodus:
  Totals from 70 (0.16% of 43076) affected shaders:
  Instrs: 253847 -> 245321 (-3.36%)
  Cycle count: 13269473 -> 13209131 (-0.45%)
  Spill count: 1111 -> 1108 (-0.27%)
  Fill count: 2868 -> 2865 (-0.10%)

  Red Dead Redemption 2:
  Totals from 139 (2.38% of 5847) affected shaders:
  Instrs: 496551 -> 450180 (-9.34%)
  Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04%
  Spill count: 6322 -> 6326 (+0.06%)
  Fill count: 15558 -> 15568 (+0.06%)

  Rise Of The Tomb Raider:
  Totals from 1 (0.56% of 178) affected shaders:
  Instrs: 1682 -> 1437 (-14.57%)
  Cycle count: 603670 -> 586766 (-2.80%)

  Spiderman Remastered:
  Totals from 820 (11.77% of 6965) affected shaders:
  Instrs: 4622877 -> 3984893 (-13.80%)
  Cycle count: 235094963186 -> 234483925430 (-0.26%); split: -0.42%, +0.16%
  Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25%
  Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27%
  Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35%

Some of stats show spilling changes which is telling of how our spill
code is not adequate. Some of the spilled values are probably being
respilled which shouldn't be the case.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>
2024-12-04 08:59:07 +00:00
Kenneth Graunke
2ade3ec2a9 brw: Allow SIMD32 math instructions on Xe2
There's no restriction here AFAICT - only when HF types are involved.

fossil-db results on Lunar Lake:

   Totals:
   Instrs: 143665291 -> 142654109 (-0.70%)
   Cycle count: 22516049016 -> 22514172014 (-0.01%); split: -0.02%, +0.01%
   Max live registers: 49038116 -> 49017687 (-0.04%); split: -0.04%, +0.00%

   Totals from 117623 (21.07% of 558370) affected shaders:
   Instrs: 25098642 -> 24087460 (-4.03%)
   Cycle count: 1038884570 -> 1037007568 (-0.18%); split: -0.48%, +0.29%
   Max live registers: 12423219 -> 12402790 (-0.16%); split: -0.16%, +0.00%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>
2024-12-04 02:42:34 +00:00
Kenneth Graunke
815236b417 brw: Fix register unit calculation in SIMD32 LOAD_PAYLOAD lowering
We were wanting to check if the destination region spanned multiple
registers.  But we were checking against REG_SIZE, when the register
size is actually REG_SIZE * reg_unit(devinfo) now.

This meant that SIMD32 LOAD_PAYLOAD was always getting SIMD-split
on Xe2 platforms, generating a lot of unnecessary mess for compute
shaders.

fossil-db results on Lunar Lake:

   Totals:
   Instrs: 146178614 -> 143291988 (-1.97%); split: -1.98%, +0.00%
   Subgroup size: 11089632 -> 11089376 (-0.00%); split: +0.00%, -0.00%
   Cycle count: 22528892444 -> 22507551650 (-0.09%); split: -0.12%, +0.03%
   Max live registers: 48834202 -> 48886685 (+0.11%); split: -0.09%, +0.20%

   Totals from 134306 (24.10% of 557327) affected shaders:
   Instrs: 28806335 -> 25919709 (-10.02%); split: -10.02%, +0.00%
   Subgroup size: 4297680 -> 4297424 (-0.01%); split: +0.00%, -0.01%
   Cycle count: 956867650 -> 935526856 (-2.23%); split: -2.84%, +0.61%
   Max live registers: 13085711 -> 13138194 (+0.40%); split: -0.33%, +0.73%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>
2024-12-04 02:42:34 +00:00
Caio Oliveira
dfa4c55a4f intel/brw: Add is_control_source for the new subgroup ops
Fixes: 019770f026 ("intel/brw: Add SHADER_OPCODE_VOTE_*")
Fixes: 9537b62759 ("intel/brw: Add SHADER_OPCODE_REDUCE")
Fixes: 0ba1159b0a ("intel/brw: Add SHADER_OPCODE_*_SCAN")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32411>
2024-12-04 01:19:37 +00:00
Marek Olšák
7f4e36ff7d gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options
This is a prerequisite for enabling nir_opt_varyings for all gallium
drivers.

nir_lower_io_passes (called by the GLSL linker) only uses NIR options
to lower indirect IO access before lowering IO and calling
nir_opt_varyings.

Most drivers report full support for indirect IO and lower it themselves,
which prevents compaction of lowered indirectly accessed varyings because
nir_opt_varyings doesn't touch indirect varyings.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (Rb for asahi)
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> (for r300)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32423>
2024-12-03 12:57:36 +00:00
Kenneth Graunke
6fd10a6620 brw: Tune vectorizer conditions to allow overfetching with holes
Notably, our convergent block loads were already overfetching - we
rounded up to block sizes of 8, 16, 32, or 64(LSC-only).  But we did
so in the backend, rather than NIR.

With recent changes, nir_opt_load_store_vectorizer allows holes of up
to 28 bytes (7 components at 4 bytes each).  This allows us to detect
cases where we did a convergent block load for 1 component (but loaded
a whole vec8), then another load for the next vec8, and combine them
into a single V16 load.  Single component loads aren't the most common,
but convergent loads of a vec2 in one group and a vec3 in another are
quite common, and it makes no sense to do V8+V8 loads instead of V16.

For non-block loads, we allow a max hole of 4 bytes.  This allows the
common case of XYZ_ + XYZ_ loads (where the last component is unread)
to combine into a single larger load.

fossil-db results on Lunarlake:

   Totals:
   Instrs: 146692608 -> 146246432 (-0.30%); split: -0.33%, +0.02%
   Subgroup size: 11100528 -> 11100512 (-0.00%)
   Send messages: 7003425 -> 6862529 (-2.01%); split: -2.01%, +0.00%
   Cycle count: 22396273274 -> 22523048654 (+0.57%); split: -1.08%, +1.64%
   Spill count: 67671 -> 67594 (-0.11%); split: -1.59%, +1.48%
   Fill count: 128999 -> 130223 (+0.95%); split: -1.73%, +2.68%
   Scratch Memory Size: 5986304 -> 6042624 (+0.94%); split: -1.40%, +2.34%
   Max live registers: 48898858 -> 48881655 (-0.04%); split: -0.05%, +0.01%
   Non SSA regs after NIR: 172397792 -> 167577380 (-2.80%); split: -2.80%, +0.00%

   Totals from 451003 (80.87% of 557667) affected shaders:
   Instrs: 134111754 -> 133665578 (-0.33%); split: -0.36%, +0.03%
   Subgroup size: 9039104 -> 9039088 (-0.00%)
   Send messages: 6127775 -> 5986879 (-2.30%); split: -2.30%, +0.00%
   Cycle count: 20306336726 -> 20433112106 (+0.62%); split: -1.19%, +1.81%
   Spill count: 56230 -> 56153 (-0.14%); split: -1.92%, +1.78%
   Fill count: 112920 -> 114144 (+1.08%); split: -1.97%, +3.06%
   Scratch Memory Size: 3769344 -> 3825664 (+1.49%); split: -2.23%, +3.72%
   Max live registers: 43750259 -> 43733056 (-0.04%); split: -0.05%, +0.01%
   Non SSA regs after NIR: 158449343 -> 153628931 (-3.04%); split: -3.04%, +0.00%

   In particular, sends get cut by 20.85% for Borderlands 3 DX12, 13.82%
   on Cyberpunk 2077, 10.75% on Strange Brigade, and 10.20% on Red Dead
   Redemption 2.  Yet, spill/fills remain about the same.

fossil-db results on Alchemist are similar though not quite as good.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
f88eb48ff2 anv: Don't consider nir_var_mem_global for vectorizer robustness checks
nir_opt_load_store_vectorize checks for potential address wrapping
when vectorizing two loads ("low" and "high").  It looks for cases where
"low" might have a large address, and "high" has a positive offset
which, when added together, could trigger integer wraparound.  The issue
here is that if the large address of "low" was considered out-of-bounds,
adding offset could wrap around to a small address, which might actually
be in-bounds.  Thus, when loaded separately, "low" will fail and trigger
robustness out-of-bound-read behavior, but "high" would read correctly.
When vectorized, the entire load would fail.  This is explicitly tested
for with 32-bit SSBO addresses in the Vulkan CTS.

However, anv's 64-bit global addresses and VMA handling effectively
prevent this case.  Addresses 0-4095 are a reserved page so that if
people try to use 0 as a NULL pointer, it never maps to a valid BO.
That alone guarantees that the above case where "high" gets a small
address would never be in-bounds, so we don't need to check for it.

In fact, we allocate most user allocations out of high addresses,
and have specialized allocation heaps for certain types of GPU data
structures in the lower GB of memory.  For a load to wrap around and
successfully land in the right heap, it would have to load gigabytes.

Disabling this allows load vectorization and overfetching in more cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
01680a66a9 brw: Simplify choose_oword_block_size_dwords()
Just calculate the block size using util_logbase2() - it's simpler.

Also drop the name "oword" as this refers to legacy HDC messages,
rather than the newer LSC "vector size" field.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
e8c85f8476 brw: Only consider components read for UBO push analysis
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
e703ff5e02 brw: Only consider components read for UBO loads
This will matter more with overfetching, where we may suggest loading
additional data that we don't actually need for vectorization purposes.

We want to make sure that push ranges have the data we actually need;
any extra padding is irrelevant.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Kenneth Graunke
da93b13f8b brw: Use nir_combined_align in brw_nir_should_vectorize_mem
Better than open-coding this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Kenneth Graunke
8c795af0b8 brw: Drop a few crocus references in comments
crocus no longer uses brw.  It uses elk.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Kenneth Graunke
46af23649c brw: Drop "regular uniform" concept from UBO push analysis
i965 used to upload its own regular GL uniforms and push those in
addition to UBO ranges.  st/mesa instead uploads regular uniforms
and presents those to use as UBO 0.  So this really isn't a thing
anymore.

nir_intrinsic_load_uniform is still used today but it represents
Vulkan push constants.  anv_nir_compute_push_layout already takes
care of ensuring too many ranges aren't present, so it doesn't need
the pass to do so.  iris doesn't use this intrinsic at all.

We can also drop the compute shader check, because neither iris nor
anv use UBO push analysis for compute shaders - except for anv's
internal kernels, which already have well specified push layouts.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Kenneth Graunke
586a470a00 brw: Drop image deref handling from brw_analyze_ubo_ranges
This was for pre-Skylake image load/store handling with image params.

We don't support that in brw anymore.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:32 +00:00
Lionel Landwerlin
cafec54c79 Revert in correct commit "fix"
This reverts commit 38c7e40bc02578585cc56c3a2d016d0b06ade184.

Fixes: b625a573 ("fix")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32354>
2024-11-26 16:36:06 +02:00
Lionel Landwerlin
b625a573da fix
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00
Lionel Landwerlin
6eb48a3e47 brw: move fs_msaa_flags logic to intel_shader_enums.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00
Lionel Landwerlin
ba3ff8b3bb brw: move barycentric_mode enum to intel_shader_enums.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
2024-11-26 13:05:30 +00:00