Commit graph

13183 commits

Author SHA1 Message Date
Caio Oliveira
f308be16a0 intel/brw: Add validation for some Xe2 register regioning restrictions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Caio Oliveira
6a5a316312 intel/brw: Extract format enum in EU validation code
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Caio Oliveira
57b703cec3 intel/brw: Skip some regioning EU validation for Vx1 and VxH modes
Skip the ones that check the VertStride -- which is set to a special
value in those modes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Felix DeGrood
0f46c53b0c anv: Use vfg distribution mode = RR_STRICT for Xe2+
Performance tuning. Round Robin strict faster on Xe2 for some
workloads.

Speedup:
 - Borderlands3-dx11-trace: +4%
 - WolfensteinYoungblood-vk.g6: +1.5%
 - Cyberpunk2077-dx12vk-2160p-ultra: +0.5%

Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32566>
2024-12-13 19:15:48 +00:00
Deborah Brouwer
b6435207ab ci: python-test rename artifacts
The current python-test job creates and compresses python related
artifacts for use by future jobs. The artifacts are currently named
`mesa-python-test` which is somewhat misleading because they are not
needed for testing python scripts or libraries.

Rename the artifacts generated by the python-test job to be more
descriptive of their purpose.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32340>
2024-12-13 10:04:03 -08:00
Sagar Ghuge
d3f9139e49 intel: Use Morton compute walk order
According to HSD 14016252163 if compute shader uses the sample
operation, morton walk order and set the thread group batch size to 4 is
expected to increase sampler cache hit rates by increasing sample
address locality within a subslice.

Rework:
 * Caio: "||" => "&&" for type checking in instr_uses_sampler()
 * Jordan: Use nir's foreach macros rather than
   nir_shader_lower_instructions()

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00
Sagar Ghuge
4bd958243d intel/genxml: Update COMPUTE_WALKER_BODY
For PTL, we can have one more additional walk order along with the
"Thread Group Batch Size" field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00
Sagar Ghuge
41eda955af intel/genxml: Drop morton walk field from Xe2
Looks like this one got added accidently for Xe2. Xe2 doesn't support
Morton dispatch walk order.

Thanks to Rohan for bringing up this during review.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00
Caio Oliveira
0af8133f09 intel/executor: Add example using scalar register and send gather
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Caio Oliveira
5420c027e6 intel/brw: Add validation for ARF scalar register
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Caio Oliveira
f8c7348468 intel/brw: Add assembly support for ARF scalar register
And the SEND gather variant that uses a scalar register as its only
source.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Caio Oliveira
46e9fe6981 intel/brw: Add TGL_PIPE_SCALAR value
Add the enum value for the (in-order) scalar pipe.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Caio Oliveira
7acd84da51 intel/brw: Consider if SEND is gather variant when setting ex_desc
SEND instructions of gather variant will use the upcoming ARF scalar
register.  They use only Src0 and reuse the bits of Src1.Length (part of
ex_desc).  Src1.Length is (implicitly) defined as 0.

Adapt the helper functions to take the new variant into account when
manipulating ex_desc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>
2024-12-13 02:18:15 +00:00
Ian Romanick
1b1003ca6f brw/algebraic: Pull brw_constant_fold_instruction out of the switch statement
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
f0bf68dd25 brw/const: Remove TODO that isn't allowed by the hardware
There are a lot of restrictions for bfloat16. The one that prevents this
very useful optimization from being possible is, "Broadcast of bfloat16
scalar is not supported."

Part of the reason this MR exists is to build up to implementing BF
support, and there are a couple more commits that implement
this. However, it fails on both real hardware and simulation:

    Instruction is: mad (8|M0) r6.0<1>:f 0xBF80:bf r2.0<8;1>:f r64.0<0>:f

    In bfloat/float mixed mode, bfloat src must be packed.

Alas.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
99d3755bdd brw/const: Allow HF constants in MAD on Gfx11
These can't mix with F values, but if the non-constant sources are
already HF, this is allowed in src0.

No shader-db changes on any Intel platform.

fossil-db:

Ice Lake
Totals:
Instrs: 236027458 -> 236027442 (-0.00%)
Cycle count: 24515944704 -> 24515945379 (+0.00%)

Totals from 8 (0.00% of 798454) affected shaders:
Instrs: 10226 -> 10210 (-0.16%)
Cycle count: 58567 -> 59242 (+1.15%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
4c462b6b32 brw/const: Allow constants in integer MAD
Nothing can generate this currently, but a future commit will.

The Bspec and experimentation support the following limitations:

- Gfx11: Either src0 or src2 can be W or UW.
- Gfx12: Either src0 or src2 can be W or UW.
- Gfx12.5: Both src0 and src2 can be W or UW.
- Gfx20: Both src0 and src2 can be W or UW.

v2: Add missing break statement.

v3: Leave the MAD handling in the case with the other 3 source
instructions. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
9fa6b68f9e brw/const: Refactor checking whether an immediate source is allowed
Should be no functional change here. This simplifies some later changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
69d74739fd brw/algebraic: Don't restrict MAD(a, b, 1) optimization to float32
This is very unlikely for floating point MAD. At some point I intend
to add internal integer MAD uses, and this could occur there.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
b605f76b2a brw/algebraic: Constant fold multiplicands of MAD
v2: Move the full constant folding part to
brw_constant_fold_instruction. Suggested by Caio. I did this by
extracting the core part of the folding to a helper function.

v3: Delete stale comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 18090847 -> 18090843 (<.01%)
instructions in affected programs: 150 -> 146 (-2.67%)
helped: 1 / HURT: 0

total cycles in shared programs: 919664648 -> 919663210 (<.01%)
cycles in affected programs: 3426 -> 1988 (-41.97%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 220496486 -> 220496403 (-0.00%)
Cycle count: 31610880908 -> 31610879044 (-0.00%); split: -0.00%, +0.00%

Totals from 70 (0.01% of 702439) affected shaders:
Instrs: 47018 -> 46935 (-0.18%)
Cycle count: 6335504 -> 6333640 (-0.03%); split: -0.11%, +0.09%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
3a16ad71b7 brw/copy: Commute immediates for MAD multiplicands
This enables constant combining to do its job.

v2: Restore accidentally deleted line from a comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 919668392 -> 919669310 (<.01%)
cycles in affected programs: 10125264 -> 10126182 (<.01%)
helped: 348 / HURT: 194

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 31610720660 -> 31610692748 (-0.00%); split: -0.00%, +0.00%

Totals from 9066 (1.29% of 702433) affected shaders:
Cycle count: 810411934 -> 810384022 (-0.00%); split: -0.01%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
e3e58d6f48 brw: Emit immediate value for MAD in canonical position
No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Cycle count: 25096109024 -> 25096108722 (-0.00%); split: -0.00%, +0.00%

Totals from 4106 (0.51% of 797610) affected shaders:
Cycle count: 63266176 -> 63265874 (-0.00%); split: -0.01%, +0.01%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
d9b019b683 brw/copy: Don't try to be clever about ADD3 constant propagation
Always propagate into any source. Let commute_immedates and constant
combining sort out the mess. It's literally their job.

No shader-db changes on any Intel platform. The fossil-db changes just
appear to be subtle changes in register allocation if the immediate
source changes from src0 to src2.

v2: Update the comment in commute_immediates. Suggested by Caio.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Cycle count: 31610720510 -> 31610720660 (+0.00%); split: -0.00%, +0.00%

Totals from 8 (0.00% of 702433) affected shaders:
Cycle count: 5522382 -> 5522532 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
a84e3a0f55 brw/const: Allow mixing signed and unsigned immediate sources
No shader-db or fossil-db changes on any Intel platform. This commit
just prevents issues with a later commit, "brw/copy: Don't try to be
clever about ADD3 constant propagation."

v2: Use 'can_promote = true; break;' instead of 'return
true;'. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
a738c55d7b brw/algebraic: Partial constant folding of ADD3
Fold the cases where one of the sources is zero or two of the sources
are constants. Both case will result in a regular ADD.

No shader-db or fossil-db changes on any Intel platform. This commit
just prevents issues with a later commit, "brw/copy: Don't try to be
clever about ADD3 constant propagation."

v2: Move the full constant folding part to
brw_constant_fold_instruction. Suggested by Caio.

v3: Eliminate the impossible src.file == BAD_FILE case in
brw_fs_opt_algebraic. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
c52ce6157f brw/emit: Fix typo in recently added ADD3 assertion
The current assertion fails as soon as a MAD with src0 and src2 being
immediate is detected.

The assertion was supposted to catch, "If it's ADD3, only one of src0
and src2 can be immediate." The detect this, the opcode test should have
been !=.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c1c09e3c4a ("brw/emit: Add correct 3-source instruction assertions for each platform")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
25de9dcd76 brw/algebraic: Fix MUL constant folding
Some callers of brw_constant_fold_instruction depend on the result being
a MOV of immediate when progress is made. Previously `MUL dst:D src0:D
1:D` would be converted to `MOV dst:D src0:D`. There was also no
handling for `MUL dst:D imm0:D imm1:D`.

This could cause problems if one of the immedate values was -1. The
existing code would convert this to a `MOV dst:D imm0:D` and set the
negate flag on src0. That is not correct.

v2: Fix the is_negative_one case handling of the non-negative-one
source. Add a comment explaining the assertion. Both suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2cc1575a31 ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Ian Romanick
086e83ccd9 brw/algebraic: Fix ADD constant folding
Some callers of brw_constant_fold_instruction depend on the result being
a MOV of immediate when progress is made. Previously `ADD dst:D src0:D
0:D` would be converted to `MOV dst:D src0:D`. There was also no
handling for `ADD dst:D imm0:D imm1:D`.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2cc1575a31 ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>
2024-12-13 01:24:26 +00:00
Caio Oliveira
c8f6d8154f intel/brw: Remove overloads for brw_print_instruction/s functions
Almost all cases now handled with default arguments.  The only real
extra work that was being done was pushed to the client code in
debug_optimizer().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>
2024-12-12 22:01:48 +00:00
Paulo Zanoni
d4a54d4f92 brw: don't read past the end of old_src buffer in resize_sources()
In this case, num_sources is bigger than this->sources, so if we loop
up to num_sources (instead of this->sources) we'll end up reading past
the end of old_src[]. Only copy up to what we originally had.

This was found by code inspection, I'm not aware of any applications
failing due to the lack of this patch.

Fixes: d9e737212d ("intel/brw: Add a src array for the common case in fs_inst")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32600>
2024-12-12 20:33:13 +00:00
Lionel Landwerlin
e0b5179869 blorp: use 2D dimension for 1D tiled images
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 31eeb72e45 ("blorp: Add support for blorp_copy via XY_BLOCK_COPY_BLT")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32608>
2024-12-12 17:10:45 +00:00
Lionel Landwerlin
2bb98a8f99 anv: document UBO descriptor range alignments
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>
2024-12-12 07:35:18 +00:00
Lionel Landwerlin
99bb2a087a intel/decoder: fix COMPUTE_WALKER handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 17096f87 ("intel: Switch to COMPUTE_WALKER_BODY")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>
2024-12-12 07:35:18 +00:00
Kenneth Graunke
6341b3cd87 brw: Combine convergent texture buffer fetches into fewer loads
Borderlands 3 (both DX11 and DX12 renderers) have a common pattern
across many shaders:

  con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
  ...
  con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates.  In most cases, only the .x channel
of the result is read.  So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value.  Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.

The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently.  This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture.  It constructs a
new divergent coordinate where each channel is one of the constants
(i.e <10, 11, 12, ..., 26> in the above example).  It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width.  (These get copy propagated away.)

We can pick the SIMD size of the load independently of the native shader
width as well.  On Xe2, those 28 convergent loads become a single SIMD32
ld message.  On earlier hardware, we use 2 SIMD16 messages.  Or we can
use a smaller size when there aren't many to combine.

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers.  On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
2024-12-12 00:05:42 +00:00
Caio Oliveira
abe41b1d2c intel/compiler: Use #pragma once instead of header guards
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>
2024-12-11 19:47:44 +00:00
Tapani Pälli
97fc987497 intel/dev: update mesa_defs.json from internal database
This updates entry for 14017823839 which fixes issues on BMG with:
   dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32550>
2024-12-11 17:32:52 +00:00
Caio Oliveira
638802d68f intel/brw: Dump errors when brw_assemble() fails EU validation
This will allow executor to show proper inline errors.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32490>
2024-12-10 20:23:25 +00:00
Jordan Justen
1027b071f9 intel/dev: Add intel_check_hwconfig_items()
Rather than checking hwconfig items when using them, wait until after
devinfo has been fully initialized. This includes having workarounds
implemented.

We can then check if the hwconfig data and final Mesa initialization
agree. If the match fails, we need to investigate if Mesa or the
hwconfig data is wrong.

This code becomes a no-op when not on a release build.

Fixes: a4c5bfd34c ("intel/dev: Use hwconfig for urb min/max entry values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12141
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
2024-12-10 09:01:45 +00:00
Jordan Justen
4eb10bc25e intel/dev: Don't process hwconfig table to apply items when not required
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
2024-12-10 09:01:45 +00:00
Jordan Justen
5a8107cef4 intel/dev: Split apply and check paths for hwconfig
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
2024-12-10 09:01:45 +00:00
Jordan Justen
832de579e1 intel/dev: Split hwconfig warning check into hwconfig_item_warning()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
2024-12-10 09:01:45 +00:00
Benjamin Lee
74ccf6cbdc nir: add option to use compact view indices
In panvk we pass absolute view indices to the hardware, so we need to do
the conversion from compacted to absolute at some point. Emitting
absolute indices from nir_lower_multiview initially looks like the
simplest option, but nir_lower_io_to_temporaries will emit a write for
every element of array varyings. This results in unnecessary writes to
disabled views.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
becb014d27 nir: treat per-view outputs as arrayed IO
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.

The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.

ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added on option to configure that
behavior rather than removing it.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
975c3ecd1e nir: handle arbitrary per-view outputs in nir_lower_multiview
This is needed for panvk, where multiview is "all or nothing". When
multiview is enabled, all outputs may be written with separate values
for each view.

The edge case mentioned in the previous `nir_can_lower_multiview` is now
handled because we now handle an arbitrary number of per-view output
vars instead of expecting to find exactly one.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Mi, Yanfeng
06d3eb8e01 anv:increase instruction heap to 3Gb
Black Myth Wukong is generating more than 2Gb of shaders in
pre-compiling stage after VK_EXT_shader_image_atomic_int64 extension
enabled. Driver will crash in create shader stages due to dereference
null pointer of kernel map.

Signed-off-by: Mi, Yanfeng <yanfeng.mi@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32548>
2024-12-09 19:14:38 +00:00
Mi, Yanfeng
0a5a04f509 anv:Fix memory grow calculation overflow issue
when old buffer size is large than 2G, 32bit cannot hold
2 times buffer size (>4G).

Fixes: 8d813a90d6 ("anv: fail pool allocation when over the maximal size")

Signed-off-by: Mi, Yanfeng <yanfeng.mi@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32551>
2024-12-09 18:49:17 +00:00
Paulo Zanoni
0dc2a5808e brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM
The last argument seems to be used as brw_shader_reloc::delta (from
brw_add_reloc), and we're unconditionally setting it to 0 here, while
the other place where we handle nir_intrinsic_load_reloc_const_intel
seems to be setting the base appropriately.

I found this by inspection while debugging a bug related to this code,
so I'm not aware of any workloads that get improved by this patch.

Related patches:
 - ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
 - 99047451c9 ("intel/fs: add plumbing for embedded samplers")

Fixes: ecbec25e84 ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>
2024-12-09 15:45:49 +00:00
Lionel Landwerlin
de00fe3f66 anv: add BVH building tracking through u_trace
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32483>
2024-12-09 14:45:00 +00:00
José Roberto de Souza
2aae000edb intel/dev/xe: Fix size of eu_per_dss_mask
Real Xe KMD actually returns a uint64, so here changing from uint32
to uint64.

Fixes: 04bdbeec31 ("intel/dev/xe: Fix access to eu_per_dss_mask")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32527>
2024-12-06 19:52:50 +00:00
Alyssa Rosenzweig
972f8aa287 vulkan: rename depth bias graphics states
"constant" is a special keyword in OpenCL C, and we'd like to #define it
suitably in host C23 to facilitate compatiblity between host/device headers.
That means we can't have any identifiers named "global" or "constant".
Fortunately, this is the only 'constant' in any file I'm hitting.

To avoid the clash, don't abbreviate "constant factor", use "constant_factor"
instead. For consistency, "slope factor" then becomes "slope_factor".
The new names are longer but match the Vulkan API exactly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [Intel]
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> [NVK and panvk]
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [V3DV]
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com> [IMG]
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>
2024-12-06 13:48:26 -05:00