Commit graph

90 commits

Author SHA1 Message Date
Sagar Ghuge
cb423ee636 anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
WA states that we need to allocate maximum number of stackIDs per DSS
from RT_DISPATCH_GLOBALS to 2048.

We can still throttle/control the CFE_STATE::StackID to be in range
specified by the field.

This does impact performance having CFE_STATE::stackIDs capped to 2K
by default. More the outstanding ray queries, larger the working set and
have more impact on cache hit rate.

This affect performance on Xe2+ onwards:
* Boundary Benchmark:            36.2%
* Solar Bay extreme:             9.8%
* Hitman world of assassination: 3.9%

Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40310>
2026-03-10 22:41:54 +00:00
Sagar Ghuge
3a62dc0218 anv: Set max outstanding ray queries to 1024
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Set max outstanding ray queries to 1024. This value can be tuned later
specific to apps.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40182>
2026-03-04 21:45:14 +00:00
Lionel Landwerlin
db964068bf anv: add drirc option to workaround missing application barriers on typed/untyped data
Enable it for Horizon Forbidden West (only seems to have untyped data
issue).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14889
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40187>
2026-03-04 20:40:59 +00:00
Jordan Justen
0b94b15a3c anv: Add Xe3P (GFX_VERx10==350)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40208>
2026-03-04 11:10:34 -08:00
Caio Oliveira
df4042371f anv: Set PIPELINE_SELECT systolic mode based on shader usage
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that use the mode and another that doesn't use the mode
(or vice-versa).

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
2026-02-26 19:05:56 +00:00
Lionel Landwerlin
487586fefa anv: implement inline parameter promotion from push constants
Push constants on bindless stages of Gfx12.5+ don't get the data
delivered in the registers automatically. Instead the shader needs to
load the data with SEND messages.

Those stages do get a single InlineParameter 32B block of data
delivered into the EU. We can use that to promote some of the push
constant data that has to be pulled otherwise.

The driver will try to promote all push constant data (app + driver
values) if it can, if it can't it'll try to promote only the driver
values (usually a shader will only use a few driver values). If even
the drivers values won't fit, give up and don't use the inline
parameter at all.

LNL internal fossil-db:

Totals from 315738 (20.08% of 1572649) affected shaders:
Instrs: 155053691 -> 154920901 (-0.09%); split: -0.09%, +0.00%
CodeSize: 2578204272 -> 2574991568 (-0.12%); split: -0.15%, +0.02%
Send messages: 8235628 -> 8184485 (-0.62%); split: -0.62%, +0.00%
Cycle count: 43911938816 -> 43901857748 (-0.02%); split: -0.05%, +0.03%
Spill count: 481329 -> 473185 (-1.69%); split: -1.82%, +0.13%
Fill count: 405617 -> 399243 (-1.57%); split: -1.86%, +0.28%
Max live registers: 34309395 -> 34309300 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8298224 -> 8299168 (+0.01%)
Non SSA regs after NIR: 18492887 -> 17631285 (-4.66%); split: -4.73%, +0.08%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:09 +00:00
Lionel Landwerlin
4fa1eddb4c anv: optimize binding table flushing
Split emission from pointers programming.

That way we can switch back & forth between blorp & applications
shaders and never emit binding tables, we just reprogram the pointers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:05 +00:00
Lionel Landwerlin
79a56ef448 anv: add a debug printout for dirty descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:04 +00:00
Lionel Landwerlin
2ef29502ed brw: enable ex_bso for LSC_SS
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Calder Young
895ff7fe92 Revert "anv,brw: Allow multiple ray queries without spilling to a shadow stack"
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, its not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.

Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
2026-01-23 21:33:55 +00:00
Tapani Pälli
840e6e855b anv: add handling for Wa_14026600921
This is the Xe3 version of the earlier workaround.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39404>
2026-01-23 11:10:07 +00:00
Calder Young
d69daf28d0 anv,brw: Add helper to get stack ids per dss for ray queries
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
2026-01-16 09:21:50 +00:00
Calder Young
1f1de7ebd6 anv,brw: Allow multiple ray queries without spilling to a shadow stack
Allows a shader to have multiple ray queries without spilling them to a shadow
stack. Instead, the driver provides the shader with an array of multiple
RTDispatchGlobals structs to give each query its own dedicated stack.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
2026-01-16 09:21:50 +00:00
Lionel Landwerlin
6d19b898e7 anv/brw: prep work for SIMD32 ray queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36181>
2026-01-12 12:19:21 +00:00
Lionel Landwerlin
f2c571fabf anv: add tracking of involved stages in pipe flushes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
2025-12-15 08:25:32 +00:00
Lionel Landwerlin
578d2f0daa anv: move load_num_workgroups tracking to driver
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38735>
2025-12-02 22:44:04 +00:00
Lionel Landwerlin
c478b6355a anv/blorp/iris: rework Wa_14025112257
Drivers already have to track this workaround, so remove the logic
from Blorp and let the driver manage this.

Also in Anv don't accumulate this workaround, emit it directly in
place right after COMPUTE_WALKER. Accumulating can be problematic when
you want to dispatch concurrent compute shaders that do not need any
cache flush interaction (typical example with the internal
simple_shader framework).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38306>
2025-11-10 08:57:06 +00:00
José Roberto de Souza
a21b925caa anv: Rename anv_shader_bin to anv_shader_internal
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It is now only used by internal shaders to the rename make it more clear.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37749>
2025-10-08 19:58:30 +00:00
Dylan Baker
1c930a505e anv: don't attempt to memcpy if allocation fails
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Based on git history thhese appears to be a subset of
`anv_batch_emit_batch`, so I've structured the code similarly, if
`anv_batch_emit_dwords` returns `nullptr`, we just move on without
copying the memory.

CID: 1665339
CID: 1664814
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37534>
2025-09-24 15:29:48 +00:00
Lionel Landwerlin
e76ed91d3f anv: switch over to runtime pipelines
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:20 +00:00
Sagar Ghuge
3e0ad0176b anv: Emit state cache invalidation after every compute dispatch
Implement HSD 16028171704/14025112257:
   LSC state cache livelock:- Once state cache entries are full,
   subsequent walker dispatches with two threads per thread group maybe
   gets stuck infinitely because of state cache live lock.

   One thread continuously stuck in loop doing UGM fence + evict and UGM
   read is waiting on UGM read to have certain value. while other thread
   supposed to update the value that first thread is waiting for. But
   since entries are full in state cache, there is second thread never
   make progress.

Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
2025-09-04 00:14:48 +00:00
Sagar Ghuge
cac3b4f404 anv: Mask off excessive invocations
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
For unaligned invocations, don't launch two COMPUTE_WALKER, instead we
can mask off excessive invocations in the shader itself at nir level and
launch one additional workgroup.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>
2025-08-12 23:17:02 +00:00
Lionel Landwerlin
5a2fb0da32 anv: actually use the COMPUTE_WALKER_BODY prepacked field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36711>
2025-08-11 11:14:52 +00:00
Lionel Landwerlin
e7aeed1f09 anv: pass active stages to push descriptor flushing
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36711>
2025-08-11 11:14:51 +00:00
Qiang Yu
196569b1a4 all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, change to a generic name.

Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Lionel Landwerlin
8966088cc5 anv: store gfx/compute bound shaders on command buffer state
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:08 +00:00
Lionel Landwerlin
094ddc35cc anv: constify some helpers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:08 +00:00
Lionel Landwerlin
18f234a8a2 anv: avoid looking at the pipeline to flush push descriptors
We do this at the cost of recomputing some values that where available
on the pipeline at vkCmdBindPipeline() time.

We can look at the shaders on graphics/compute which will work nicely
with the runtime.

The runtime doesn't have support for ray tracing pipelines so we keep
using them.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:07 +00:00
Lionel Landwerlin
99016a893a anv: avoid storing L3 config on the pipeline
On Gfx9 we only use 2 L3 config depending on SLM use or not. So it's
the same config for all Gfx pipelines.

On Gfx11+ there is only one config (since SLM is allocated from
somewhere else).

So avoid store this on the pipeline, pick the config when flushing the
pipeline.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:05 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Lionel Landwerlin
ac78693b6a intel/genxml: rename body field
So that the body field has the same name in COMPUTE_WALKER &
EXECUTE_INDIRECT_DISPATCH.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36146>
2025-07-16 01:01:11 +00:00
Sagar Ghuge
e761c45390 anv: Set TG size based on number of threads
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Series shows improvement on
TotalWarPharaoh-trace-dx11-1440p-ultra-n=2080 title by 0.96% (not a lot
but still it's improvement, so will take that.)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35904>
2025-07-10 22:08:36 +00:00
José Roberto de Souza
59019a05f6 anv: Program DispatchWalkOrder and ThreadGroupBatchSize with optimized values for regular computer walkers
It was only added to indirect compute walkers while HSD don't say
anything about this optimization be specific to indirect compute
walkers.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
2025-07-10 20:54:30 +00:00
Sushma Venkatesh Reddy
29fc96cb80 anv: Add GPU breakpoint before/after specific compute dispatch call
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13089
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35353>
2025-07-07 17:43:41 +00:00
José Roberto de Souza
bdd20457ed anv: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER when new async compute limits are needed
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
Sagar Ghuge
3696f85b63 anv: Drop unused helper cmd_buffer_dispatch_kernel
Drop some more unused fields: (Lionel)
- kernel_args_size, kernel_arg_count & kernel_args
- anv_kernel_arg
- anv_kernel
- max_grl_scratch_size

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35530>
2025-06-16 15:22:09 +00:00
Iván Briano
99405647a4 anv: vkCmdTraceRays* are not covered by conditional rendering
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The spec says:
  Certain rendering commands can be executed conditionally based on a
  value in buffer memory. These rendering commands are limited to
  drawing commands, dispatching commands, and clearing attachments with
  vkCmdClearAttachments within a conditional rendering block which is
  defined by commands vkCmdBeginConditionalRenderingEXT and
  vkCmdEndConditionalRenderingEXT. Other rendering commands remain
  unaffected by conditional rendering.

It would seem that vkCmdTraceRays* are not covered by that.

Fixes new tests dEQP-VK.conditional_rendering.conditional_ignore.trace_rays*

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34864>
2025-05-08 21:08:06 +00:00
Sagar Ghuge
0463e14b94 anv: Enable 64bit memory structure mode for RT
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>
2025-04-21 20:10:45 +00:00
Sagar Ghuge
6deb1950a4 anv: Update RT dispatch globals to use 64bit data structure
Rework (Kevin)
- Fix Hit/Miss/Resume shader group table value

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>
2025-04-21 20:10:45 +00:00
Lionel Landwerlin
72bc74f0be anv: add shader-hash debug option
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emits a dummy MI_STORE_DATA_IMM with the shader hash in front of :
   - 3DSTATE_VS
   - 3DSTATE_HS
   - 3DSTATE_DS
   - 3DSTATE_HS
   - 3DSTATE_PS
   - COMPUTE_WALKER / GPGPU_WALKER

Example :

0x00000000:  0x10000002:  MI_STORE_DATA_IMM
0x00000000:  0x10000002 : Dword 0
    DWord Length: 2
    Force Write Completion Check : false
    Store Qword: 0
    Use Global GTT: false
0x00000004:  0xffffe0c0 : Dword 1
    Core Mode Enable: 0
0x00000008:  0x0000effe : Dword 2
    Address: 0xeffeffffe0c0
0x0000000c:  0x126e815a : Dword 3  <------------ shader hash
0x00000010:  0x78100007 : Dword 4
    Immediate Data: 309231962
0x00000000:  0x78100007:  3DSTATE_VS
0x00000000:  0x78100007 : Dword 0
    DWord Length: 7
0x00000004:  0x00000000 : Dword 1
0x00000008:  0x00000000 : Dword 2
    Kernel Start Pointer: 0x00000000
0x0000000c:  0x00040000 : Dword 3
    Software Exception Enable: false
    Accesses UAV: false

It'll correlate with the value emitted in the pipeline stats from fossil replay :

  $ grep -i 126e815a /tmp/stats.csv
  fossilize.aab93c5c3f965151.1.foz,GRAPHICS,de1b925dec8a8083,507378,498283,303434,vertex,8,50,4,0,1826,0,0,0,8,17,0,0x00000000126e815a,15

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34332>
2025-04-04 15:18:28 +00:00
Caleb Callaway
c37ece75ea anv: add INTEL_DEBUG=rt_notrace
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34169>
2025-03-26 00:52:53 +00:00
Lionel Landwerlin
e4f31b8744 intel/ds: rework RT tracepoints
That way we can identify single dispatch within each step.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Michael Cheng <michael.cheng@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33684>
2025-02-24 08:08:02 +00:00
Lionel Landwerlin
84f96a0199 anv: switch to use brw's prog_data source_hash
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Michael Cheng <michael.cheng@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33643>
2025-02-22 08:30:22 +00:00
Lionel Landwerlin
d75849aaea anv: make compute state flush helper visible
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33550>
2025-02-15 18:38:24 +02:00
Lionel Landwerlin
9aef4ceb13 anv: hold a prepacked COMPUTE_WALKER instruction on CS pipelines
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33550>
2025-02-15 18:38:18 +02:00
Lionel Landwerlin
a8b84e1898 anv: use A64 messages for push constants loads on Gfx12.5+
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
2025-02-05 09:56:04 +00:00
Tapani Pälli
4e80045ae0 intel/genxml/anv: fix the layout of call stack handler struct
Patch adds new CALL_STACK_HANDLER struct which has offset to
start and end of RegistersPerThread field, this spec changes is
described in Wa_22019854901 (see HSD 22019967134).

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33342>
2025-02-04 08:44:04 +00:00
Francisco Jerez
b25d0f899b anv/xe3+: Set RegistersPerThread during shader state setup based on prog_data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
2025-01-29 23:39:32 +00:00
Michael Cheng
c3c05ffb5f intel : Expose Shader hashes for utrace and Perfetto
This patch exposes shader hashes (computes and draws) to Perfetto and
utrace. By including these hashes in traces, developers can correlate
compute and draw calls with their assoicated ASM dumps when analyzing
the traces.

To achieve this, intel_tracepoint.py has been reworked to preprocess
tracepoint arguments dynamically. Any argument containing "hash" in its
variable name is now forrmated as hexadecimal before being passed to the
tracepoint definition.

Signed-off-by: Michael <michael.cheng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32708>
2025-01-10 17:38:16 +00:00
Sagar Ghuge
d3f9139e49 intel: Use Morton compute walk order
According to HSD 14016252163 if compute shader uses the sample
operation, morton walk order and set the thread group batch size to 4 is
expected to increase sampler cache hit rates by increasing sample
address locality within a subslice.

Rework:
 * Caio: "||" => "&&" for type checking in instr_uses_sampler()
 * Jordan: Use nir's foreach macros rather than
   nir_shader_lower_instructions()

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00