Commit graph

14562 commits

Author SHA1 Message Date
Mel Henning
17876a00af nir: Add a faster lowest common ancestor algorithm
On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
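
For readers unfamiliar with the term, a lowest common ancestor (LCA) query on a rooted tree, such as a dominator tree, can be sketched as below. The commit message does not say which algorithm was replaced or which one replaced it, so this is only the textbook depth-based walk. The `parent`/`depth` layout and all names are hypothetical, not NIR's API.

```python
# Textbook LCA on a rooted tree (for example a dominator tree).
# Hypothetical layout: parent[n] is n's parent, depth[n] is n's depth.

def lca(a, b, parent, depth):
    """Return the lowest common ancestor of nodes a and b."""
    # Lift the deeper node until both nodes sit at the same depth.
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    # Walk both nodes up in lockstep until they meet.
    while a != b:
        a = parent[a]
        b = parent[b]
    return a


# Small example tree:
#        0
#       / \
#      1   2
#     / \
#    3   4
parent = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
depth = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}

print(lca(3, 4, parent, depth))
print(lca(3, 2, parent, depth))
```

Each query in this version costs time proportional to the tree depth, which is the kind of cost a faster algorithm would aim to reduce.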

No stat changes on shader-db.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
2025-09-08 23:03:13 +00:00
Eric Engestrom
03ff41e747 intel/perf: fix enum type for eu stall props
src/intel/perf/xe/intel_perf.c:420:27: warning: implicit conversion from enumeration type 'enum drm_xe_eu_stall_property_id' to different enumeration type 'enum drm_xe_oa_property_id' [-Wenum-conversion]

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37172>
2025-09-08 14:12:13 +00:00
Caio Oliveira
f37c9c873c brw: Fix printing of blocks in disassembly when BRW is available
When disassembling and BRW IR is available (which happens in the
generator), there will be pointers to the BRW's basic block structures
that are used to print the block numbers and predecessor/successors
in the output.

There are two challenges:

- Because DO and FLOW instructions are not real instructions, they are
  not emitted in the output but would still cause the output to contain
  empty blocks.  Previous code accounted for DO but still had problems.

- DO blocks have special physical links that don't make sense when the
  DO is not emitted at the end, but they would be shown even if that
  block was omitted.

These issues can be seen here (edited to remove non-essential bits)

```
   START B0 (2 cycles)
mov(8)          g126<1>UD       0x3f800000UD
   END B0 ->B1
   START B2 <-B1 <-B4 (0 cycles)
   END B2 ->B3
   START B3 <-B2 (260 cycles)

LABEL1:
mov(8)          g1<1>D          0D
cmp.ge.f0.0(8)  null<1>D        g2<0,1,0>D      10D
sync nop(1)                     null<0,1,0>UB
send(1)         g0UD            g1UD            nullUD
(+f0.0) break(8) JIP:  LABEL0         UIP:  LABEL0
   END B3 ->B1 ->B5 ->B4
   START B4 <-B3 (1000 cycles)
sync nop(1)                     null<0,1,0>UB
mov(8)          g126<1>UD       g0<0,1,0>UD

LABEL0:
while(8)        JIP:  LABEL1
   END B4 ->B2
   START B5 <-B1 <-B3 (20 cycles)
```

For example:
- Block 1 is missing (a skipped DO block)
- Block 2 is empty (it was a FLOW block)
- Block 3 ends with a link to Block 1 (the special links involving DO
  blocks).

Two key changes were made to fix this.  First, skip the DO and FLOW
blocks completely.  The use_tail ensures that the instruction group is
reused to avoid empty blocks.  Second, when printing the successors and
predecessors, walk through the skipped blocks.  And finally, don't print
the special blocks.
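
The "walk through the skipped blocks" step can be illustrated with a small model of the example CFG. This is a sketch under stated assumptions, not the brw code: the `succs` table, the `skipped` set, and both helper names are hypothetical, and the special DO-related links are assumed to be absent from the edge list.

```python
# Hypothetical model of the example CFG; not the brw data structures.
# Special DO-related links are assumed to be left out of this edge list.
succs = {0: [1], 1: [2], 2: [3], 3: [5, 4], 4: [2], 5: []}
skipped = {1, 2}  # B1 is the DO block, B2 the FLOW block


def visible_succs(block):
    """Successors of `block`, looking through skipped blocks."""
    out, seen = [], set()
    stack = list(reversed(succs[block]))
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        if b in skipped:
            # Not printed: continue with its successors instead.
            stack.extend(reversed(succs[b]))
        elif b not in out:
            out.append(b)
    return out


def visible_preds(block):
    """Printed blocks whose visible successors include `block`."""
    return [b for b in sorted(succs)
            if b not in skipped and block in visible_succs(b)]


for b in sorted(succs):
    if b in skipped:
        continue
    preds = " ".join(f"<-B{p}" for p in visible_preds(b))
    nexts = " ".join(f"->B{s}" for s in visible_succs(b))
    print(f"B{b} {preds} | {nexts}")
```

In this model B0 links directly to B3 and B4 loops back to B3, while blocks keep their original numbers.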

With the fix, here's the output.  Note the blocks retain their original
BRW IR number.

```
   START B0 (2 cycles)
mov(8)          g127<1>UD       0x3f800000UD
   END B0 ->B3
   START B3 <-B0 <-B4 (260 cycles)

LABEL1:
mov(8)          g1<1>D          0D
cmp.ge.f0.0(8)  null<1>D        g2<0,1,0>D      10D
sync nop(1)                     null<0,1,0>UB
send(1)         g0UD            g1UD            nullUD
(+f0.0) break(8) JIP:  LABEL0         UIP:  LABEL0
   END B3 ->B5 ->B4
   START B4 <-B3 (1000 cycles)
sync nop(1)                     null<0,1,0>UB
mov(8)          g127<1>UD       g0<0,1,0>UD

LABEL0:
while(8)        JIP:  LABEL1
   END B4 ->B3
   START B5 <-B3 (20 cycles)
```

Issue was spotted by Ken.

Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>
2025-09-06 16:42:05 +00:00
Georg Lehmann
87f451aa39 intel/ci: update restricted trace checksums
Caused by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37211>
2025-09-06 11:59:16 +02:00
Faith Ekstrand
446d5ef103 vulkan: Drop the driver_internal from vk_image_view_init/create()
It always comes in through the create flags now.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36957>
2025-09-05 23:34:14 +00:00
Christoph Pillmayer
f81f3c85e2 nir/opt_algebraic: Convert a + b + a to b + 2a
This allows fusing into one FMA later.
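
A minimal sketch of why the rewritten form helps, assuming plain floats: `a + b + a` needs two dependent adds, while `b + 2a` has the shape of a single multiply-add. The `fma` helper below is a hypothetical stand-in (it is not actually fused), and the example values are chosen to be exactly representable. In general the two forms may round differently, and the commit message does not say how the real rule handles that.

```python
def fma(x, y, z):
    """Stand-in for a fused multiply-add: x * y + z (not fused here)."""
    return x * y + z


def before(a, b):
    # Original expression: two dependent additions.
    return a + b + a


def after(a, b):
    # Rewritten expression b + 2a, expressed as one multiply-add.
    return fma(a, 2.0, b)


for a, b in [(1.5, 0.25), (-3.0, 8.0)]:
    print(before(a, b), after(a, b))
```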

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113>
2025-09-05 11:39:51 +00:00
Lionel Landwerlin
07039cdb3d anv: fixup robust_ubo_range mask
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7e48f79b7 ("brw,anv: Reduce UBO robustness size alignment to 16 bytes")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13834
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37183>
2025-09-05 08:56:47 +00:00
Lionel Landwerlin
d8add9866b anv: add an undocumented HW workaround for Gfx12.5
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:21 +00:00
Lionel Landwerlin
4314c891f4 anv: expose VK_EXT_shader_object
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:21 +00:00
Lionel Landwerlin
1de9f367e8 anv: remove unused gfx/compute pipeline code
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:20 +00:00
Lionel Landwerlin
e76ed91d3f anv: switch over to runtime pipelines
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:20 +00:00
Lionel Landwerlin
4d9dd5c3a2 anv: store a few default instructions
We will use those where no associated shader is active but we still
need some default values programmed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:20 +00:00
Lionel Landwerlin
69b6b4cb28 anv: add shader instruction emission
Should replace much of genX_pipeline.c

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:19 +00:00
Lionel Landwerlin
8f4c2bd566 anv: add runtime shader statistic support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:19 +00:00
Lionel Landwerlin
91abb0e0af anv: move internal RT shaders around
anv_pipeline.c is about to go.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:18 +00:00
Lionel Landwerlin
d39e443ef8 anv: add infrastructure for common vk_pipeline
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:18 +00:00
Lionel Landwerlin
50fd669294 anv: prep work for separate tessellation shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:17 +00:00
Lionel Landwerlin
a91e0e0d61 brw: add support for separate tessellation shader compilation
Tessellation factors have to be written dynamically (based on the next
shader primitive topology) and the builtins read using a dynamic
offset (based on the preceding shader's VUE).

Anv is updated to use this new infrastructure for dynamic
patch_control_points.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:17 +00:00
Lionel Landwerlin
a18835a9ca anv/brw/iris: move VS VUE computation to backend
Drivers can provide the inputs required for the backend to call the
compute function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:16 +00:00
Lionel Landwerlin
8dee4813b0 brw: add ability to compute VUE map for separate tcs/tes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
2025-09-05 07:46:16 +00:00
Ian Romanick
1ce90ad5e1 elk: Use nir_opt_sink and more nir_opt_move
I spent a bunch of time playing around with the various enable bits, and
this was the best I could come up with. Enabling any of
nir_move_comparisons or nir_move_load_ubo in nir_opt_sink helped
instructions quite a bit, but it also caused a large pile of added
spills and fills.

shader-db:

Broadwell
total instructions in shared programs: 18428980 -> 18427957 (<.01%)
instructions in affected programs: 425245 -> 424222 (-0.24%)
helped: 1522 / HURT: 405

total cycles in shared programs: 954756705 -> 953755695 (-0.10%)
cycles in affected programs: 623470486 -> 622469476 (-0.16%)
helped: 17989 / HURT: 21175

total spills in shared programs: 8349 -> 8356 (0.08%)
spills in affected programs: 285 -> 292 (2.46%)
helped: 7 / HURT: 13

total fills in shared programs: 10426 -> 10192 (-2.24%)
fills in affected programs: 675 -> 441 (-34.67%)
helped: 25 / HURT: 1

LOST:   346
GAINED: 554

Haswell
total instructions in shared programs: 16809730 -> 16801634 (-0.05%)
instructions in affected programs: 772251 -> 764155 (-1.05%)
helped: 3055 / HURT: 840

total cycles in shared programs: 945179935 -> 944315696 (-0.09%)
cycles in affected programs: 549177588 -> 548313349 (-0.16%)
helped: 34143 / HURT: 23605

total spills in shared programs: 7699 -> 7666 (-0.43%)
spills in affected programs: 353 -> 320 (-9.35%)
helped: 10 / HURT: 16

total fills in shared programs: 8184 -> 7671 (-6.27%)
fills in affected programs: 1006 -> 493 (-50.99%)
helped: 30 / HURT: 2

total sends in shared programs: 1016676 -> 1016682 (<.01%)
sends in affected programs: 49 -> 55 (12.24%)
helped: 0 / HURT: 6

LOST:   415
GAINED: 441

Ivy Bridge
total instructions in shared programs: 15764955 -> 15757178 (-0.05%)
instructions in affected programs: 707453 -> 699676 (-1.10%)
helped: 2893 / HURT: 547

total cycles in shared programs: 430017934 -> 429720104 (-0.07%)
cycles in affected programs: 251816726 -> 251518896 (-0.12%)
helped: 33110 / HURT: 22056

total spills in shared programs: 1537 -> 1525 (-0.78%)
spills in affected programs: 18 -> 6 (-66.67%)
helped: 6 / HURT: 0

total fills in shared programs: 926 -> 905 (-2.27%)
fills in affected programs: 24 -> 3 (-87.50%)
helped: 6 / HURT: 0

total sends in shared programs: 816646 -> 816652 (<.01%)
sends in affected programs: 49 -> 55 (12.24%)
helped: 0 / HURT: 6

LOST:   332
GAINED: 417

Sandy Bridge
total instructions in shared programs: 14055229 -> 14045281 (-0.07%)
instructions in affected programs: 1436142 -> 1426194 (-0.69%)
helped: 5858 / HURT: 757

total cycles in shared programs: 772123170 -> 813543451 (5.36%)
cycles in affected programs: 521342483 -> 562762764 (7.94%)
helped: 27928 / HURT: 35923

total spills in shared programs: 1742 -> 1741 (-0.06%)
spills in affected programs: 66 -> 65 (-1.52%)
helped: 1 / HURT: 0

total fills in shared programs: 970 -> 967 (-0.31%)
fills in affected programs: 93 -> 90 (-3.23%)
helped: 1 / HURT: 0

total sends in shared programs: 1239222 -> 1238992 (-0.02%)
sends in affected programs: 6137 -> 5907 (-3.75%)
helped: 342 / HURT: 112

LOST:   244
GAINED: 434

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8366385 -> 8363954 (-0.03%)
instructions in affected programs: 162761 -> 160330 (-1.49%)
helped: 600 / HURT: 195

total cycles in shared programs: 248992618 -> 252119334 (1.26%)
cycles in affected programs: 50774708 -> 53901424 (6.16%)
helped: 3435 / HURT: 5131

total sends in shared programs: 623693 -> 623681 (<.01%)
sends in affected programs: 351 -> 339 (-3.42%)
helped: 12 / HURT: 0

LOST: 0
GAINED: 6

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>
2025-09-04 15:01:18 -07:00
Ian Romanick
6f30cf71fe brw: Use nir_opt_sink and more nir_opt_move
The shader-db results on most platforms are pretty mixed. However, this
seems to be a decent improvement in fossil-db.

shader-db:

Lunar Lake
total instructions in shared programs: 17019147 -> 17023017 (0.02%)
instructions in affected programs: 1200847 -> 1204717 (0.32%)
helped: 814 / HURT: 2458

total cycles in shared programs: 880532116 -> 880406462 (-0.01%)
cycles in affected programs: 798253846 -> 798128192 (-0.02%)
helped: 30064 / HURT: 33008

total spills in shared programs: 3262 -> 3260 (-0.06%)
spills in affected programs: 66 -> 64 (-3.03%)
helped: 1 / HURT: 2

total fills in shared programs: 1616 -> 1637 (1.30%)
fills in affected programs: 89 -> 110 (23.60%)
helped: 1 / HURT: 2

LOST:   241
GAINED: 356

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19859724 -> 19865383 (0.03%)
instructions in affected programs: 2166810 -> 2172469 (0.26%)
helped: 942 / HURT: 3563

total cycles in shared programs: 879095859 -> 878616086 (-0.05%)
cycles in affected programs: 753840990 -> 753361217 (-0.06%)
helped: 33442 / HURT: 35053

total spills in shared programs: 4679 -> 4677 (-0.04%)
spills in affected programs: 80 -> 78 (-2.50%)
helped: 1 / HURT: 2

total fills in shared programs: 4113 -> 4175 (1.51%)
fills in affected programs: 87 -> 149 (71.26%)
helped: 1 / HURT: 2

LOST:   706
GAINED: 563

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20610947 -> 20615741 (0.02%)
instructions in affected programs: 2138334 -> 2143128 (0.22%)
helped: 979 / HURT: 3635

total cycles in shared programs: 863103771 -> 862153697 (-0.11%)
cycles in affected programs: 731626072 -> 730675998 (-0.13%)
helped: 34060 / HURT: 34256

total spills in shared programs: 3992 -> 3949 (-1.08%)
spills in affected programs: 504 -> 461 (-8.53%)
helped: 8 / HURT: 6

total fills in shared programs: 3640 -> 3573 (-1.84%)
fills in affected programs: 1505 -> 1438 (-4.45%)
helped: 8 / HURT: 5

LOST:   622
GAINED: 1018

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 232649299 -> 232485503 (-0.07%); split: -0.16%, +0.09%
Subgroup size: 15932144 -> 15933056 (+0.01%); split: +0.01%, -0.00%
Loop count: 137431 -> 137430 (-0.00%)
Cycle count: 32619860020 -> 32714539770 (+0.29%); split: -0.80%, +1.09%
Spill count: 540835 -> 519861 (-3.88%); split: -4.79%, +0.91%
Fill count: 700278 -> 663650 (-5.23%); split: -6.46%, +1.23%
Scratch Memory Size: 37258240 -> 35654656 (-4.30%); split: -5.24%, +0.94%
Max live registers: 72561256 -> 71501759 (-1.46%); split: -1.62%, +0.16%
Non SSA regs after NIR: 67682385 -> 67692495 (+0.01%); split: -0.00%, +0.02%

Totals from 617432 (78.20% of 789594) affected shaders:
Instrs: 217754449 -> 217590653 (-0.08%); split: -0.17%, +0.10%
Subgroup size: 12656912 -> 12657824 (+0.01%); split: +0.01%, -0.00%
Loop count: 133283 -> 133282 (-0.00%)
Cycle count: 32367979192 -> 32462658942 (+0.29%); split: -0.81%, +1.10%
Spill count: 540770 -> 519796 (-3.88%); split: -4.79%, +0.91%
Fill count: 700277 -> 663649 (-5.23%); split: -6.46%, +1.23%
Scratch Memory Size: 37182464 -> 35578880 (-4.31%); split: -5.25%, +0.94%
Max live registers: 64912683 -> 63853186 (-1.63%); split: -1.81%, +0.18%
Non SSA regs after NIR: 60158776 -> 60168886 (+0.02%); split: -0.00%, +0.02%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25463>
2025-09-04 15:01:18 -07:00
Lionel Landwerlin
262baafe27 anv: fix partial queries
Partial results should be computed for all types of queries.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36916>
2025-09-04 13:25:26 +03:00
Sagar Ghuge
ebbc358db5 blorp: Emit state cache invalidation after every compute dispatch
Implement HSD 16028171704/14025112257:
   LSC state cache livelock: once the state cache entries are full,
   subsequent walker dispatches with two threads per thread group may
   get stuck indefinitely because of a state cache livelock.

   One thread is continuously stuck in a loop doing UGM fence + evict
   and UGM read, waiting for the UGM read to return a certain value,
   while the other thread is supposed to update the value that the
   first thread is waiting for. But since the state cache entries are
   full, the second thread never makes progress.

Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
2025-09-04 00:14:48 +00:00
Sagar Ghuge
3e0ad0176b anv: Emit state cache invalidation after every compute dispatch
Implement HSD 16028171704/14025112257:
   LSC state cache livelock: once the state cache entries are full,
   subsequent walker dispatches with two threads per thread group may
   get stuck indefinitely because of a state cache livelock.

   One thread is continuously stuck in a loop doing UGM fence + evict
   and UGM read, waiting for the UGM read to return a certain value,
   while the other thread is supposed to update the value that the
   first thread is waiting for. But since the state cache entries are
   full, the second thread never makes progress.

Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
2025-09-04 00:14:48 +00:00
Caio Oliveira
4e253184de brw: Run validation as soon as we have the CFG around
Fixes: affa7567c2 ("intel/brw: Add phases to backend")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37148>
2025-09-03 20:42:05 +00:00
Yiwei Zhang
c0e51bcf24 anv: fix broken utrace
The non-compute end flag should be INTEL_DS_TRACEPOINT_FLAG_END_OF_PIPE.
This fixes the broken anv utrace for anything non-compute that can
potentially overlap (execute in parallel).

Fixes: 6281b207db ("anv: add tracepoints timestamp mode for empty dispatches")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37155>
2025-09-03 08:12:28 +00:00
Calder Young
a8e64e83c2 anv: Update video test expectations for layered_dpb
Remove all layered_dpb fails that have a passing separated_dpb equivalent

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
0b911356e5 anv: Report disjoint images as unsupported for video usage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
9bbb68a817 anv: Add support for using layered surfaces in VP9 video decoding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
d0bf3a96f6 anv: Add support for using layered surfaces in AV1 video decoding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
30b763f6e2 anv: Add support for using layered surfaces in H.264 and H.265 video coding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
3fb25cc78a anv: Add support for creating layered surfaces for video encode/decode
Layered surfaces (array textures) with video encode/decode usage bits
will have their slices aligned to make them addressable to the media
engine. Multi-planar layered surfaces will be stored with their slices
interleaved so that a relative offset can be programmed between the
luma and chroma slices.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young
73608eb8b7 isl: Add support for creating layered surfaces for video encode/decode
Adds support for creating layered surfaces with slices that are addressable
to the media engine for video encoding and decoding.

Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Lionel Landwerlin
0e198f796c anv/utrace: avoid memseting timestamp buffers by using tracepoint flags
Using the flag we can deduce how the timestamp was written and avoid
guessing when reading back.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13806
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37111>
2025-09-02 21:59:56 +00:00
Lionel Landwerlin
f262865a90 anv: fix pipeline barriers with pre-rasterization stages
Pre-rasterization stages need a CS stall if they need to wait on the
flushes from a PIPE_CONTROL.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37132>
2025-09-02 20:13:11 +00:00
Tapani Pälli
4035520ca9 anv: change some image qualifiers as coherent for Last Of Us
This fixes graphics artifacts happening with a particular shader.

This 'heuristic' hits a few very similar shaders but should provide better
performance than the current fix, which turns off caching for all shaders.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35929>
2025-09-02 11:04:35 +00:00
Renato Pereyra
443446aa82 anv: Enable anv_emulate_read_without_format for Android 15+
shaderStorageImageReadWithoutFormat is required by Android 15+

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37073>
2025-08-29 22:36:12 +00:00
Tim Van Patten
c585341552 intel/ds: Skip expensive timestamp query until necessary
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.

Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs every 1
second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
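
The throttling idea described above can be sketched as follows. This is an illustration only, not the intel/ds code: `ClockSync`, `maybe_sync`, and the `query_gpu_timestamp` callback are hypothetical names, and the caller is assumed to pass in a cheap boot-time reading in nanoseconds.

```python
SYNC_INTERVAL_NS = 1_000_000_000  # one second, as in the commit message


class ClockSync:
    def __init__(self, query_gpu_timestamp):
        self._query = query_gpu_timestamp  # the slow, accurate query
        self._next_sync_ns = 0             # so the first call always syncs

    def maybe_sync(self, boottime_ns):
        """Run the expensive query only if the interval has elapsed."""
        if boottime_ns < self._next_sync_ns:
            return None
        self._next_sync_ns = boottime_ns + SYNC_INTERVAL_NS
        return self._query()


calls = []
sync = ClockSync(lambda: calls.append(1) or len(calls))
results = [sync.maybe_sync(t) for t in (0, 500_000_000, 1_000_000_000)]
print(results, len(calls))
```

Only the cheap clock is read on every call. The expensive query runs at most once per interval.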

More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:

* gpu.renderstages disabled:            48.044293667
* gpu.renderstages enabled:             38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix:  42.641818333 (-11.24%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
2025-08-29 21:34:43 +00:00
Sagar Ghuge
90daa80d1d anv: Apply pipe flushes for outstanding PC bits
Apply any outstanding accumulated PC bits before we proceed with building
the acceleration structure.

Two reasons for this:
   - some of the data accessed by the build might need to be flushed
     as a result of a previous barrier
   - the scratch buffer might get reused between builds

Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
2025-08-29 20:19:45 +00:00
Lionel Landwerlin
23a4aef14a Revert "brw: move texture offset packing to NIR"
This reverts commit 4346210ae6.

Fixes: 4346210ae6 ("brw: move texture offset packing to NIR")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:14 +00:00
Lionel Landwerlin
1f279e6a08 Revert "anv: enable non uniform texture offset lowering"
This reverts commit 23de5abcb5.

Fixes: 23de5abcb5 ("anv: enable non uniform texture offset lowering")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:14 +00:00
Lionel Landwerlin
d0e1dffcb7 anv: temporary disable KHR_maintenance8
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 47cfc77085 ("anv: expose VK_KHR_maintenance8 support")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:13 +00:00
Ian Romanick
49141ad5f2 brw: Strategically place flags initialization to help cmod prop
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%

Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick
3018849535 brw: Don't emit redundant flags initialization for subgroup op lowering
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233676039 -> 233675305 (-0.00%)
Cycle count: 32594097814 -> 32593658094 (-0.00%); split: -0.00%, +0.00%

Totals from 325 (0.04% of 789264) affected shaders:
Instrs: 104491 -> 103757 (-0.70%)
Cycle count: 1183870034 -> 1183430314 (-0.04%); split: -0.04%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick
4a238f461d brw: Do cmod prop again after brw_lower_subgroup_ops
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17114300 -> 17114294 (<.01%)
instructions in affected programs: 3617 -> 3611 (-0.17%)
helped: 6 / HURT: 0

total cycles in shared programs: 886397556 -> 886397454 (<.01%)
cycles in affected programs: 511400 -> 511298 (-0.02%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233683694 -> 233676039 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32602038466 -> 32594097814 (-0.02%); split: -0.03%, +0.01%
Spill count: 540908 -> 540704 (-0.04%)
Fill count: 700935 -> 700258 (-0.10%)

Totals from 2200 (0.28% of 789264) affected shaders:
Instrs: 2062360 -> 2054705 (-0.37%); split: -0.37%, +0.00%
Cycle count: 2506073282 -> 2498132630 (-0.32%); split: -0.41%, +0.09%
Spill count: 14423 -> 14219 (-1.41%)
Fill count: 34219 -> 33542 (-1.98%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 263545171 -> 263543341 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26480835985 -> 26484748317 (+0.01%); split: -0.01%, +0.03%
Spill count: 554335 -> 554338 (+0.00%)
Fill count: 645486 -> 645498 (+0.00%)

Totals from 610 (0.07% of 903944) affected shaders:
Instrs: 1139871 -> 1138041 (-0.16%); split: -0.17%, +0.01%
Cycle count: 2274612327 -> 2278524659 (+0.17%); split: -0.15%, +0.33%
Spill count: 15153 -> 15156 (+0.02%)
Fill count: 36831 -> 36843 (+0.03%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 268713723 -> 268712817 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24653238085 -> 24652269669 (-0.00%); split: -0.00%, +0.00%
Fill count: 671369 -> 671361 (-0.00%)

Totals from 666 (0.07% of 899711) affected shaders:
Instrs: 924423 -> 923517 (-0.10%); split: -0.11%, +0.01%
Cycle count: 840380565 -> 839412149 (-0.12%); split: -0.13%, +0.02%
Fill count: 13006 -> 12998 (-0.06%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Lionel Landwerlin
c0cfd16da6 anv: move input coverage mask setup to runtime flush
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37060>
2025-08-28 19:08:33 +00:00
Caio Oliveira
84963d6833 intel/brw: Take shader in the brw_generator::generate_code() parameters
Simplify the calls in all the stage compile functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira
c19a4150b5 intel/brw: Simplify variant tracking in brw_compile_fs
Remove the cfg variables and use the shader pointers directly.  Reset
the variant pointer if a shader failed or will not be used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira
834e30d244 intel/brw: Simplify tracking of dispatch_width_limit in brw_compile_fs
Keep it in a variable, so there is no need to check which shader to
look at for the limit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00