Commit graph

200240 commits

Author SHA1 Message Date
Christopher Michael
084754a5e5 v3d: Add support for PIPE_QUERY_TIMESTAMP_DISJOINT
When supporting PIPE_QUERY_TIMESTAMP, we use os_time_get_nano(), so the
disjoint timer frequency should be one tick per nanosecond (1 GHz).
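
This is a minimal sketch of what that frequency means, not the Gallium or v3d code. The struct and helper names are hypothetical, and reporting `disjoint = false` is an assumption. Because the timestamps come from a nanosecond clock, one tick equals one nanosecond, so the reported frequency is 10^9 ticks per second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a "timestamp disjoint" query result: the tick
 * frequency in Hz plus a flag saying whether results are unreliable. */
struct timestamp_disjoint_result {
    uint64_t frequency;
    bool disjoint;
};

/* Timestamps come from a nanosecond clock, so one tick is one nanosecond. */
static const uint64_t NS_PER_SECOND = 1000000000ull;

static struct timestamp_disjoint_result get_timestamp_disjoint(void)
{
    struct timestamp_disjoint_result r = {
        .frequency = NS_PER_SECOND, /* 10^9 ticks per second */
        .disjoint = false,          /* assumption: the clock never jumps */
    };
    return r;
}

/* Converts a tick delta into seconds using the reported frequency. */
static double ticks_to_seconds(uint64_t ticks, uint64_t frequency)
{
    return (double)ticks / (double)frequency;
}
```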

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
2025-01-14 09:56:08 +00:00
Christopher Michael
5982a69f90 v3d: Add support for time elapsed queries
Add support for getting time elapsed values via glBeginQuery/glEndQuery.
When recording query start & end time, we ensure that all pending jobs have
been completed by using v3d cpu_queue & the multisync extension.
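
This is a minimal sketch of the value the query returns, assuming nanosecond timestamps as in the timestamp commits. The names are hypothetical. In the driver, each timestamp is sampled only after pending jobs complete (cpu_queue plus multisync). Here the caller supplies the timestamps, so that waiting is left out. Presumably the wait is what keeps earlier, still-running work from distorting the start and end points.

```c
#include <stdint.h>

/* Hypothetical record of one GL_TIME_ELAPSED query. The caller is assumed
 * to sample each timestamp only after all previously submitted jobs have
 * completed. */
struct time_elapsed_query {
    uint64_t start_ns; /* sampled at glBeginQuery */
    uint64_t end_ns;   /* sampled at glEndQuery */
};

static void query_begin(struct time_elapsed_query *q, uint64_t now_ns)
{
    q->start_ns = now_ns;
    q->end_ns = now_ns; /* an un-ended query reads as 0 ns */
}

static void query_end(struct time_elapsed_query *q, uint64_t now_ns)
{
    q->end_ns = now_ns;
}

/* GL_TIME_ELAPSED result in nanoseconds. */
static uint64_t query_result(const struct time_elapsed_query *q)
{
    return q->end_ns - q->start_ns;
}
```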

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
2025-01-14 09:56:08 +00:00
Christopher Michael
9a35894d61 v3d: Add support for timestamp queries
Add support for getting timestamp values via
glGet(GL_TIMESTAMP) and glQueryCounter(GL_TIMESTAMP). For the case of
glQueryCounter, we make use of v3d cpu jobs via
DRM_IOCTL_V3D_SUBMIT_CPU and DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
2025-01-14 09:56:08 +00:00
Christopher Michael
8e1b27138c v3d: Add check to see if v3d supports multisync
Add support to check if v3d supports the multisync
extension. This will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
2025-01-14 09:56:07 +00:00
Christopher Michael
5e728db32a v3d: Add check to see if v3d supports cpu_queue
Add support to check if v3d supports cpu_queue. This
will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.
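
This sketch shows one way the two probes might gate those caps. The struct and function names are hypothetical, and the actual kernel probing is omitted. These commits do not confirm that both caps need both features. The sketch assumes so only because the time-elapsed path uses cpu_queue and multisync together.

```c
#include <stdbool.h>

/* Hypothetical summary of the kernel features probed at screen creation. */
struct v3d_kernel_features {
    bool has_cpu_queue; /* kernel accepts CPU jobs */
    bool has_multisync; /* kernel supports the multisync extension */
};

/* Assumption: the timer-query caps are exposed only when both are present. */
static bool timer_queries_supported(const struct v3d_kernel_features *f)
{
    return f->has_cpu_queue && f->has_multisync;
}
```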

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
2025-01-14 09:56:07 +00:00
Samuel Pitoiset
94da1edbe4 radv: rename attr_ring to ge_rings
This is better naming.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32994>
2025-01-14 00:59:38 -08:00
Samuel Pitoiset
ab96333490 radv: fix configuring the attribute ring size on GFX12
The attribute ring size per SE differs from GFX11. It was already
computed correctly in common code, but RADV was still using the old
GFX11 style.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32994>
2025-01-14 00:59:37 -08:00
Chia-I Wu
776199ea77 panvk/csf: add a comment on query synchronization
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
655b7c464a panvk/csf: no need to flush caches after query copy
The spec says

  vkCmdCopyQueryPoolResults is considered to be a transfer operation,
  and its writes to buffer memory must be synchronized using
  VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before
  using the results.

While STORE_MULTIPLE is not exactly VK_PIPELINE_STAGE_TRANSFER_BIT /
VK_ACCESS_TRANSFER_WRITE_BIT, we can still rely on user barriers to do
the right thing (e.g., flush caches for host access).

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
8948ca1024 panvk/csf: no need to sb wait on query copy
When VK_QUERY_RESULT_WAIT_BIT is set, we rely on sync wait.  When
VK_QUERY_RESULT_WAIT_BIT is not set, no wait is needed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
d04437845f panvk/csf: no need to sb wait on query end
We can guarantee ordering with this sequence of async cmds:

  RUN_FRAGMENT ->
  (signal and wait SB_ITER) ->
  FLUSH_CACHE2 ->
  (signal and wait DEFERRED_FLUSH) ->
  SYNC_SET32

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
50a3b4765e panvk/csf: no need to sb wait on query begin
The spec says

  VUID-vkCmdBeginQueryIndexedEXT-None-00807
  All queries used by the command must be unavailable

and panvk_cmd_reset_occlusion_queries is synchronous.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
12ce26a1d1 panvk: no need to zero results on query reset
The spec says

  Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the
  status to unavailable and makes the numerical results undefined.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
0b3e10d6fd panvk: no need to check query count on query create
The spec says

  VUID-VkQueryPoolCreateInfo-queryCount-02763
  queryCount must be greater than 0

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Chia-I Wu
04e899f125 panvk: no need to zero availability on query create
The spec says

  After query pool creation, each query is in an uninitialized state and
  must be reset before it is used.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
2025-01-14 05:43:46 +00:00
Nanley Chery
cd8e120b97 anv: Allow more single subresource fast-clears with FCV
Format re-interpretation is no longer a problem with texture views. The
clear color address now points to a clear color that is in the expected
format.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
2025-01-14 03:43:55 +00:00
Nanley Chery
35f02d8f36 anv: Inline can_fast_clear_with_non_zero_color
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
2025-01-14 03:43:55 +00:00
Nanley Chery
5549cb921d Revert "anv: turn off non zero fast clears for CCS_E"
This reverts commit 25a232238f.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11110
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11325
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
2025-01-14 03:43:55 +00:00
Nanley Chery
3e62401df3 anv: Drop bpc check for non-zero fast clears
Use the matching clear color address for an image view format to support
any clear color.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31374>
2025-01-14 03:43:55 +00:00
Nanley Chery
83cd73385a anv: Use L3 Fabric flush in fast-clear post-amble on TGL
Replace the Tile Cache flush with an L3 Fabric flush. According to HSD
1604687438, this should be faster.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
2025-01-14 03:14:00 +00:00
Nanley Chery
cec086a074 anv: Reduce fast-clear post-amble synchronization
On gfx12+, the pre-amble and post-amble flushes contain the stalls
necessary to ensure the prior operation is complete. Remove the extra
uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do
this for the pre-amble flushes, but this doesn't have any impact. The
flush application function will implicitly add the bit.

For A750, this improves the TWWH3 trace in the performance CI by 0.52%
(n=2).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
2025-01-14 03:14:00 +00:00
Nanley Chery
e9a85dd3ac iris: Use L3 Fabric flush in fast-clear post-amble on TGL
Replace the Tile Cache flush with an L3 Fabric flush. According to HSD
1604687438, this should be faster.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
2025-01-14 03:14:00 +00:00
Nanley Chery
2e7f344508 iris: Reduce fast-clear post-amble flushes
On gfx12+, the post-amble flushes contain the stalls necessary to ensure
the prior operation is complete. Remove the extra uses of
iris_emit_end_of_pipe_sync().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
2025-01-14 03:14:00 +00:00
Caio Oliveira
634daf2827 intel/brw: Rename brw_fs_validate to brw_validate
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32843>
2025-01-13 23:56:22 +00:00
Caio Oliveira
d37cbfad66 util/ra: Don't store a pointer to a ra_regs per ra_reg
Each reg may store a list of conflict regs.  This was handled by
util_dynarray; however, each of those holds an extra pointer to
the ra_regs (which serves as its mem_ctx).  Since the usage
here is very simple, we just handle the array growth manually.
The initial size remains the same as before.

The mem_ctx of each ra_reg was being used to identify the case
in which the list wasn't used.  Change to use a bool in the
ra_regs struct instead.
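
This is a minimal sketch of the manual growth described above, and all names in it are hypothetical. The list holds a pointer and two counters, with no back-pointer to an owning context. To stay self-contained it uses plain realloc(), while the real code presumably still allocates from the ra_regs context. The initial capacity of 4 and the doubling policy are assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-register conflict list: no mem_ctx pointer stored. */
struct conflict_list {
    unsigned *regs;
    unsigned count;
    unsigned capacity;
};

/* Appends reg, doubling the capacity when full. Returns false on OOM. */
static bool conflict_list_add(struct conflict_list *l, unsigned reg)
{
    if (l->count == l->capacity) {
        /* Initial capacity of 4 is an assumption for this sketch. */
        unsigned new_cap = l->capacity ? l->capacity * 2 : 4;
        unsigned *p = realloc(l->regs, new_cap * sizeof(*p));
        if (!p)
            return false;
        l->regs = p;
        l->capacity = new_cap;
    }
    l->regs[l->count++] = reg;
    return true;
}

static void conflict_list_fini(struct conflict_list *l)
{
    free(l->regs);
    l->regs = NULL;
    l->count = l->capacity = 0;
}
```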

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
2025-01-13 23:10:51 +00:00
Caio Oliveira
298740d7a1 util/ra: Bump the initial size of adjacency lists
Looking at a few Intel fossils, the majority of nodes have more
than 32 entries in the list.  I'd expect other backends to have
similar numbers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
2025-01-13 23:10:51 +00:00
Caio Oliveira
9cccb89dbc util/ra: Don't store a pointer to graph per ra_node
Each node stores a list of adjacent nodes.  This was handled by
util_dynarray; however, each of those holds an extra pointer to
the ra_graph (which serves as its mem_ctx).  Since the usage
here is very simple, we just handle the array growth manually.

For now, keep the same initial size that dynarray was using.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
2025-01-13 23:10:51 +00:00
Caio Oliveira
3753c9ed1b util/ra: Move less used data out of ra_node
Create a parallel array to hold them.  In particular, the `spill_cost` is
used at a completely different moment than the main node data.

Reduces the `struct ra_node` size to 40 bytes.
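
This sketch shows the hot/cold split, and its field names are hypothetical, since the real `struct ra_node` layout is not shown here. Data read on every coloring step stays in one array. spill_cost moves to a parallel array indexed by the same node number, so spill selection reads only the cold array. Picking the lowest cost is a simplification of real spill heuristics.

```c
#include <stdbool.h>

/* Hypothetical hot data, touched on every pass of the coloring loop. */
struct node_hot {
    unsigned *adjacency;
    unsigned adjacency_count;
    unsigned reg;
    unsigned class_index;
    bool in_stack;
};

/* Hypothetical cold data, kept in a parallel array. */
struct node_cold {
    float spill_cost;
};

struct graph {
    struct node_hot *nodes;  /* nodes[i] and cold[i] describe the same node */
    struct node_cold *cold;
    unsigned count;
};

/* Spill selection reads only the cold array. Returns count if empty. */
static unsigned pick_spill_node(const struct graph *g)
{
    unsigned best = g->count;
    for (unsigned i = 0; i < g->count; i++) {
        if (best == g->count || g->cold[i].spill_cost < g->cold[best].spill_cost)
            best = i;
    }
    return best;
}
```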

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
2025-01-13 23:10:51 +00:00
Nanley Chery
052d7e1a9c anv: Slow clear if fast-clear cost is not mitigated
Fast-clears require expensive flushes beforehand and afterwards. The
cost of these flushes is reduced in a series of back-to-back fast-clears,
as no extra fast-clear flushes are required between them. If the ratio
of a command buffer's recorded back-to-back fast-clears to independent
fast-clears falls below 1/2, prevent that command buffer from recording
any further fast-clears.
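
This is a minimal sketch of the heuristic, and all names in it are hypothetical. It assumes two things. First, the ratio is checked before each new fast clear, so a command buffer with no history is always allowed. Second, the comparison back_to_back / independent < 1/2 is done as 2 * back_to_back < independent so it stays in integer arithmetic. Once the check fails, no more fast clears are recorded and the counters stop changing, so the decision sticks.

```c
#include <stdbool.h>

/* Hypothetical per-command-buffer counters. */
struct fast_clear_stats {
    unsigned back_to_back; /* fast clears that directly followed another */
    unsigned independent;  /* fast clears that needed their own flushes */
};

static void record_fast_clear(struct fast_clear_stats *s, bool follows_fast_clear)
{
    if (follows_fast_clear)
        s->back_to_back++;
    else
        s->independent++;
}

/* Allows another fast clear unless the ratio has dropped below 1/2. */
static bool can_fast_clear(const struct fast_clear_stats *s)
{
    if (s->independent == 0)
        return true; /* assumption: no history, nothing to judge */
    return 2 * s->back_to_back >= s->independent;
}
```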

Averaging two runs of our Factorio trace on an A750 shows a +14.37%
improvement in FPS.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>
2025-01-13 20:42:31 +00:00
Brian Paul
24107f2f67 svga: fix printing 64-bit value for 32-bit build
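
The diff is not shown here, so this sketch shows only the usual portable way to print a uint64_t, not necessarily the fix that was applied. On a 32-bit build, long is 32 bits, so a format such as %lu does not match a uint64_t argument. The <inttypes.h> macros expand to the correct length modifier on every target. The function name and the printed value are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 64-bit value portably. PRIx64 picks the right length modifier
 * for uint64_t on both 32-bit and 64-bit builds. */
static int format_modifier(char *buf, size_t size, uint64_t modifier)
{
    return snprintf(buf, size, "modifier 0x%" PRIx64, modifier);
}
```
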
Closes: #12449, #12451
Fixes: b13e2a495e ("svga: add svga_resource_create_with_modifiers() function")
Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Reviewed-by: Neha Bhende <neha.Bhende@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32995>
2025-01-13 18:25:55 +00:00
Zan Dobersek
7c927144b3 freedreno/registers: fix RBBM_PRIMCTR understanding and usage
RBBM_PRIMCTR registers are used for the different pipeline statistics that
can be queried, but the current usage was wrong in some cases. Comments in the
register file are updated, and the per-statistic register index mapping is
updated accordingly.

Fixes on a750:
  test_query_pipeline_statistics in vkd3d-proton
  arb_query_buffer_object failures in piglit (zink)

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32900>
2025-01-13 15:46:20 +00:00
Sergi Blanch Torne
3fed68b607 Revert "ci: disable Collabora's farm due to maintenance"
This reverts commit 02f8b22a1a.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32993>
2025-01-13 13:43:53 +00:00
David Rosca
5a5628284a frontends/va: Allow creating DRM PRIME surfaces without surface descriptor
If we don't have a surface descriptor, treat this as a hint from the
application that it will export the surface later.
This matches the Intel driver's behavior.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32970>
2025-01-13 10:26:02 +00:00
Samuel Pitoiset
10e424f586 aco: always use ds_bpermute for shuffle/rotate on GFX12
ds_bpermute supports both 32 and 64 lanes now.
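
This CPU model of the permute is for illustration only, and the function names are hypothetical. ds_bpermute is a backward permute: each lane reads the value held by the lane its index selects. The hardware takes a byte address (lane * 4). This model takes lane indices, wraps them to the wave size and ignores any pre-GFX12 restrictions. A rotate by delta is then a bpermute where lane i reads lane (i + delta) mod wave_size.

```c
#include <stdint.h>

/* Backward permute: dst[lane] = src[index[lane]], indices wrapped to the wave. */
static void bpermute(const uint32_t *src, const uint32_t *index,
                     uint32_t *dst, unsigned wave_size)
{
    for (unsigned lane = 0; lane < wave_size; lane++)
        dst[lane] = src[index[lane] % wave_size];
}

/* Rotate expressed as a bpermute. wave_size must be 32 or 64. */
static void rotate(const uint32_t *src, uint32_t *dst, unsigned delta,
                   unsigned wave_size)
{
    uint32_t index[64];
    for (unsigned lane = 0; lane < wave_size; lane++)
        index[lane] = (lane + delta) % wave_size;
    bpermute(src, index, dst, wave_size);
}
```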

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974>
2025-01-13 08:33:38 +00:00
Samuel Pitoiset
b3d4d65f5a radv: fix CP DMA clears/copies on GFX12
CP DMA on GFX12 doesn't always use L2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32971>
2025-01-13 08:07:58 +00:00
Samuel Pitoiset
603541f1a2 ac/gpu_info: add cp_dma_use_L2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32971>
2025-01-13 08:07:58 +00:00
Sergi Blanch Torne
02f8b22a1a ci: disable Collabora's farm due to maintenance
Planned downtime in the farm:
* Start: 2025-01-13 08:00 UTC
* End: 2025-01-13 14:00 UTC

Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32734>
2025-01-13 07:36:17 +00:00
Job Noorman
c1dfe22b7b ir3: emit uniform iadd3 as two adds
The `sad` instruction (used for iadd3) doesn't support the scalar ALU.
This means we might fall back to non-earlypreamble whenever we use it in
the preamble. Prevent this by emitting it as two adds instead.
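
This sketch, written in plain C rather than ir3, shows why the split is safe. 32-bit integer addition wraps modulo 2^32 and is associative. Two adds therefore give exactly the same result as a fused three-operand add, even when the intermediate sum overflows.

```c
#include <stdint.h>

/* What a fused three-operand add computes (32-bit, wrapping). */
static uint32_t iadd3(uint32_t a, uint32_t b, uint32_t c)
{
    return a + b + c;
}

/* The split form: two ordinary two-operand adds. Unsigned arithmetic wraps
 * modulo 2^32, so an overflow in tmp does not change the final result. */
static uint32_t iadd3_as_two_adds(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t tmp = a + b;
    return tmp + c;
}
```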

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32943>
2025-01-13 07:06:03 +00:00
Lucas Stach
bed748d5f6 etnaviv: fix polygon offset disable
If a polygon offset is set via glPolygonOffset, but the functionality
isn't enabled via glEnable(GL_POLYGON_OFFSET_FILL) the offset must not
be taken into account when computing the sample depth. As the Vivante
hardware does not have a separate enable state, the offset units and
scale must both be set to 0 to keep the sample depth unchanged.
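
This sketch illustrates the idea, and the names in it are hypothetical. GL's polygon offset is o = factor * m + r * units, where m is the maximum depth slope and r is the minimum resolvable depth difference. Programming factor and units to 0 therefore makes o zero without needing a separate enable bit.

```c
#include <stdbool.h>

/* Hypothetical rasterizer state as seen by the driver. */
struct poly_offset_state {
    bool fill_enabled; /* glEnable(GL_POLYGON_OFFSET_FILL) */
    float factor;      /* glPolygonOffset(factor, units) */
    float units;
};

/* Values to program into hardware that has no enable bit:
 * when the feature is off, both are forced to 0. */
static void hw_poly_offset(const struct poly_offset_state *s,
                           float *hw_factor, float *hw_units)
{
    *hw_factor = s->fill_enabled ? s->factor : 0.0f;
    *hw_units = s->fill_enabled ? s->units : 0.0f;
}

/* GL's depth offset: factor * max_slope + r * units. */
static float depth_offset(float factor, float units, float max_slope, float r)
{
    return factor * max_slope + r * units;
}
```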

Fixes dEQP-GLES2.functional.polygon_offset.default_enable

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32982>
2025-01-12 21:06:33 +00:00
duncan.hopkins
20b806284a glx: Add back applegl_create_display() so the OpenGL.framework pointer gets set up on macOS.
Fixes: 4e8740370a ("glx: rework __glXInitialize")

Tested-by: Yurii Kolesnykov <root@yurikoles.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32656>
2025-01-12 16:49:33 +00:00
duncan.hopkins
48ebbe2777 glx: Guard some of the bind_extensions() code with the same conditions as the glx_screen frontend_screen member.
Configurations such as simple macOS builds do not have `frontend_screen` and fail to build.

Fixes: 34dea2b38e ("glx: unify extension binding")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12317

Tested-by: Yurii Kolesnykov <root@yurikoles.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32656>
2025-01-12 16:49:33 +00:00
Karol Herbst
0aa218328d rusticl/kernel: store memory arguments as Weak references
The spec requires that a cl_kernel not hold references to its bound
kernel arguments.

There is a CL CTS test verifying this, but because the arguments were not
used in the test kernel, a reference was never taken. This will change
with SVM and BDA as we need to know all bound memory objects even if they
aren't directly used in kernels.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32961>
2025-01-12 15:26:14 +00:00
Rob Clark
114a47544f ir3: Add preamble instr count metric
Turnip already had a pipeline stat to indicate whether we were using
early-preamble or not.  But no way to tell if there was a preamble at
all.  Adding a preamble instruction count tells us whether there is a
preamble, but also how big it is.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32977>
2025-01-11 18:10:17 +00:00
Kenneth Graunke
894393470a brw: Fix Xe2 spilling code to limit to SIMD32 rather than SIMD16
LSC can do native SIMD32 messages on Xe2.

Cuts spill/fills on Lunarlake:
- q2rtx-rt-pipeline: -20.83% / -16.85%
- Borderlands 3 DX12: -18.26% / -2.09%
- Cyberpunk 2077: -2.18% / -0.11%
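
This is a back-of-the-envelope illustration of the reductions above, and the helper name is hypothetical. It assumes that each spill or fill of a register's worth of data is split into messages no wider than the SIMD limit. Under a SIMD16 limit, a SIMD32 spill then needs two messages. Under a SIMD32 limit it needs one. Only SIMD32 shaders benefit, which would fit the per-title reductions being well under 50%.

```c
/* Number of messages needed to spill or fill one register's worth of data
 * at the given dispatch width, if each message covers at most max_simd
 * channels (assumed model). */
static unsigned spill_messages(unsigned dispatch_width, unsigned max_simd)
{
    return (dispatch_width + max_simd - 1) / max_simd;
}
```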

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32986>
2025-01-11 09:33:09 +00:00
Lionel Landwerlin
8ac7802ac8 brw: move final send lowering up into the IR
Because we emit the final send message form during code generation, a
lot of the emitted code looks like this:

  add(8)  vgrf0,    u0, 0x100
  mov(1)   a0.1, vgrf0          # emitted by the generator
  send(8)   ...,  a0.1

By moving address register manipulation into the IR, we can get this
down to:

  add(1)  a0.1,   u0, 0x100
  send(8)  ..., a0.1

This reduces register pressure around some send messages by one vgrf.

All lost shaders in the results below are fragment SIMD32 shaders, lost
because of the throughput estimator. If it is turned off, we lose no SIMD32
shaders with this change.

DG2 results:

  Assassin's Creed Valhalla:
  Totals from 2044 (96.87% of 2110) affected shaders:
  Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00%
  Subgroup size: 23832 -> 23824 (-0.03%)
  Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82%
  Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39%
  Fill count: 2005 -> 1256 (-37.36%)
  Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00%
  Max live registers: 116765 -> 115058 (-1.46%)
  Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67%

  Cyberpunk 2077:
  Totals from 1181 (93.43% of 1264) affected shaders:
  Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01%
  Subgroup size: 13016 -> 13032 (+0.12%)
  Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39%
  Spill count: 12 -> 8 (-33.33%)
  Fill count: 9 -> 6 (-33.33%)

  Dota2:
  Totals from 173 (11.59% of 1493) affected shaders:
  Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34%
  Max live registers: 5787 -> 5779 (-0.14%)
  Max dispatch width: 1344 -> 1152 (-14.29%)

  Hitman3:
  Totals from 5072 (95.39% of 5317) affected shaders:
  Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00%
  Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48%
  Spill count: 3942 -> 3200 (-18.82%)
  Fill count: 10158 -> 8846 (-12.92%)
  Scratch Memory Size: 257024 -> 223232 (-13.15%)
  Max live registers: 328467 -> 324631 (-1.17%)
  Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73%

  Fortnite:
  Totals from 360 (4.82% of 7472) affected shaders:
  Instrs: 778068 -> 777925 (-0.02%)
  Subgroup size: 3128 -> 3136 (+0.26%)
  Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19%
  Max live registers: 50689 -> 50658 (-0.06%)

  Hogwarts Legacy:
  Totals from 1376 (84.00% of 1638) affected shaders:
  Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03%
  Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12%
  Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36%
  Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23%
  Scratch Memory Size: 99328 -> 89088 (-10.31%)
  Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23%
  Max dispatch width: 11848 -> 11920 (+0.61%)

  Metro Exodus:
  Totals from 92 (0.21% of 43072) affected shaders:
  Instrs: 262995 -> 262968 (-0.01%)
  Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25%
  Max live registers: 11152 -> 11140 (-0.11%)

  Red Dead Redemption 2:
  Totals from 451 (7.71% of 5847) affected shaders:
  Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00%
  Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00%
  Max live registers: 42294 -> 42185 (-0.26%)

  Spiderman Remastered:
  Totals from 6820 (98.02% of 6958) affected shaders:
  Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65%
  Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25%
  Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61%
  Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58%
  Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74%
  Max live registers: 493149 -> 487458 (-1.15%)
  Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20%

  Strange Brigade:
  Totals from 3769 (91.21% of 4132) affected shaders:
  Instrs: 1354476 -> 1321474 (-2.44%)
  Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59%
  Max live registers: 199057 -> 193656 (-2.71%)
  Max dispatch width: 30272 -> 30240 (-0.11%)

  Witcher 3:
  Totals from 25 (2.40% of 1041) affected shaders:
  Instrs: 24621 -> 24606 (-0.06%)
  Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05%
  Max live registers: 1963 -> 1955 (-0.41%)

LNL results:

  Assassin's Creed Valhalla:
  Totals from 1928 (98.02% of 1967) affected shaders:
  Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11%
  Subgroup size: 41264 -> 41280 (+0.04%)
  Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11%
  Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90%
  Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60%
  Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56%
  Max live registers: 205483 -> 202192 (-1.60%)

  Cyberpunk 2077:
  Totals from 1177 (96.40% of 1221) affected shaders:
  Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03%
  Subgroup size: 24912 -> 24944 (+0.13%)
  Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81%
  Spill count: 8 -> 3 (-62.50%)
  Fill count: 6 -> 3 (-50.00%)
  Max live registers: 126922 -> 125472 (-1.14%)

  Dota2:
  Totals from 428 (32.47% of 1318) affected shaders:
  Instrs: 89355 -> 89740 (+0.43%)
  Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55%
  Max live registers: 32863 -> 32847 (-0.05%)

  Fortnite:
  Totals from 5354 (81.72% of 6552) affected shaders:
  Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53%
  Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65%
  Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72%
  Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35%
  Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71%

  Hitman3:
  Totals from 4912 (97.09% of 5059) affected shaders:
  Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00%
  Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55%
  Spill count: 3739 -> 3136 (-16.13%)
  Fill count: 10657 -> 9564 (-10.26%)
  Scratch Memory Size: 373760 -> 318464 (-14.79%)
  Max live registers: 597566 -> 589460 (-1.36%)

  Hogwarts Legacy:
  Totals from 1471 (96.33% of 1527) affected shaders:
  Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05%
  Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68%
  Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95%
  Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83%
  Scratch Memory Size: 251904 -> 217088 (-13.82%)
  Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12%

  Metro Exodus:
  Totals from 18356 (49.81% of 36854) affected shaders:
  Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83%
  Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84%
  Spill count: 595 -> 546 (-8.24%)
  Fill count: 1604 -> 1408 (-12.22%)
  Max live registers: 2086937 -> 2086933 (-0.00%)

  Red Dead Redemption 2:
  Totals from 4171 (79.31% of 5259) affected shaders:
  Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83%
  Subgroup size: 86416 -> 86432 (+0.02%)
  Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53%
  Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59%
  Scratch Memory Size: 401408 -> 385024 (-4.08%)

  Spiderman Remastered:
  Totals from 6639 (98.94% of 6710) affected shaders:
  Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98%
  Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59%
  Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82%
  Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76%
  Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17%
  Max live registers: 918240 -> 906604 (-1.27%)

  Strange Brigade:
  Totals from 3675 (92.24% of 3984) affected shaders:
  Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00%
  Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09%
  Max live registers: 361849 -> 351265 (-2.92%)

  Witcher 3:
  Totals from 13 (46.43% of 28) affected shaders:
  Instrs: 593 -> 660 (+11.30%)
  Cycle count: 28302 -> 28714 (+1.46%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
a27d98e933 brw: avoid having the scratch surface handle partially written
This makes it visible to the def_analysis.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
aac906c16c brw: add scheduler support for address registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
0a5bdf1199 brw: add infra to make use of the address register in the IR
This limits the address register to simple cases inside a block.

Validation ensures that the address register is only written once and
read once.

Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is a use of
the register in the IR.
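
This is a simplified sketch of the validation rule, and its instruction fields are hypothetical. Within one block, each address-register value must be written before it is read. It may be written at most once and read at most once, like the add/send pair shown in 8ac7802ac8. The real brw validator may differ in detail. A stricter version could also reject a value that is never read.

```c
#include <stdbool.h>

#define NO_ADDR (-1)
#define MAX_ADDR_DEFS 16

/* Hypothetical, heavily simplified instruction inside one block. */
struct inst {
    int addr_dst; /* address value defined here, or NO_ADDR */
    int addr_src; /* address value consumed here, or NO_ADDR */
};

static bool validate_addr_usage(const struct inst *insts, unsigned count)
{
    unsigned writes[MAX_ADDR_DEFS] = {0};
    unsigned reads[MAX_ADDR_DEFS] = {0};

    for (unsigned i = 0; i < count; i++) {
        int src = insts[i].addr_src;
        int dst = insts[i].addr_dst;

        if (src != NO_ADDR) {
            if (src < 0 || src >= MAX_ADDR_DEFS || writes[src] == 0)
                return false; /* read before any write in this block */
            if (++reads[src] > 1)
                return false; /* second read */
        }
        if (dst != NO_ADDR) {
            if (dst < 0 || dst >= MAX_ADDR_DEFS || ++writes[dst] > 1)
                return false; /* second write */
        }
    }
    return true;
}
```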

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
c9fa235c28 brw: split validation iteration into blocks
No functional change.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
9b73a73a6e brw: use phys_nr() more in generation
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00