Commit graph

12673 commits

Author SHA1 Message Date
Kenneth Graunke
b8f264cfe4 intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
8a6903e50d intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop"
This is going to handle more than atomics shortly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
e8883bd40b intel/brw: Use size_written for NoMask instructions in is_partial_write
The intention of inst->is_partial_write() is that it should return true
when any REG_SIZE (32B) chunk of inst's destination is written but not
fully overwritten.  This can be used to tell whether inst combines new
data with existing data, or screens off any previous writes, so the old
values are no longer required.

The existing (exec_size * brw_type_size_bytes(this->dst.type) < 32)
check doesn't work in a number of cases.  For example, LSC block loads
have exec_size == 1 and force_writemask_all set, but may write multiple
full registers of data.  (Currently, we only see them with exec_size 1
after logical-send-lowering, so our SHADER_OPCODE_SEND special case
was covering those.)  We had also special cased UNDEF.

Instead, we can simply check:

   1. Predication
   2. !inst->dst.contiguous()
   3. inst->dst.offset % REG_SIZE != 0
   4. inst->size_written % REG_SIZE != 0

We had the first three already, but #4 is new.  If either #3 or #4
is true, then that implies there is a REG_SIZE chunk of the destination
which is written, but not entirely written, so it's a partial write.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
ab0b9b6792 intel/brw: Use NUM_BRW_OPCODES in can_omit_write() check
The intention here is to detect ALU hardware instructions, but not
virtual instructions that haven't been explicitly whitelisted.

For some reason we had arbitrarily hardcoded 128 here, but our virtual
opcodes don't start at 128.  They start at NUM_BRW_OPCODES.  So, use
that instead.

This prevents regressions later when we delete some opcodes, shifting
some virtual opcodes into the 72-128 range.
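
A minimal sketch of why the cutoff matters. It uses a hypothetical,
heavily trimmed opcode enum in which SKETCH_NUM_HW_OPCODES plays the
role of NUM_BRW_OPCODES. The real opcode numbering is different. Once
virtual opcodes shift below 128, a hardcoded "op < 128" test wrongly
classifies them as hardware opcodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode list: hardware opcodes first, then virtual ones. */
enum sketch_opcode {
   SKETCH_OP_MOV,
   SKETCH_OP_ADD,
   SKETCH_OP_MUL,
   SKETCH_NUM_HW_OPCODES,                   /* stands in for NUM_BRW_OPCODES */
   SKETCH_OP_UNDEF = SKETCH_NUM_HW_OPCODES, /* first virtual opcode */
   SKETCH_OP_SEND_LOGICAL,
};

/* Old-style check: assumes virtual opcodes never appear below 128. */
static bool
is_hw_opcode_hardcoded(enum sketch_opcode op)
{
   return (int)op < 128;
}

/* New-style check: follows wherever the virtual opcodes actually begin. */
static bool
is_hw_opcode(enum sketch_opcode op)
{
   return op < SKETCH_NUM_HW_OPCODES;
}
```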

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
David Heidelberg
e4b247ec9b ci/intel: Officially switch intel-adl-cl to pre-merge
It has proven to be useful.

Due to the .rusticl-rules reference, the job was already running in
pre-merge, so let's make it official.

Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31144>
2024-09-12 15:40:03 +00:00
Nanley Chery
e0157abec6 anv,iris: Pack depth pixels into initialized arrays
Coverity alerts that the uint32_t pointer I was passing into
isl_color_value_pack() could possibly be used as an array. The value is
being used as such, but only the first element of that array should be
accessed. That's because the depth buffer formats I'm also passing into
the function only have a single channel, R. Nonetheless, let's update
the code to avoid the warning.
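
The general pattern is sketched below with a hypothetical packing
helper. sketch_pack_color() is not the real isl_color_value_pack(),
although it similarly treats its output pointer as an array of 32-bit
words. The caller passes a zero-initialized buffer that is large enough
for every channel the callee could write, then reads back only the
single R channel that a depth format uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a color packer: writes one 32-bit word per
 * channel, so its output must be able to hold up to four words. */
static void
sketch_pack_color(const float color[4], unsigned num_channels,
                  uint32_t *data_out)
{
   assert(num_channels <= 4);
   for (unsigned i = 0; i < num_channels; i++)
      memcpy(&data_out[i], &color[i], sizeof(uint32_t));
}

/* Pack a single-channel (R only) float depth value. */
static uint32_t
sketch_pack_depth(float depth)
{
   const float color[4] = { depth, 0.0f, 0.0f, 0.0f };
   uint32_t packed[4] = { 0 };  /* initialized array, not a lone uint32_t */
   sketch_pack_color(color, 1, packed);
   return packed[0];
}
```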

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31123>
2024-09-11 22:04:30 +00:00
Iván Briano
e4ee0a2ce1 anv: be consistent about aux usage with modifiers
In c1a7d520f3, we disabled AUX usage for imported images when they are
using an explicit modifier that doesn't support it.
We need to do the same when the modifier is picked by the driver,
otherwise the memory requirements reported for an exported image don't
match those we report for import.

Fixes: c1a7d520f3 ("anv: Disable aux if the explicit modifier lacks it")

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31051>
2024-09-10 22:09:41 +00:00
Caio Oliveira
eb68e6e84c anv: Advertise VK_KHR_compute_shader_derivatives
This was promoted from VK_NV_compute_shader_derivatives.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30956>
2024-09-10 18:22:42 +00:00
Dave Airlie
7531f6fd9c radv/anv/video: handling encoding both sps and pps in same buffer
This API should allow encoding these back to back into the same
buffer, so handle it properly.

Cc: mesa-stable
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31086>
2024-09-10 06:03:15 +00:00
Jordan Justen
c5c349a690 intel/dev: Fix warning for max_threads_per_psd when devinfo->verx10 == 120
Although we don't want to rely on hwconfig for devinfo->verx10 == 120,
due to the dependence on closed source software, we do check to see if
hwconfig reports different values in the DEVINFO_HWCONFIG macro.

Matt was seeing this warning on 8086:a7a0:

> MESA: warning: INTEL_HWCONFIG_TOTAL_PS_THREADS (128) != devinfo->max_threads_per_psd (64)

Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 3e4f73b3a0 ("intel/dev: Update hwconfig => max_threads_per_psd for Xe2")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31077>
2024-09-10 03:21:12 +00:00
Nanley Chery
c92e49e8f4 intel/isl: Always set EnableUnormPathInColorPipe
The TGL PRM says,

   This bit should never be programmed to 0

So, set it to true. I chose not to use the MBO attribute in genxml
because the field lacks the "Format: MBO" line in the PRM.

We previously made this programming conditional with commit 2e1be771e4
because of tests failing in
dEQP-GLES3.functional.texture.specification.tex*depth*. However, those
failures were fixed when we started using gl_FragDepth for depth buffer
copies in commit 6cec618e82.

Note: when bisecting this, I cherry-picked commit 7a68045b5d in order
to get past build failures related to a deprecated python function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31066>
2024-09-09 23:48:31 +00:00
Sviatoslav Peleshko
fa51595c7f brw: Fix mov cmod propagation when there's int signedness mismatch
If there's a difference between scan_inst's dest type and inst's src
type, we should be more careful, because a difference in signedness can
cause incorrect results after the propagation.

Updated ror-default.trace hash, as the change fixes misrendering there.

Fixes: b23432c5 ("intel/fs: Fix a cmod prop bug when the source type of a mov doesn't match the dest type of scan_inst")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30998>
2024-09-09 22:13:08 +00:00
Lionel Landwerlin
05dc524c75 anv: selectively disable binding table usage on Gfx20
Work around broken Gfx20 dynamic BTI.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30931>
2024-09-09 20:33:25 +00:00
Rohan Garg
7f65035078 hasvk: enable VK_KHR_shader_relaxed_extended_instruction
The extension only affects non-semantic instructions, which need no
handling in the backend compiler.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31098>
2024-09-09 17:46:32 +00:00
Rohan Garg
5f3339e44a anv: enable the VK_KHR_shader_relaxed_extended_instruction feature
Fixes: 29a2e5 ('anv: enable KHR_shader_relaxed_extended_instruction')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31098>
2024-09-09 17:46:32 +00:00
Daniel Stone
a78539e704 intel/tests: Reduce load from anv_tests
anv_tests tries to create a large number of threads, all of which wait
to be able to execute simultaneously, then launch a reasonable-size
workload.

Under load, cloning each of the 16 threads takes 15ms serially, for a
delay of 240ms before the tests start running; running the test 64
times gives us 15.36s for a single testcase in isolation, assuming that
the bits which aren't forking are free.

To give it the best shot at completing in time, mark it as a
non-parallelisable test (since Meson will also try to parallelise it
out), and also halve the number of runs it attempts. And then give it a
longer timeout so it doesn't fail even in extremis.
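
The arithmetic above can be written out as a tiny helper. Like the
estimate in the message, it assumes that each clone costs a fixed 15ms
and that everything else is free. Halving the 64 runs to 32 halves the
total, from 15.36s to 7.68s.

```c
#include <assert.h>

/* Total serial thread-creation delay in milliseconds, assuming a fixed
 * cost per clone and ignoring everything else the test does. */
static unsigned
sketch_clone_delay_ms(unsigned threads, unsigned ms_per_clone, unsigned runs)
{
   return threads * ms_per_clone * runs;
}
```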

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31059>
2024-09-09 12:54:34 +00:00
Caio Oliveira
2a5a12cb71 intel/executor: Small fixes to the help message
Add missing @eot to the example.
Reword INTEL_DEBUG=color description.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31076>
2024-09-07 16:32:50 +00:00
Alyssa Rosenzweig
1753bf599c ci: update traces
🤕

thanks Mike

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
2024-09-07 00:54:35 +00:00
Tapani Pälli
39a1f53890 anv: initialize pixel struct to zero when setting clear color
Otherwise, we can end up with uninitialized values. This fixes the
following valgrind warning:

==31283== Uninitialised byte(s) found during client check request
==31283==    at 0x503E4DE: anv_batch_bo_finish (anv_batch_chain.c:345)
==31283==    by 0x504220A: anv_cmd_buffer_end_batch_buffer (anv_batch_chain.c:1103)
==31283==    by 0x55A0E4F: end_command_buffer (genX_cmd_buffer.c:3455)
==31283==    by 0x55A0E82: gfx11_EndCommandBuffer (genX_cmd_buffer.c:3466)
==31283==    by 0x11233A: ??? (in /usr/bin/vkcube)
==31283==    by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283==    by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)
==31283==  Address 0xc10c4d8 is 1,240 bytes inside a block of size 8,192 client-defined
==31283==    at 0x5036EF6: anv_bo_pool_alloc (anv_allocator.c:1284)
==31283==    by 0x503E0E1: anv_batch_bo_create (anv_batch_chain.c:262)
==31283==    by 0x5040D3F: anv_cmd_buffer_init_batch_bo_chain (anv_batch_chain.c:868)
==31283==    by 0x504F9C1: anv_create_cmd_buffer (anv_cmd_buffer.c:147)
==31283==    by 0x6B718C4: vk_common_AllocateCommandBuffers (vk_command_pool.c:206)
==31283==    by 0x4FB06B2: vkAllocateCommandBuffers (trampoline.c:1996)
==31283==    by 0x111E6B: ??? (in /usr/bin/vkcube)
==31283==    by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283==    by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30990>
2024-09-06 13:19:04 +00:00
David Heidelberg
d16581652f ci/iris: implement nightly CL testing using piglit on ADL
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29516>
2024-09-05 08:57:51 +00:00
Lionel Landwerlin
aa494cbacf brw: align spilling offsets to physical register sizes
In commit fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message
width for Xe2 regs.") we aligned the width of scratch messages to
physical register sizes (32B prior to Xe2, 64B for Xe2+).

But our spilling offsets are computed using the register allocation
sizes, which are in units of 32B. That means that on Xe2, you can end up
spilling a 32B virtual register (which we use for surface state
computations with exec_all). The spill of that register is then
emitted in SIMD16, with the upper 8 lanes overwriting the next
spilled register.

We could potentially limit spills to SIMD8 messages on Xe2 (only
writing 32B of data), but we're also unlikely to have all 32B virtual
registers spilled next to one another. And if they are not tightly
packed, we would have 64B registers stored on 2 different cachelines,
which sounds inefficient.
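
A minimal sketch of the alignment idea follows. It assumes that spill
slots are laid out sequentially in scratch space. The helper names are
hypothetical, and this is not the actual brw spilling code. Each slot
is rounded up to the physical register size (32B before Xe2, 64B on
Xe2+), so a message that writes one full physical register stays within
its own slot.

```c
#include <assert.h>

/* Round value up to a multiple of align, which must be a power of two. */
static unsigned
sketch_align_up(unsigned value, unsigned align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (value + align - 1) & ~(align - 1);
}

/* Hypothetical layout pass: give each spilled register a scratch slot.
 * sizes[] holds allocation sizes in bytes (multiples of 32B), and
 * phys_reg_size is 32 before Xe2 or 64 on Xe2+. */
static void
sketch_assign_spill_offsets(const unsigned *sizes, unsigned count,
                            unsigned phys_reg_size, unsigned *offsets)
{
   unsigned next = 0;
   for (unsigned i = 0; i < count; i++) {
      offsets[i] = next;
      /* Reserve whole physical registers, so a message writing a full
       * physical register stays inside this slot. */
      next += sketch_align_up(sizes[i], phys_reg_size);
   }
}
```

With the Xe2 size, two adjacent 32B virtual registers end up 64B apart
instead of 32B apart. The SIMD16 spill of the first one therefore no
longer lands on the second.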

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>
2024-09-04 23:05:31 +00:00
Jordan Justen
f817870aa9 anv: Don't warn about unsupported devices if INTEL_FORCE_PROBE was used
The user must have used INTEL_FORCE_PROBE to force the device to be
loaded, so they specifically opted in to enabling unsupported device
support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31011>
2024-09-04 12:09:12 -07:00
Jordan Justen
ee727d7b66 intel/dev: Add devinfo::probe_forced based on INTEL_FORCE_PROBE
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31011>
2024-09-04 12:09:08 -07:00
Jordan Justen
aaaf9a3b87 anv: Do hasvk devices check first
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31011>
2024-09-04 12:09:05 -07:00
Jordan Justen
16a835ed3d anv: Drop "not yet supported" warning for Xe2
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31011>
2024-09-04 12:09:01 -07:00
José Roberto de Souza
ca13e35304 anv: Add anv_device_perf_close()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31026>
2024-09-04 10:04:38 -07:00
José Roberto de Souza
2d216c12fa anv: Drop useless '>= 0' check on an unsigned
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31026>
2024-09-04 10:04:38 -07:00
José Roberto de Souza
023120d1fc intel/perf: Fix intel_gem.h include
The intention here was to include the common intel_gem.h to get the
intel_ioctl() signature.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31026>
2024-09-04 10:04:38 -07:00
José Roberto de Souza
5d4e319aec anv: Nuke perf_metric
This is not used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31026>
2024-09-04 10:04:37 -07:00
Caio Oliveira
74be809237 compiler: Allow derivative_group to be used for all stages in shader_info
These will now also be used by stages that have workgroups.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>
2024-09-03 20:03:18 +00:00
Vignesh Raman
1eb98bc457 ci: move mtl-fw.json to .gitlab-ci directory
Placing mtl-fw.json in src/intel/ci/mtl-fw.json works for the
mesa build, but it fails to fetch in drm-ci. Move it to the
.gitlab-ci directory so it is included in the artifacts used
for building the kernel/rootfs in drm-ci.

Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30947>
2024-09-03 19:25:49 +00:00
Caio Oliveira
5be6f3b089 intel/executor: Fix SWSB for sync.nop
This surfaced after recent improvements to SWSB handling; previously,
the $1 in the assembly code was being gracefully lowered into $1.dst.

Fixes: 37674196221 ("intel: Add executor tool")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30960>
2024-09-02 16:07:55 +00:00
Caio Oliveira
3f6b5ea27a intel/brw: Use linear walk when shader requires DERIVATIVE_GROUP_LINEAR
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30955>
2024-08-30 20:24:42 +00:00
Sai Teja
05f6e9f11e ci: Disable angle jobs for GL changes
Changes to Mesa's GL stack don't affect ANGLE in any
way for now. Thus, drop ANGLE jobs for GL changes from
Intel and AMD CI.

Signed-off-by: Sai Teja <saiteja13427@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30943>
2024-08-30 15:09:15 +00:00
Jordan Justen
3e4f73b3a0 intel/dev: Update hwconfig => max_threads_per_psd for Xe2
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30887>
2024-08-30 01:53:55 -07:00
Caio Oliveira
e4f090d3a6 intel/brw: Remove special treatment for 2-src in emit() helper
For Gfx9+, no 2-src instructions need their sources to be fixed up.
Special treatment remains for 3-src instructions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30911>
2024-08-30 04:33:47 +00:00
Ian Romanick
73f365e208 intel/brw: load_offset cannot be constant on this path
Literally inside an if-statement (about 26 lines before this hunk)
that checks for !nir_src_is_const(instr->src[1]).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
fef175de09 intel/brw: Enable constant propagation for a couple more logical sends
This prevents some regressions later in the MR. Once load_const
operations are marked as is_scalar, they will cease to get the
automatic constant propagation that occurs in try_rebuild_source.

No shader-db or fossil-db changes on any Intel platform.

v2: Slightly relax source restrictions on
SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment
explaining the restriction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
c6a8b382fd intel/brw: Relax is_partial_write check in cmod propagation
The is_partial_write check is too strict because it tests two separate
things. It tests whether or not the instruction always writes a value
(i.e., is it predicated), and it tests whether or not the instruction
writes a complete register. This latter check is problematic as it
prevents cmod propagation in SIMD1, and it prevents cmod propagation in
SIMD8 when the destination size is 16 bits.

This check is unnecessary. Cmod propagation already checks that the
region written and region read overlap. It also already checks that the
execution sizes of the instructions match. Further restriction based on
the specific parts of the register written only generates false
negatives.

v2: Relax all of the calls to is_partial_write. Suggested by Caio.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake
Totals:
Instrs: 151505520 -> 151502923 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17201385104 -> 17194901423 (-0.04%); split: -0.06%, +0.02%
Spill count: 80827 -> 80837 (+0.01%)
Fill count: 152693 -> 152692 (-0.00%); split: -0.01%, +0.01%

Totals from 346 (0.05% of 630198) affected shaders:
Instrs: 1257205 -> 1254608 (-0.21%); split: -0.21%, +0.00%
Cycle count: 5532845647 -> 5526361966 (-0.12%); split: -0.18%, +0.06%
Spill count: 32903 -> 32913 (+0.03%)
Fill count: 64338 -> 64337 (-0.00%); split: -0.03%, +0.03%

DG2
Totals:
Instrs: 151531440 -> 151528055 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17200238927 -> 17197996676 (-0.01%); split: -0.03%, +0.02%
Spill count: 81003 -> 80971 (-0.04%); split: -0.04%, +0.00%
Fill count: 152975 -> 152912 (-0.04%); split: -0.05%, +0.01%

Totals from 346 (0.05% of 630198) affected shaders:
Instrs: 1260363 -> 1256978 (-0.27%); split: -0.27%, +0.00%
Cycle count: 5532019670 -> 5529777419 (-0.04%); split: -0.09%, +0.05%
Spill count: 33046 -> 33014 (-0.10%); split: -0.11%, +0.01%
Fill count: 64581 -> 64518 (-0.10%); split: -0.13%, +0.03%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149972324 -> 149972289 (-0.00%)
Cycle count: 15566495293 -> 15565151171 (-0.01%); split: -0.01%, +0.00%

Totals from 16 (0.00% of 629912) affected shaders:
Instrs: 351194 -> 351159 (-0.01%)
Cycle count: 3922227030 -> 3920882908 (-0.03%); split: -0.04%, +0.00%

Skylake
Totals:
Instrs: 140787999 -> 140787983 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665614947 -> 14665515855 (-0.00%); split: -0.00%, +0.00%
Spill count: 58500 -> 58501 (+0.00%)
Fill count: 102097 -> 102100 (+0.00%)

Totals from 16 (0.00% of 625685) affected shaders:
Instrs: 343560 -> 343544 (-0.00%); split: -0.01%, +0.01%
Cycle count: 3354997898 -> 3354898806 (-0.00%); split: -0.01%, +0.01%
Spill count: 16864 -> 16865 (+0.01%)
Fill count: 27479 -> 27482 (+0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
13332c236b intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup
I observed some ray tracing shaders where a resource_intel inside a
loop was non-uniform, and some code was lowered to account for
that. Eventually the loop containing the resource_intel was unrolled,
and the resource_intel became uniform.

For example, nir_opt_uniform_subgroup can transform something like

    con loop {
        con block b5:        // preds: b4 b8
        con 32    %330 = @read_first_invocation (%329)
        con 1     %331 = ieq %330, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %330
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

into

    con loop {
        con block b5:        // preds: b4 b8
        con 1     %331 = ieq %329, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %329
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

Notice that %331 is now a tautology. Running brw_nir_optimize again
eliminates the loop.

v2: Add a comment in the code explaining the rationale. Suggested by
Ken. Update the commit message. Suggested by Caio.

shader-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733448 -> 19733330 (<.01%)
instructions in affected programs: 14120 -> 14002 (-0.84%)
helped: 32 / HURT: 3

total cycles in shared programs: 916254496 -> 916226288 (<.01%)
cycles in affected programs: 2035116 -> 2006908 (-1.39%)
helped: 19 / HURT: 13

total spills in shared programs: 5807 -> 5807 (0.00%)
spills in affected programs: 26 -> 26 (0.00%)
helped: 1 / HURT: 1

total fills in shared programs: 6794 -> 6792 (-0.03%)
fills in affected programs: 84 -> 82 (-2.38%)
helped: 1 / HURT: 1

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20393084 -> 20392971 (<.01%)
instructions in affected programs: 21750 -> 21637 (-0.52%)
helped: 31 / HURT: 4

total cycles in shared programs: 880273065 -> 880247818 (<.01%)
cycles in affected programs: 2546748 -> 2521501 (-0.99%)
helped: 18 / HURT: 9

total spills in shared programs: 4628 -> 4630 (0.04%)
spills in affected programs: 287 -> 289 (0.70%)
helped: 1 / HURT: 2

total fills in shared programs: 5381 -> 5376 (-0.09%)
fills in affected programs: 711 -> 706 (-0.70%)
helped: 2 / HURT: 2

LOST:   1
GAINED: 1

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00%
Send messages: 7459339 -> 7459396 (+0.00%)
Loop count: 49111 -> 47588 (-3.10%)
Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01%
Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01%
Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00%
Scratch Memory Size: 4136960 -> 4130816 (-0.15%)
Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00%

Totals from 672 (0.11% of 630198) affected shaders:
Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17%
Send messages: 54302 -> 54359 (+0.10%)
Loop count: 6124 -> 4601 (-24.87%)
Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16%
Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08%
Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01%
Scratch Memory Size: 740352 -> 734208 (-0.83%)
Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7685264 -> 7685256 (-0.00%)
Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00%
Spill count: 61238 -> 61240 (+0.00%)
Fill count: 107301 -> 107289 (-0.01%)
Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00%

Totals from 553 (0.09% of 629912) affected shaders:
Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01%
Subgroup size: 8648 -> 8640 (-0.09%)
Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24%
Spill count: 181 -> 183 (+1.10%)
Fill count: 440 -> 428 (-2.73%)
Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
65eb7ed5fc intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize
Without this, the next commit triggers assertions.

v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by
Caio.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
572e00dd66 intel/brw: Copy prop from raw integer moves with mismatched types
The specific pattern from the unit test was observed in ray tracing
trampoline shaders.

v2: Refactor the is_raw_move tests out to a utility function. Suggested
by Ken.

v3: Fix a regression caused by being too picky about source
modifiers. This was introduced somewhere between when I did the initial
shader-db runs and v2.

v4: Fix typo in comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734086 -> 19733997 (<.01%)
instructions in affected programs: 135388 -> 135299 (-0.07%)
helped: 76 / HURT: 2

total cycles in shared programs: 916290451 -> 916264968 (<.01%)
cycles in affected programs: 41046002 -> 41020519 (-0.06%)
helped: 32 / HURT: 29

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531355 -> 151513669 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17209372399 -> 17208178205 (-0.01%); split: -0.01%, +0.00%
Max live registers: 32016490 -> 32016493 (+0.00%)

Totals from 17361 (2.75% of 630198) affected shaders:
Instrs: 2642048 -> 2624362 (-0.67%); split: -0.67%, +0.00%
Cycle count: 79803066 -> 78608872 (-1.50%); split: -1.75%, +0.25%
Max live registers: 421668 -> 421671 (+0.00%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149995644 -> 149977326 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15567293770 -> 15566524840 (-0.00%); split: -0.02%, +0.01%
Spill count: 61241 -> 61238 (-0.00%)
Fill count: 107304 -> 107301 (-0.00%)
Max live registers: 31993109 -> 31993112 (+0.00%)

Totals from 17813 (2.83% of 629912) affected shaders:
Instrs: 3738236 -> 3719918 (-0.49%); split: -0.49%, +0.00%
Cycle count: 4251157049 -> 4250388119 (-0.02%); split: -0.06%, +0.04%
Spill count: 28268 -> 28265 (-0.01%)
Fill count: 50377 -> 50374 (-0.01%)
Max live registers: 470648 -> 470651 (+0.00%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Lionel Landwerlin
14d772d678 anv: fix utrace compute timestamp reads on Gfx20
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30923>
2024-08-29 20:10:11 +00:00
Tapani Pälli
096acf8c0c anv: change existing ICL workaround to depend on BLEND_STATE
In commit f900b763b1, we started to dirty MS when WM changes. However,
things later changed with eebb6cd236, so we now need to dirty it when
BLEND_STATE changes.

Fixes: eebb6cd236 ("anv: stop using 3DSTATE_WM::ForceThreadDispatchEnable")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30920>
2024-08-29 13:58:08 +00:00
Rohan Garg
51e05c2844 iris,anv: simplify and inline sampler count calculations
Use the CLAMP macro to clamp the value and simplify the sampler count
encoding.
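
Below is a sketch of the clamp-then-encode shape. It relies on an
assumption that this message does not state: the hardware field counts
samplers in groups of four, 0 means no samplers, and the count
saturates at 16 samplers. SKETCH_CLAMP is a local stand-in for Mesa's
CLAMP macro.

```c
#include <assert.h>

/* Local stand-in for a CLAMP(x, min, max) macro. */
#define SKETCH_CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Assumed encoding: 0 = no samplers, 1 = 1-4, 2 = 5-8, 3 = 9-12, 4 = 13-16. */
static int
sketch_encode_sampler_count(int count)
{
   int clamped = SKETCH_CLAMP(count, 0, 16);
   return (clamped + 3) / 4;  /* round up to whole groups of four */
}
```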

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30922>
2024-08-29 11:49:56 +00:00
Rohan Garg
32f606486f anv: prefetch samplers when dispatching compute shaders
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30922>
2024-08-29 11:49:56 +00:00
Tapani Pälli
44e1cf2748 anv: set correct miplevel for anv_image_hiz_op
Fixes: 5efecc9782 ("anv: Enable HiZ on multi-LOD depth buffers.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11787
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30892>
2024-08-29 04:50:44 +00:00
Faith Ekstrand
42114aa723 vulkan: Handle VIEW_INDEX_FROM_DEVICE_INDEX_BIT in the runtime
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>
2024-08-29 03:30:31 +00:00
Faith Ekstrand
8c60f1461b vulkan: Take a VkPipelineCreateFlags2KHR in vk_pipeline_*shader_stage*()
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>
2024-08-29 03:30:31 +00:00
Jesse Natalie
03655dfda1 compiler, vk: Support subgroup size of 4
Relax the assert and assign it an enum value

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>
2024-08-29 03:30:31 +00:00