fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 13:20:14 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	ba119c73c6	intel: replace RANGE_BASE by BASE for uniform block loads We're not currently using RANGE_BASE and we'll use BASE for offset optimizations on Xe2+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>	2025-06-22 10:55:23 +00:00
Lionel Landwerlin	098249ba66	brw: print descriptor & extended descriptors Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>	2025-06-22 10:55:22 +00:00
Emma Anholt	cd981e27f7	intel/elk: Move wpos_w setup right into nir_intrinsic_load_frag_w. Given that the intrinsic will be CSEed at the NIR level, we don't need to preemptively set it up at the top of the shader. No change in HSW shader-db. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:43 +00:00
Emma Anholt	269fbcb144	intel/elk: Use pixel_z for gl_FragCoord.z on pre-gen6. Unless I've seriously missed something, we have the Z in the payload (which we can always request if we need access to it and it's not already passed to us due to other WM IZ settings). total instructions in shared programs: 4408303 -> 4408186 (<.01%) instructions in affected programs: 1164 -> 1047 (-10.05%) total cycles in shared programs: 142485036 -> 142484566 (<.01%) cycles in affected programs: 26820 -> 26350 (-1.75%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:43 +00:00
Emma Anholt	dc55b47a58	intel/elk: Move pre-gen6 smooth interpolation 1/w multiply to NIR. NIR catches that if you're just doing something like adding two smooth inputs, we can do the multiply once on the result instead of on each input. BRW shader-db results: total instructions in shared programs: 4409146 -> 4408303 (-0.02%) instructions in affected programs: 800761 -> 799918 (-0.11%) total cycles in shared programs: 143203198 -> 142485036 (-0.50%) cycles in affected programs: 79081682 -> 78363520 (-0.91%) total sends in shared programs: 363044 -> 363042 (<.01%) sends in affected programs: 33 -> 31 (-6.06%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:42 +00:00
Emma Anholt	fb9b2261a1	intel/elk: Move pre-gen6 gl_FragCoord.w -> interpolation lowering to NIR. BRW shader-db: total instructions in shared programs: 4409143 -> 4409146 (<.01%) instructions in affected programs: 330 -> 333 (0.91%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:41 +00:00
Emma Anholt	17ab39fbf8	intel/elk: Fix some tabs in gen4 URB setup. This formatted terribly in my editor, just use spaces. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:40 +00:00
Emma Anholt	9d7a016ed1	intel/elk: Retire the global float pixel_x/y values. Nothing used them any more. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:40 +00:00
Emma Anholt	e1bf014b6e	intel/elk: Reduce this->pixel_x/y usage in gfx4 interp setup. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:40 +00:00
Emma Anholt	241bc5da70	intel/elk: Use the pixel_coord UW x/y values for noncoherent FB reads. No need to force generating the float cast just to turn it back to an int. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:39 +00:00
Emma Anholt	1134cdc198	intel/elk: Lower load_frag_coord to load_{pixel_coord,frag_coord_z/w} in NIR. This moves some conversions to NIR that may get eliminated, and also distinguishes gl_FragCoord.z/w loads at the shader info level so we don't need to flag uses_src_depth/uses_src_w when only gl_FragCoord.xy get used (as is typical). This reduces thread payload setup on many shaders. Also, interestingly, blorp shaders stop reserving space for z/w despite not putting them in the payload (since PS_EXTRA isn't filled out for z/w). HSW shader-db is noise: total instructions in shared programs: 9942649 -> 9942997 (<.01%) instructions in affected programs: 143167 -> 143515 (0.24%) total cycles in shared programs: 314768862 -> 314299112 (-0.15%) cycles in affected programs: 62951452 -> 62481702 (-0.75%) LOST: 44 GAINED: 26 Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:39 +00:00
Emma Anholt	88f1656133	intel/elk: Save the UW pixel x/y as a temp. This will be used for representing gl_FragCoord in NIR and reducing payload registers pushed. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:38 +00:00
Emma Anholt	5222c35924	intel/elk: Save the UW pixel x/y as a temp on gfx6+. This will be used for representing gl_FragCoord in NIR and reducing payload registers pushed. HSW results: total instructions in shared programs: 9940636 -> 9948574 (0.08%) instructions in affected programs: 852560 -> 860498 (0.93%) total cycles in shared programs: 314804525 -> 314900080 (0.03%) cycles in affected programs: 39786599 -> 39882154 (0.24%) LOST: 5 GAINED: 11 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:38 +00:00
Emma Anholt	af74abd68c	intel/fs: Don't bother checking if load_frag_coord uses interpolation. This was leftover dead code from `4bb6e6817e` ("intel: Use a system value for gl_FragCoord") -- the sysval doesn't do any interpolation and doesn't have sources that could use a barycentric. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:37 +00:00
Emma Anholt	0bf114736a	intel: Use the common NIR lowering for fquantize2f16. This generates one extra instruction to set the rounding mode to RTE due to f2f16_rtne in the lowering. This changes the result for fquantize2f16(65505.0) from 65536 to 65504, which fixes SPIR-V conformance for this value: If Value is positive with a magnitude too large to represent as a 16-bit floating-point value, the result is positive infinity. If Value is negative with a magnitude too large to represent as a 16-bit floating-point value, the result is negative infinity. SPIR-V doesn't specify whether this overflow check is before or after rounding, but IEEE specifies rounding first, which is what produces our 65504. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25552>	2025-06-18 22:45:08 +00:00
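The rounding behavior described in the fquantize2f16 commit can be reproduced on the CPU. The following is a minimal Python sketch, not Mesa's NIR lowering: it first rounds the input to binary32, then rounds to binary16 with round-to-nearest-even (Python's `struct` "e" format), maps results that overflow past 65504 to a signed infinity, and flushes results below the smallest normal half (2^-14) to a signed zero, which OpQuantizeToF16 permits. The helper names `to_f32` and `fquantize2f16` are hypothetical, and inputs are assumed to lie within binary32 range.

```python
import math
import struct

F16_MIN_NORMAL = 2.0 ** -14  # smallest normalized binary16 magnitude


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fquantize2f16(x: float) -> float:
    """Model of OpQuantizeToF16: round first, then check for overflow."""
    x = to_f32(x)
    if math.isnan(x) or math.isinf(x):
        return x  # NaN stays NaN, infinities pass through
    try:
        # The "e" (binary16) format rounds to nearest, ties to even.
        q = struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # The rounded magnitude exceeds the largest finite half (65504).
        return math.copysign(math.inf, x)
    if abs(q) < F16_MIN_NORMAL:
        # Too small for a normalized half: the spec allows +0 or -0.
        return math.copysign(0.0, x)
    return q
```

With this model, 65505.0 quantizes to 65504.0 rather than 65536, because rounding happens before the overflow check; 65520.0 is exactly halfway to 65536 and, rounding to even, overflows to infinity.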
Lionel Landwerlin	1d8382b88e	brw: enable more lowering for bitfield manipulation at non 32bit sizes Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35381>	2025-06-11 14:09:56 +00:00
Paulo Zanoni	12192f6489	brw: properly decode TGL_PIPE_SCALAR Source: BSpec "Instruction Fields" page (56701), SWSB field. Credits to Caio Oliveira here, since he was helping me while we found this issue together. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35395>	2025-06-09 22:21:13 +00:00
Dave Airlie	870b8717b2	Revert "hasvk/elk: stop turning load_push_constants into load_uniform" This reverts commit `b036d2ded2`. This seems to break gtk4 and other stuff. Cc: mesa-stable (taking ack from Lionel saying we should revert) Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35407>	2025-06-09 09:20:19 +10:00
llyyr	c8bd9ac789	brw: don't unconditionally print message on instance creation This would cause Mesa to print this message even if an Intel GPU is just being enumerated by a Vulkan application. For example, `vulkaninfo --summary`. Fixes: `52f73db5b7` ("brw: implement read without format lowering") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35396>	2025-06-07 13:59:22 +00:00
Caio Oliveira	80fb555718	brw: Fix MAD instruction usage in spilling logic The intention here is to build a SIMD8 value that will be expanded as needed -- just like the SHL/ADD case, but with a single instruction. Found when the spilling code was triggering an invalid MAD with SIMD32 (which gets compressed) and with overlapping destination and source, which would cause a conflict when divided into two SIMD16 instructions. Fixes: `338273dedd` ("brw/reg_allocate: Optimize spill offset calculation using integer MAD") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35302>	2025-06-06 15:31:50 +00:00
Lionel Landwerlin	52f73db5b7	brw: implement read without format lowering Load the format enum and then just go through a series of : if format == R16G16B16A16_UNORM color = lower_r32g32_uint_tor_r16g16b16a16_unorm(color) else if format == R16G16B16A16_SNORM ... For Gfx12.5, there is no in-shader conversion. For Gfx12/11, the in-shader conversion covers the following formats : - ISL_FORMAT_R10G10B10A2_UNORM - ISL_FORMAT_R10G10B10A2_UINT - ISL_FORMAT_R11G11B10_FLOAT For Gfx9, the following formats : - ISL_FORMAT_R16G16B16A16_UNORM - ISL_FORMAT_R16G16B16A16_SNORM - ISL_FORMAT_R10G10B10A2_UNORM - ISL_FORMAT_R10G10B10A2_UINT - ISL_FORMAT_R8G8B8A8_UNORM - ISL_FORMAT_R8G8B8A8_SNORM - ISL_FORMAT_R16G16_UNORM - ISL_FORMAT_R16G16_SNORM - ISL_FORMAT_R11G11B10_FLOAT - ISL_FORMAT_R8G8_UNORM - ISL_FORMAT_R8G8_SNORM - ISL_FORMAT_R16_UNORM - ISL_FORMAT_R16_SNORM - ISL_FORMAT_R8_UNORM - ISL_FORMAT_R8_SNORM Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22524>	2025-06-06 12:28:42 +00:00
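The per-format dispatch sketched in the "read without format lowering" commit can be illustrated on the CPU. This minimal Python model assumes the texel is fetched as two 32-bit words (R32G32_UINT) with the red channel in the low 16 bits of the first word; the string format names stand in for the ISL_FORMAT_* enums, and the function names are hypothetical rather than Mesa's NIR helpers. UNORM channels are divided by 65535; SNORM channels are reinterpreted as signed and v/32767 is clamped to -1.0, following the usual normalized-integer conversion rules.

```python
from typing import List, Tuple


def split16(dwords: Tuple[int, int]) -> List[int]:
    """Split two 32-bit words into four 16-bit channels (assumed R, G, B, A)."""
    lo, hi = dwords
    return [lo & 0xFFFF, (lo >> 16) & 0xFFFF, hi & 0xFFFF, (hi >> 16) & 0xFFFF]


def unorm16(v: int) -> float:
    return v / 0xFFFF


def snorm16(v: int) -> float:
    if v >= 0x8000:
        v -= 0x10000  # reinterpret the 16 bits as a signed value
    return max(v / 0x7FFF, -1.0)  # -32768 and -32767 both map to -1.0


def convert_r32g32_uint(fmt: str, dwords: Tuple[int, int]) -> List[float]:
    """Chain of format checks, one branch per supported format."""
    channels = split16(dwords)
    if fmt == "R16G16B16A16_UNORM":
        return [unorm16(c) for c in channels]
    elif fmt == "R16G16B16A16_SNORM":
        return [snorm16(c) for c in channels]
    raise ValueError(f"format not handled in this sketch: {fmt}")
```

A real lowering would cover every format listed in the commit; only the two RGBA16 cases are modeled here to keep the branch structure visible.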
Lionel Landwerlin	79498a0849	brw: fix brw_nir_fs_needs_null_rt helper In `9b42215e0d` ("iris: ensure null render target for specific cases") I wrongly assumed that writing gl_SampleMask would only happen in multisampled cases. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9b42215e0d` ("iris: ensure null render target for specific cases") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13292 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35313>	2025-06-04 10:10:38 +00:00
Lionel Landwerlin	a51d061c00	brw: don't generate invalid instructions `0e3e5146cf` ("intel/brw: Use correct instruction for value change check when coalescing") enabled some new cases that exposed a pre-existing bug that would turn something like this : mul.sat(16) %789:F, %787:F, %788:F mov.g.f0.0(16) %790:F, %789:F (+f0.0) sel(16) %800:UD, %790:UD, 0u into this : mul.sat(16) %790:F, %787:F, %788:F mov.g.f0.0(16) null:F, null<8,8,1>:F (+f0.0) sel(16) %800:UD, %790:UD, 0u The mov[] array can contain the same instruction because it's repeated for each REG_SIZE writes and a SIMD16 instruction will write 2 REG_SIZE. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `0e3e5146cf` ("intel/brw: Use correct instruction for value change check when coalescing") Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35276>	2025-06-04 06:08:26 +00:00
Caio Oliveira	2bb9b94c4c	brw/disasm: Don't print src1 information for SEND gather There's always only the ARF scalar register source, so don't bother printing other information that won't be used. Matches the assembler code. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35297>	2025-06-03 22:52:39 +00:00
Sviatoslav Peleshko	0e3e5146cf	intel/brw: Use correct instruction for value change check when coalescing When we have partial VGRF MOVs with offsets, we will reach `channels_remaining == 0` with `inst` that is not writing the whole VGRF. Currently, even though we check `can_coalesce_vars()` for each offset separately, it will always check if the dst value is not changed only for the offset from the instruction that satisfied the `channels_remaining == 0` condition. Instead, we should remember and use the correct instruction for each written offset separately. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>	2025-06-01 17:37:10 +00:00
Lionel Landwerlin	f0e18c475b	intel: remove GRL/intel-clc Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35227>	2025-05-29 20:17:13 +00:00
Matt Turner	37016468a5	intel/compiler: Align human-readable send message info This fprintf() was added in commit `cce3bea2a7` ("i965/disasm: Align send instruction meta-information with dst.") to align the human-readable send message info (e.g. "render MsgDesc: RT write ...") with the destination register on the previous line. Two months later we disabled printing the instruction offset in commit `662f1ccc24` ("i965: Disable hex offset printing in disassembly."), thereby unaligning the human-readable send message info for the next 11 years. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35077>	2025-05-28 21:54:40 +00:00
Caleb Callaway	52db0e1480	intel/compiler: fix SHA generation for shader replace Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35140>	2025-05-27 22:57:19 +00:00
Christian Gmeiner	41f2da1a6e	treewide: Do not use NIR_PASS_V for nir_divergence_analysis(..) Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35131>	2025-05-23 21:19:25 +00:00
Caleb Callaway	e7454f5318	intel/debug: shader dump filter v2: Fixes filtering for various brw shader dump logic Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35061>	2025-05-23 19:57:02 +00:00
Sushma Venkatesh Reddy	6d226ceca1	intel/compiler: Call brw_try_override_assembly independent of debug flag Previously, brw_try_override_assembly was only called when a debug flag was enabled. However, during investigations involving workloads such as Steam games, enabling the debug flag results in excessive NIR and ISA output to stderr, making debugging more difficult. This change ensures that brw_try_override_assembly is called when the INTEL_SHADER_ASM_READ_PATH is set, regardless of the debug flag. This improves usability in scenarios where minimal debug output is desired. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35115>	2025-05-22 21:45:38 +00:00
Lionel Landwerlin	b036d2ded2	hasvk/elk: stop turning load_push_constants into load_uniform Those intrinsics have different semantics in particular with regards to divergence. Turning one into the other without invalidating the divergence information breaks NIR validation. But also the conversion means we get artificially less convergent values in the shaders. So just handle load_push_constants in the backend and stop changing things in Hasvk. Fixes a bunch of tests in dEQP-VK.descriptor_indexing.* dEQP-VK.pipeline..push_constant.graphics_pipeline.dynamic_index_ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>	2025-05-22 07:49:20 +00:00
Lionel Landwerlin	df15968813	anv/brw: stop turning load_push_constants into load_uniform Those intrinsics have different semantics in particular with regards to divergence. Turning one into the other without invalidating the divergence information breaks NIR validation. But also the conversion means we get artificially less convergent values in the shaders. So just handle load_push_constants in the backend and stop changing things in Anv. Fixes a bunch of tests in dEQP-VK.descriptor_indexing.* dEQP-VK.pipeline..push_constant.graphics_pipeline.dynamic_index_ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>	2025-05-22 07:49:20 +00:00
Sushma Venkatesh Reddy	524733a990	intel/compiler: Centralize type stomping logic for Gen12.5 restrictions This patch improves code readability by centralizing the type stomping logic for Gen12.5 region restrictions in `brw_lower_alu_restrictions`. It removes redundant comments and ensures type consistency assertions in `brw_broadcast`, `generate_mov_indirect`, and `generate_shuffle`. Thank you Ken for guiding me on this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35006>	2025-05-22 06:46:18 +00:00
Iván Briano	27a2f6d1ff	brw: add lowering passes for FS barycentric inputs Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:59 +00:00
Iván Briano	8ee14e5291	brw/anv: add provoking vertex to fs_msaa_flags This will be necessary to select the right value for flat inputs in fragment shaders when fragment shader barycentrics are in use. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Iván Briano	acdd30a9da	brw: check if the FS needs vertex_attributes_bypass to be set Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Iván Briano	c327b83706	brw: implement load_input_vertex intrinsic Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>	2025-05-20 20:57:58 +00:00
Tapani Pälli	0f591425c9	intel/compiler: provide a helper for null any-hit shader The Xe driver will be disabling the HW functionality for null any-hit shaders, so drivers need to take care of it instead. This commit brings back parts of an older workaround (see `b0624e414f`) we used to have to handle the null any-hit case. Cc: mesa-stable Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>	2025-05-20 10:58:53 +00:00
Mauro Rossi	04a643d877	intel/compiler: use ffsll instead of ffsl in brw_vue_map.c `18bbcf9a` triggered the following building error in Android, simple fix is to use ffsll() as it was done before `18bbcf9a` to process uint64_t generics argument. Fixes the following building error: FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o ... ../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaratio n] const int first_generic_output = ffsl(generics) - 1; ^ ../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'? /home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here int ffs(int __i) __INTRODUCED_IN_X86(18); ^ 1 error generated. Fixes: `18bbcf9a` ("intel: introduce new VUE layout for separate compiled shader with mesh") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>	2025-05-11 00:50:21 +02:00
Ian Romanick	338273dedd	brw/reg_allocate: Optimize spill offset calculation using integer MAD Gfx12.5 and later allow the use of two 16-bit immediate values in integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD, but that is not helpful here. v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the check around using integer MAD is unnecessary. No shader-db or fossil-db changes on any pre-Gfx12.5 platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17119962 -> 17118441 (<.01%) instructions in affected programs: 65398 -> 63877 (-2.33%) helped: 32 / HURT: 0 total cycles in shared programs: 895433316 -> 895425578 (<.01%) cycles in affected programs: 13437376 -> 13429638 (-0.06%) helped: 30 / HURT: 2 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 210052706 -> 209550074 (-0.24%) Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00% Totals from 7081 (1.00% of 707082) affected shaders: Instrs: 16864614 -> 16361982 (-2.98%) Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>	2025-05-09 21:31:09 +00:00
Ian Romanick	3db8dbfdc3	brw/reg_allocate: Optimize spill offset calculation using more SIMD8 Re-associate the calculation. The current calculation is ((lane + zero_or_8) << 2) + offset The first addition is SIMD8, and the shift and second addition are SIMD16. By switching to ((lane << 2) + offset) + zero_or_32 All operations are SIMD8. The SHL operates directly on the UW 0x76543210UV value, and that eliminates the MOV to expand the UW to UD. v2: Switch to alternate method. Update for SIMD32 on Xe2. No shader-db or fossil-db changes on any pre-Gfx12.5 platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17121519 -> 17119962 (<.01%) instructions in affected programs: 73208 -> 71651 (-2.13%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56 helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79% 95% mean confidence interval for instructions value: -56.02 -30.48 95% mean confidence interval for instructions %-change: -3.24% -1.75% Instructions are helped. total cycles in shared programs: 895450146 -> 895433316 (<.01%) cycles in affected programs: 13709400 -> 13692570 (-0.12%) helped: 31 HURT: 2 helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672 helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51% HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -652.42 -367.58 95% mean confidence interval for cycles %-change: -0.61% -0.19% Cycles are helped. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 210566294 -> 210052706 (-0.24%) Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00% Totals from 7091 (1.00% of 707082) affected shaders: Instrs: 17408115 -> 16894527 (-2.95%) Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>	2025-05-09 21:31:09 +00:00
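The re-association in the spill offset commit is plain integer algebra: (lane + 8g) << 2 equals (lane << 2) + 32g. The Python check below is a sketch of that identity, not Mesa's code. It decodes the packed 0x76543210 UV immediate into eight 4-bit lane indices and compares both formulas for every SIMD8 group g; treating groups 2 and 3 as the extra SIMD8 slices of a SIMD32 dispatch on Xe2 is an assumption made for illustration, and the function names are hypothetical.

```python
from typing import List

UV_LANES = 0x76543210  # packed vector immediate: eight 4-bit lane indices


def lanes_from_uv(imm: int) -> List[int]:
    """Decode the eight 4-bit fields of a UV immediate, lowest field first."""
    return [(imm >> (4 * i)) & 0xF for i in range(8)]


def offset_before(lane: int, group: int, offset: int) -> int:
    zero_or_8 = 8 * group  # added before the shift, so the SHL/ADD run wider
    return ((lane + zero_or_8) << 2) + offset


def offset_after(lane: int, group: int, offset: int) -> int:
    zero_or_32 = 32 * group  # pre-scaled, so every step stays SIMD8
    return ((lane << 2) + offset) + zero_or_32


def spill_offsets(group: int, offset: int) -> List[int]:
    """Byte offsets for the eight lanes of one SIMD8 group."""
    return [offset_after(lane, group, offset) for lane in lanes_from_uv(UV_LANES)]
```

Because the shift only ever touches the eight 4-bit lane values, it can run at SIMD8 directly on the immediate; the per-group constant is folded into the final add.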
Lionel Landwerlin	5c7c1eceb5	anv/brw: handle pipeline libraries with mesh I always thought there was a massive issue with pipeline libraries & mesh shaders. Indeed recent CTS tests have exposed a number of issues. Some values delivered to the fragment shader are coming from different places depending on whether the preceding shader is Mesh or not. For example PrimitiveID is delivered in the per-primitive block in Mesh pipelines whereas for other pipelines it's coming as a VUE slot (which is per-vertex). Those are 2 different locations in the payload. We have to find a layout for fragment shaders that is compatible with everything, leaving gaps here and there in the thread payload. Fixes the following test pattern : dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	18bbcf9a63	intel: introduce new VUE layout for separate compiled shader with mesh Mesh shaders have per vertex block in URB pretty much identical to the VUE format. Let's just reuse that concept to do all of our layout in the payload attribute registers. This will ensure that we have consistent VUE layout between Mesh & non-Mesh pipelines. We need a new way of laying out the VUE though, as we have to accommodate a HW constraint of a maximum of 32 varyings (per-primitive + per-vertex). This means we cannot have 2 locations in the payload for things like PrimitiveID which can come from either the per-primitive or the per-vertex block. The new layout places the PrimitiveID at the end of the per-vertex attributes and shrinks the delivery dynamically if the mesh stage is active. The shader is compiled with a MOV_INDIRECT to read the PrimitiveID from the right location in the attributes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	2d396f6085	intel: prepare VUE layout for more than 2 layouts Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	95efdca00b	brw: add documentation pointers to FS attribute layout Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	9d342081e7	brw/nir: add intrinsics to read attribute payload register indirectly Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	ef17fbf8e5	anv/brw: use separate_shader to deduce MUE compaction Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:35 +00:00
Lionel Landwerlin	6230f3029f	brw: fix brw_nir_move_interpolation_to_top In a case like this : block_0: %5 = ... %6 = ... block_1: %7 = load_interpolated_input %5, %6 The current logic would move load_interpolated_input to block_0 before %5 but not move %5 & %6 which are sources of that instruction. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:34 +00:00
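The brw_nir_move_interpolation_to_top fix is an instance of a general rule: when hoisting an instruction, everything it depends on must be hoisted ahead of it. The toy Python model below is a sketch of that rule, not the NIR pass itself. It represents blocks as lists of SSA names and dependencies as a dictionary (both hypothetical), assumes the first block dominates all others so that placing definitions there is always legal, and simply moves the whole transitive source closure, which a real pass may avoid.

```python
from typing import Dict, List, Set

Blocks = List[List[str]]
Sources = Dict[str, List[str]]


def move_only(blocks: Blocks, name: str) -> None:
    """Buggy variant: hoist just `name` to the top of the first block."""
    for block in blocks:
        if name in block:
            block.remove(name)
    blocks[0].insert(0, name)


def move_with_sources(blocks: Blocks, srcs: Sources, name: str) -> None:
    """Hoist `name` plus its transitive sources, sources first."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(n: str) -> None:
        if n in seen:
            return
        seen.add(n)
        for s in srcs.get(n, []):
            visit(s)
        order.append(n)  # post-order: a value follows all of its sources

    visit(name)
    for block in blocks:
        block[:] = [n for n in block if n not in seen]
    blocks[0][:0] = order


def defs_before_uses(blocks: Blocks, srcs: Sources) -> bool:
    """Check that every source is defined before the instruction using it."""
    defined: Set[str] = set()
    for block in blocks:
        for n in block:
            if any(s not in defined for s in srcs.get(n, [])):
                return False
            defined.add(n)
    return True


# The shape from the commit message: %7 in block_1 uses %5 and %6 from block_0.
srcs: Sources = {"%7": ["%5", "%6"]}
buggy = [["%5", "%6"], ["%7"]]
move_only(buggy, "%7")
fixed = [["%5", "%6"], ["%7"]]
move_with_sources(fixed, srcs, "%7")
```

`buggy` ends up as `[["%7", "%5", "%6"], []]`, a use before its definitions, while `fixed` keeps `%5` and `%6` ahead of `%7`.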
Lionel Landwerlin	5ff1b31c3f	brw: document some brw_wm_prog_data fields Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>	2025-05-08 06:48:34 +00:00


4410 commits