fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 18:28:05 +02:00

Author	SHA1	Message	Date
Caio Oliveira	390317a99e	brw: Fix size in assembler when compacting Calculation was wrongly walking uncompacted instructions, even if we had some compacted in the middle, generating invalid size. Since we are here, just drop the instruction count, since in practice the caller will have to walk the instruction stream anyway. Fixes: `6267585778` ("intel/brw: Also return the size of the assembled shader") Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33532> (cherry picked from commit `dd1ca1588d`)	2025-03-04 20:24:05 +01:00
Hyunjun Ko	0ea91330c3	anv: Do not support the tiling of DRM modifier if DECODE_DST Fixes: `04709e4f` ("anv: fix video profile lists"); Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33784> (cherry picked from commit `f7ff9b240d`)	2025-03-03 17:25:22 +01:00
Kevin Chuang	f912436dc9	anv/bvh: Fix copy shader handling sparse buffer Fixes: `692b5fa9f2` ("anv: Add shader to copy acceleration structures") This commit fixes the future test "sparse_binding_structures" for "header_bottom_address" for the ray tracing pipeline. Even on 48-bit ray tracing (Xe1/2), the software-defined part instance_leaf_part1.bvh_ptr has to be in canonical form for copy.comp to dereference a bvh, which means we have to preserve the upper 16bits. This is especially relevant in cases where the acceleration structure buffer is located high, such as a sparse buffer. Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745> (cherry picked from commit `87ff7b061f`)	2025-03-03 17:25:16 +01:00
Kevin Chuang	614dd4999c	anv/bvh: Fix encoder handling sparse buffer Fixes: `2fe57947e3` ("anv: Implement encode shader to fit in ANV BVH") This commit resolves the failures in the future tests "sparse_binding_structures" for rayquery. Sparse buffers' heaps are located high, and since they are in canonical form, the higher 16bits are all set to 1. However, the existing encoder did not expect any non-zero values at the higher 16bits. As a result, the instance flags got corrupted, causing most triangle tests to fail. Thanks to Paulo for providing insights about sparse buffer properties. Co-developed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745> (cherry picked from commit `b9a980ea73`)	2025-03-03 17:25:14 +01:00
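The canonical-address issue behind the two anv/bvh fixes can be sketched in C. This is a minimal illustration, not the ANV BVH layout: it assumes a hypothetical 64-bit word storing 16 bits of flags in bits 63..48 next to a 48-bit pointer, and the helper names (`canonicalize48`, `pack_naive`, `pack_masked`, `unpack_flags`, `unpack_ptr`) are made up for this example. A high address in canonical form has bits 63..48 all set, so OR-ing it in unmasked overwrites the flags, while masking before packing and sign-extending bit 47 on unpack recovers the canonical pointer.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Sign-extend bit 47 into bits 63..48 (48-bit canonical form). */
static uint64_t canonicalize48(uint64_t addr)
{
   addr &= ADDR_MASK;
   if (addr & (UINT64_C(1) << (ADDR_BITS - 1)))
      addr |= ~ADDR_MASK;
   return addr;
}

/* Hypothetical layout: flags in bits 63..48, pointer in bits 47..0.
 * Buggy version: the pointer's upper 16 bits leak into the flags. */
static uint64_t pack_naive(uint64_t ptr, uint16_t flags)
{
   return ptr | ((uint64_t)flags << ADDR_BITS);
}

/* Fixed version: drop the pointer's upper 16 bits before packing. */
static uint64_t pack_masked(uint64_t ptr, uint16_t flags)
{
   return (ptr & ADDR_MASK) | ((uint64_t)flags << ADDR_BITS);
}

static uint16_t unpack_flags(uint64_t word)
{
   return (uint16_t)(word >> ADDR_BITS);
}

/* Restore the canonical pointer so it can be dereferenced again. */
static uint64_t unpack_ptr(uint64_t word)
{
   return canonicalize48(word);
}
```

For a sparse-heap-like address such as `0xffff800012340000`, `pack_naive` turns any flag value into `0xffff`, while `pack_masked` followed by `unpack_ptr` round-trips both the flags and the canonical address.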
Paulo Zanoni	bac3b56d51	brw: extend the NOP+WHILE workaround It turns out that we need to add a NOP not only in between two consecutive WHILE instructions, but also after every control flow instruction that immediately precedes a WHILE. v2: Rebase after the renames. Fixes: `5ca883505e` ("brw: add a NOP in between WHILE instructions on LNL") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021> (cherry picked from commit `fd10764cff`)	2025-02-28 22:17:35 +01:00
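The extended workaround amounts to a peephole over the instruction stream. The sketch below is hypothetical: the opcode enum, the set of opcodes treated as control flow in `is_control_flow()`, and `insert_while_nops()` are illustrative assumptions rather than brw code. It inserts a NOP whenever a control-flow instruction (including another WHILE) is immediately followed by a WHILE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set; the real brw IR has many more opcodes. */
enum op { OP_ADD, OP_MOV, OP_IF, OP_ELSE, OP_ENDIF, OP_DO,
          OP_BREAK, OP_CONTINUE, OP_WHILE, OP_NOP };

/* Assumption: which opcodes count as "control flow" here. */
static bool is_control_flow(enum op o)
{
   switch (o) {
   case OP_IF: case OP_ELSE: case OP_ENDIF: case OP_DO:
   case OP_BREAK: case OP_CONTINUE: case OP_WHILE:
      return true;
   default:
      return false;
   }
}

/* Copies `in` to `out`, inserting a NOP between any control-flow
 * instruction and an immediately following WHILE. `out` needs room for
 * up to 2 * n entries. Returns the number of entries written. */
static size_t insert_while_nops(const enum op *in, size_t n, enum op *out)
{
   size_t k = 0;
   for (size_t i = 0; i < n; i++) {
      out[k++] = in[i];
      if (i + 1 < n && in[i + 1] == OP_WHILE && is_control_flow(in[i]))
         out[k++] = OP_NOP;
   }
   return k;
}
```

With the input `DO ADD BREAK WHILE WHILE`, the output is `DO ADD BREAK NOP WHILE NOP WHILE`: the original rule only covered the WHILE/WHILE pair, and the extended rule also covers the BREAK before the first WHILE. A non-control-flow instruction such as ADD before a WHILE gets no NOP.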
Karol Herbst	62747d6bdd	intel/brw, lp: enable lower_pack_64_4x16 The compiler won't be able to emit pack_64_4x16, so we should prevent nir_opt_algebraic from optimizing to it. This fixes an infinite optimization loop inside brw_nir_optimize:
  nir_copy_prop
    16x4 %77 = @load_global (%80)
    32 %61995 = pack_32_2x16_split %77.x, %77.y
    32 %61998 = pack_32_2x16_split %77.z, %77.w
    64 %61999 = pack_64_2x32_split %61995, %61998
    64 %76 = iadd %100, %79
    @store_global (%61999, %76)
  nir_opt_algebraic
    16x4 %77 = @load_global (%80)
    32 %61995 = pack_32_2x16_split %77.x, %77.y
    32 %61998 = pack_32_2x16_split %77.z, %77.w
    16x4 %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    64 %62001 = pack_64_4x16 %62000
    64 %76 = iadd %100, %79
    @store_global (%62001, %76)
  nir_lower_pack
    16x4 %77 = @load_global (%80)
    16x4 %62000 = vec4 %77.x, %77.y, %77.z, %77.w
    16 %62002 = mov %62000.y
    16 %62003 = mov %62000.x
    32 %62004 = pack_32_2x16_split %62003, %62002
    16 %62005 = mov %62000.w
    16 %62006 = mov %62000.z
    32 %62007 = pack_32_2x16_split %62006, %62005
    64 %62008 = pack_64_2x32_split %62004, %62007
    64 %76 = iadd %100, %79
    @store_global (%62008, %76)
  // brw_nir_optimize loops here
  nir_copy_prop
    16x4 %77 = @load_global (%80)
    32 %62004 = pack_32_2x16_split %77.x, %77.y
    32 %62007 = pack_32_2x16_split %77.z, %77.w
    64 %62008 = pack_64_2x32_split %62004, %62007
    64 %76 = iadd %100, %79
    @store_global (%62008, %76)
llvmpipe has a similar issue inside lp_build_opt_nir. Fixes: `b1bc691b0f` ("nir/algebraic: add and improve pack/unpack patterns") Acked-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33347> (cherry picked from commit `dad5ee1039`)	2025-02-28 22:17:35 +01:00
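A scalar C model shows why the two passes can undo each other: the form `nir_lower_pack` produces from `pack_64_4x16` computes exactly the same 64-bit value, so `nir_opt_algebraic` can fold it straight back. The functions below model the NIR opcodes named in the commit (the first operand of each `_split` opcode is the low half, as in the IR dump); they are an illustration, not NIR's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar models of the NIR opcodes: first operand is the low half. */
static uint32_t pack_32_2x16_split(uint16_t lo, uint16_t hi)
{
   return (uint32_t)lo | ((uint32_t)hi << 16);
}

static uint64_t pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
   return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* pack_64_4x16: component x lands in bits 0..15, w in bits 48..63. */
static uint64_t pack_64_4x16(const uint16_t v[4])
{
   return (uint64_t)v[0] | ((uint64_t)v[1] << 16) |
          ((uint64_t)v[2] << 32) | ((uint64_t)v[3] << 48);
}

/* The two-level form nir_lower_pack emits in the dump above. */
static uint64_t pack_64_4x16_lowered(const uint16_t v[4])
{
   return pack_64_2x32_split(pack_32_2x16_split(v[0], v[1]),
                             pack_32_2x16_split(v[2], v[3]));
}
```

Because both forms agree for every input, neither pass can tell that the other will reverse it; per the commit, the cycle is broken by enabling `lower_pack_64_4x16` so that `nir_opt_algebraic` stops creating `pack_64_4x16` in the first place.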
Lionel Landwerlin	3630721dc8	anv: fix missing 3DSTATE_PS:Kernel0MaximumPolysperThread programming Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `815d2e3e8b` ("anv: move 3DSTATE_PS to partial packing") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33712> (cherry picked from commit `91f36ba5b6`)	2025-02-28 22:17:35 +01:00
Dylan Baker	db51d8f8ac	iris: fix handling of GL__VERTEX_CONVENTION By actually setting the state packets according to the program data. Also ensure that we correctly flag that the program may be dirty when the geometry shader state changes Fixes piglit tests: `spec@!opengl 3.2@gl-3.2-adj-prims pv-first` Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Backport-to: 25.0 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33658> (cherry picked from commit `c33ebf09f5`)	2025-02-28 22:17:35 +01:00
Paulo Zanoni	d8ffce96d2	brw: increase brw_reg::subnr size to 6 bits Since Xe2, the registers are bigger and even the instruction structures got updated to have 6 bits. The way I detected this issue was when I tried to use src/intel/executor to add the following instruction: add(8) g6.8<1>UD g4<8,8,1>UD 0x00000008UD { align1 WE_all 1Q I@1 }; Executor would read this and end up emitting an add with dst being g6<1>UD instead of what we wanted. It turns out that inside brw_gram.y, at dstoperand and dstoperandex we do: $$.subnr = $$.subnr * brw_type_size_bytes($4); which would overflow subnr back to 0. The overflow doesn't seem to be a problem with code we emit directly (unlike the code we parse, like above) due to the fact that we seem to treat Xe2 registers as smaller all the way until we call phys_nr() and phys_subnr() during code generation. The phys_subnr() function can generate a value that would overflow reg.subnr, but this value is never written back to reg.subnr, it's just returned as an unsigned int. Fixes: `e9f63df2f2` ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33539> (cherry picked from commit `927d7b322b`)	2025-02-18 22:46:08 +01:00
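The overflow can be reproduced with plain C bit-fields. The structs and helpers here are hypothetical stand-ins for `brw_reg`, assuming only what the commit describes: `subnr` is an unsigned bit-field, and the parser multiplies the subregister number by the type size in bytes. For `g6.8<1>UD`, 8 * 4 bytes = 32, which wraps to 0 in a 5-bit field but fits in a 6-bit one (0..63 covers every byte offset of a 64-byte Xe2 register).

```c
#include <assert.h>

/* Hypothetical register encodings: only the width of subnr differs. */
struct reg_5bit { unsigned subnr : 5; };   /* holds 0..31 */
struct reg_6bit { unsigned subnr : 6; };   /* holds 0..63 */

/* Mirrors the parser step "subnr = subnr * type_size_bytes": the
 * subregister is written in units of the type and converted to bytes.
 * Storing into an unsigned bit-field keeps only the low bits. */
static unsigned scale_5bit(unsigned subnr, unsigned type_size)
{
   struct reg_5bit r = { .subnr = subnr };
   r.subnr = r.subnr * type_size;   /* wraps modulo 32 */
   return r.subnr;
}

static unsigned scale_6bit(unsigned subnr, unsigned type_size)
{
   struct reg_6bit r = { .subnr = subnr };
   r.subnr = r.subnr * type_size;   /* wraps modulo 64 */
   return r.subnr;
}
```

`scale_5bit(8, 4)` yields 0, which is why the executor emitted `g6<1>UD`, while `scale_6bit(8, 4)` yields the intended byte offset 32.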
Tapani Pälli	3194cae6d0	anv: apply cache flushes on pipeline select with gfx20 This fixes rendering artifacts seen with Hogwarts Legacy and Black Myth Wukong. The assumption is that we can get rid of these flushes once the RESOURCE_BARRIER work lands, but until then we need them. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12540 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12489 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33397> (cherry picked from commit `765f3b78d5`)	2025-02-18 22:46:07 +01:00
Tapani Pälli	961a3fc760	anv: tighten condition for changing barrier layouts The assertion (or the attempted layout change) causes a crash when launching Steel Rats. Tighten the condition for the change so that it only applies when the runtime has made changes. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12602 Fixes: `eed788213b` ("anv: ensure consistent layout transitions in render passes") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33523> (cherry picked from commit `d8381415a6`)	2025-02-18 22:46:01 +01:00
Lionel Landwerlin	e2232c0be4	anv: ensure Wa_16012775297 interacts correctly with Wa_18020335297 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `dddd765553` ("anv: implement VF_STATISTICS emit for Wa_16012775297") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418> (cherry picked from commit `6b99bf76ca`)	2025-02-15 00:02:54 +01:00
Lionel Landwerlin	399de9dd00	anv: disable VF statistics for memcpy Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418> (cherry picked from commit `462d8e3fab`)	2025-02-15 00:02:53 +01:00
Ian Romanick	2ea6b340ac	brw/copy: Fix handling of offset in extract_imm The offset is measured in bytes. Some of the code here acted as though it were measured in src.type units. Also modify the assertion to check that all extracted bits come from data in the immediate value. Fixes: `580e1c592d` ("intel/brw: Introduce a new SSA-based copy propagation pass") Fixes: `da395e6985` ("intel/brw: Fix extract_imm for subregion reads of 64-bit immediates") Yes, I missed this error twice in code review. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33049> (cherry picked from commit `ac4b93571c`)	2025-02-11 18:05:27 +01:00
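A minimal sketch of the byte-offset semantics, using a hypothetical `extract_imm_bytes()` helper rather than brw's actual `extract_imm`: the offset selects a starting byte inside the 64-bit immediate, and the assertions mirror the commit's check that every extracted bit comes from the immediate's data. Little-endian byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read `size` bytes starting at byte `offset` of a
 * 64-bit immediate (little-endian). `offset` is measured in bytes, not
 * in units of the source type. */
static uint64_t extract_imm_bytes(uint64_t imm, unsigned offset, unsigned size)
{
   /* Every extracted bit must come from data inside the immediate. */
   assert(size >= 1 && size <= 8);
   assert(offset + size <= 8);

   uint64_t value = imm >> (offset * 8);
   if (size < 8)
      value &= (UINT64_C(1) << (size * 8)) - 1;
   return value;
}
```

For the immediate `0x1122334455667788`, a 2-byte read at offset 2 yields `0x5566`. Treating the same offset as 2 elements of 2 bytes (the error the commit fixes) would start at byte 4 and yield `0x3344` instead.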
Lionel Landwerlin	cb0d551424	brw: fixup scoreboarding for find_live_channels Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895> (cherry picked from commit `c08b437db7`)	2025-02-05 16:08:29 +01:00
Ernst Persson	26ad2f9149	intel/vulkan: Add bvh build dependency Fixes: `41baeb3810` ("anv: Implement acceleration structure API") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12558 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33333> (cherry picked from commit `c64871accc`)	2025-02-04 20:47:26 +01:00
Hyunjun Ko	cd4ffc319f	anv: Fix to set CDEF filter flag correctly for AV1 decoding and a related tiny clean-up. Fixes: `8432b8b282` ("anv: add initial support for AV1 decoding") Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33316> (cherry picked from commit `52d9edbf05`)	2025-02-04 20:47:26 +01:00
Caio Oliveira	f18dee3618	intel/brw: Fallback to SEND from SEND_GATHER if possible After optimizations happen, if the sources are still in one or two contiguous spans for some reason (e.g. some data read from memory now being written), it is beneficial to just use a regular SEND and avoid the instruction that sets the ARF scalar register. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:58 +00:00
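The fallback check can be modeled as counting contiguous register spans. This is a sketch under assumptions: `struct payload_src`, `count_contiguous_spans()` and `can_use_regular_send()` are hypothetical, each source is treated as a run of consecutive GRFs in payload order, and a regular SEND is assumed to accept up to two payload spans, matching the "one or two contiguous spans" condition in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a source occupying `len` consecutive GRFs
 * starting at register `nr`. Sources are listed in payload order. */
struct payload_src { unsigned nr; unsigned len; };

/* Counts the contiguous register spans formed by the sources when
 * concatenated in order. */
static unsigned count_contiguous_spans(const struct payload_src *srcs, size_t n)
{
   if (n == 0)
      return 0;
   unsigned spans = 1;
   for (size_t i = 1; i < n; i++) {
      if (srcs[i].nr != srcs[i - 1].nr + srcs[i - 1].len)
         spans++;   /* gap or reordering starts a new span */
   }
   return spans;
}

/* Assumption: a regular SEND can take two payload spans, so a gather
 * is only needed when the sources form more than two spans. */
static bool can_use_regular_send(const struct payload_src *srcs, size_t n)
{
   return count_contiguous_spans(srcs, n) <= 2;
}
```

Sources at g10-g11, g12 and g13-g16 form a single span; g10-g11 followed by g20-g21 form two, so a regular SEND still suffices; three scattered registers would keep the SEND_GATHER.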
Caio Oliveira	b6b32933ad	intel/brw: Use SHADER_OPCODE_SEND_GATHER in Xe3 Add an optimization pass to turn regular SENDs into SEND_GATHERs. This allows the payload to be "broken" into smaller pieces that can be further optimized, which _may_ result in - less register pressure (no need for contiguous space), and - fewer instructions (no need to MOV into such space). For debugging, the INTEL_DEBUG=no-send-gather option skips this optimization and reports how many opportunities were missed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:58 +00:00
Caio Oliveira	26d4d04d63	intel/brw: Add lowering for SHADER_OPCODE_SEND_GATHER Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:58 +00:00
Caio Oliveira	650ec7169d	intel/brw: Add SHADER_OPCODE_SEND_GATHER Starting in Xe3, there's a variant of SEND that takes the register numbers from the ARF scalar register and doesn't require them to be contiguous. The new opcode added here represents that kind of SEND. To keep the original sources reachable, we keep them around in the IR, just ignoring them at generator time. This allows the software scoreboard to properly reason about the dependencies without trying to decode the contents of the ARF scalar register being used. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:58 +00:00
Caio Oliveira	2fca22347c	intel/brw: Plumb through generator whether SEND is gather variant Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:58 +00:00
Caio Oliveira	00fac79f99	intel/brw: Add scoreboard support for scalar register Xe3 adds a new pipe that handles only MOVs from immediate into the scalar register. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>	2025-01-30 04:43:57 +00:00
Caio Oliveira	fbacf3761f	intel: Add meson option -Dintel-elk Defaults to true. When set to false Iris and various tools can be built without ELK support. In both cases this means supporting only Gfx9+. This option must be true to build Crocus or Hasvk. This allows skipping re-building ELK when developing for newer platforms with tools/tests enabled. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575 Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>	2025-01-30 00:45:59 +00:00
Caio Oliveira	31e5d909e7	intel/tools: Merge libaub into libintel_tools Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>	2025-01-30 00:45:59 +00:00
Caio Oliveira	ec2d20a70d	intel/tools: Add helpers for decoder_init/disasm Isolate the BRW/ELK differences in a single place. The way it is done now, we are not reusing the isa_info between calls. For the tools here this is probably fine; if someday this gets in the way, we can add an opaque pointer to store the right data. This is intentionally not used in Iris, since there the driver needs a more detailed view into BRW/ELK and we don't want to create an all-encompassing abstraction for that. Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>	2025-01-30 00:45:59 +00:00
Caio Oliveira	aa2bd16dec	intel/tools: Use idep_libintel_common in meson Since the internal dependency object exists and is already used in some cases, let's be consistent. Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>	2025-01-30 00:45:59 +00:00
Francisco Jerez	d455d5d86c	anv/xe3+: Enable VRT. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	dd1712515b	anv/xe3+: Set RegistersPerThread for bindless shader dispatch. v2: Use MOV and wrap in conditional during BTD spawn header setup (Lionel). Remove references to SIMD8 (Tapani). v3: Update brw_bsr() to specify number of registers per thread, don't initialize Registers Per Thread on BTD spawn header (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	b25d0f899b	anv/xe3+: Set RegistersPerThread during shader state setup based on prog_data. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	7537f8edee	intel/blorp/xe3+: Set RegistersPerThread during shader state setup based on prog_data. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	f6a1c51de7	intel/genxml/xe3+: Update definitions for shader state setup. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	fb40b449cd	intel/brw: Define ptl_register_blocks() helper. Since this calculation will be needed in many places to set up the state of each shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	70fecb1483	intel/brw: Report number of GRF registers used in brw_stage_prog_data. This is similar to what we used to do on pre-SNB platforms: the number of GRF registers used by the shader will be used on Xe3+ to adjust the trade-off between thread-level parallelism and size of the GRF file. Plumb the value through prog_data so the driver can set up the hardware state accordingly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	6513bf65c3	intel/brw/xe3+: Optimize CS/TASK/MESH compile time optimistically assuming SIMD32. This is similar in principle to the previous commit "intel/brw/xe3+: brw_compile_fs() implementation for Xe3+." but applied to compute-like shader stages. It changes the implementation of brw_compile_cs/task/mesh() to reduce compile time and take advantage of wider dispatch modes more aggressively than the original logic, since as of Xe3 SIMD32 builds succeed without spills in most cases thanks to VRT. The new "optimistic" SIMD selection logic starts with the SIMD width that is potentially highest performance and only compiles additional narrower variants if that fails (typically due to spilling), while the old "pessimistic" logic did the opposite: It started with the narrowest SIMD width and compiled additional variants with increasing register pressure until one of them failed to compile. In typical non-spilling cases where we formerly compiled SIMD16 and SIMD32 variants of the same compute shader, this change will halve the number of backend compilations required to build it. XXX - Possibly don't do this in cases with variable workgroup size until effect on runtime performance can be measured directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> v2: Don't do this for now in cases with variable workgroup size, still compile every possible variant in such cases. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
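The difference between the two SIMD selection strategies can be sketched as follows. Everything here is hypothetical: `try_compile_fn`, `fits_up_to()` (a stand-in that succeeds up to a given width, modeling spilling above it) and the two selectors are illustrative, and only SIMD16 and SIMD32 are considered as candidates. The sketch does not model the v2 exception, under which shaders with a variable workgroup size still compile every variant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hook: true if the shader compiles at `width` without
 * spilling or hitting a hardware restriction. */
typedef bool (*try_compile_fn)(unsigned width, void *ctx);

/* Illustrative stand-in: compiles cleanly up to *ctx lanes. */
static bool fits_up_to(unsigned width, void *ctx)
{
   return width <= *(const unsigned *)ctx;
}

/* Optimistic: try the widest width first, fall back only on failure. */
static unsigned select_optimistic(try_compile_fn try_compile, void *ctx,
                                  unsigned *attempts)
{
   static const unsigned widths[] = { 32, 16 };
   *attempts = 0;
   for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
      ++*attempts;
      if (try_compile(widths[i], ctx))
         return widths[i];
   }
   return 0; /* no variant compiled */
}

/* Pessimistic: start narrow and widen until a compile fails; the widest
 * successful variant wins. */
static unsigned select_pessimistic(try_compile_fn try_compile, void *ctx,
                                   unsigned *attempts)
{
   static const unsigned widths[] = { 16, 32 };
   unsigned best = 0;
   *attempts = 0;
   for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
      ++*attempts;
      if (!try_compile(widths[i], ctx))
         break;
      best = widths[i];
   }
   return best;
}
```

In the non-spilling case (limit 32) the optimistic selector needs one compile where the pessimistic one needs two, which is the halving the commit describes; when SIMD32 spills (limit 16), both need two compiles and pick SIMD16.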
Sagar Ghuge	7e1362e9c0	intel/brw/xe3+: Don't compile SIMD32 if there are ray queries Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	5b6906076e	intel/brw/xe3+: brw_compile_fs() implementation for Xe3+. This reworks the implementation of brw_compile_fs() to reduce compile time and take advantage of wider dispatch modes more aggressively than the original logic. The new "optimistic" PS compilation logic starts with the SIMD width that is potentially highest performance and only compiles additional narrower variants if that fails (typically due to spilling or hardware restrictions), while the old "pessimistic" logic did the opposite: It started with the narrowest SIMD width and compiled additional variants with increasing register pressure until one of them failed to compile. The main disadvantage of this is that selectively throwing away some of the compiled variants based on the static analysis of their performance behavior will no longer be possible, however this is expected to be less useful on Xe3+ since the GRF space allocated to a thread can be scaled up or down, which leads to less dramatic differences in scheduling between SIMD variants. In typical non-spilling cases where we formerly compiled SIMD16 and SIMD32 variants of the same fragment shader, this change will halve the number of backend compilations required to build a shader. With multi-polygon PS dispatch enabled (which is disabled by default right now) this has an even more dramatic effect since the number of compiler iterations can be reduced down to a fifth in the best case scenario. 
Even though in most cases we will only attempt to return a single binary from the pixel shader compilation, the hardware allows a pair of PS kernels to be specified, and we'll still take advantage of this when the multi-polygon PS kernel has the potential to have worse performance than the single-polygon shader because only the latter register-allocates successfully at SIMD32. Only in such a case (SIMD2x8 multi-polygon, SIMD32 single-polygon) will we continue programming both, so the hardware will choose one or the other at runtime depending on the SIMD fullness and the number of polygons it can buffer. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	1b2bd1fcb8	intel/brw: Exit early from run_fs() if compilation failed before optimization loop. This avoids running the optimizer uselessly if compilation of the current kernel failed due to some hardware (e.g. SIMD-width) restriction. This isn't only inefficient but it can break assumptions throughout the compiler which would lead to crashes on Xe3 when this arises during translation from NIR. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	afff3eb95e	intel/brw: Indent conditional block from brw_compile_fs() not applicable to Xe2+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	d7d08ec2e2	intel/brw: Indent body of brw_compile_fs() not applicable to xe3+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	d03eac3133	intel/brw/xe3+: Disable round-robin allocation heuristic on Xe3+. Xe3+ benefits from packing register allocations tightly in order to make optimal use of the GRF space. The round-robin heuristic previously in use often causes the whole GRF space to be used even if register pressure is substantially lower, which would severely decrease thread-level parallelism on Xe3+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
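A toy allocator illustrates the effect. It is not Mesa's register allocator: `struct toy_ra`, `toy_ra_alloc()`, `footprint()` and the 16-register file are assumptions made for this sketch. With at most two values live at a time, lowest-first allocation keeps reusing registers 0 and 1, while round-robin keeps advancing its start point and eventually touches every register, so the thread would need the whole file.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 16

/* Toy allocator over NUM_REGS registers. */
struct toy_ra {
   bool used[NUM_REGS];
   unsigned next;      /* round-robin search start */
   bool round_robin;
   int highest;        /* highest register index ever handed out */
};

static void toy_ra_init(struct toy_ra *ra, bool round_robin)
{
   for (unsigned i = 0; i < NUM_REGS; i++)
      ra->used[i] = false;
   ra->next = 0;
   ra->round_robin = round_robin;
   ra->highest = -1;
}

static int toy_ra_alloc(struct toy_ra *ra)
{
   /* Round-robin resumes after the last pick; otherwise pick lowest. */
   unsigned start = ra->round_robin ? ra->next : 0;
   for (unsigned k = 0; k < NUM_REGS; k++) {
      unsigned r = (start + k) % NUM_REGS;
      if (!ra->used[r]) {
         ra->used[r] = true;
         ra->next = (r + 1) % NUM_REGS;
         if ((int)r > ra->highest)
            ra->highest = (int)r;
         return (int)r;
      }
   }
   return -1; /* out of registers */
}

static void toy_ra_free(struct toy_ra *ra, int r)
{
   ra->used[r] = false;
}

/* Short-lived values, at most two live at once. Returns how many
 * registers the thread would have to reserve (highest index + 1). */
static int footprint(bool round_robin, unsigned steps)
{
   struct toy_ra ra;
   toy_ra_init(&ra, round_robin);
   int prev = toy_ra_alloc(&ra);
   for (unsigned i = 0; i < steps; i++) {
      int cur = toy_ra_alloc(&ra);
      toy_ra_free(&ra, prev);
      prev = cur;
   }
   return ra.highest + 1;
}
```

After 10 steps, lowest-first needs 2 registers while round-robin has reached 11; given enough steps, round-robin uses all 16 even though register pressure never exceeds 2.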
Francisco Jerez	a67ff3e7e3	intel/brw/xe3+: Bump number of SBID tokens for Xe3. Xe3 supports 32 SBID tokens per thread regardless of the number of register blocks allocated per thread. Take advantage of the increased number of SBIDs in the scoreboard pass to reduce the frequency of false dependencies on Xe3+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	8d2331fe4b	intel/brw/xe3: Extend regalloc sets to maximum Xe3 GRF size. Extend our regalloc sets to 256 registers to match the maximum capacity of the GRF file on Gfx30. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	ca1636d457	intel/brw/xe3: Define XE3_MAX_GRF. Gfx30 supports up to 256 (512b) GRFs which requires a max GRF define of 512 in REG_SIZE units. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
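The value follows from a unit conversion. In this sketch the enum names are hypothetical; the 32-byte `REG_SIZE` unit is the one implied by the commit's numbers (256 GRFs of 64 bytes each = 16384 bytes = 512 units of 32 bytes).

```c
#include <assert.h>

/* Hypothetical names for the quantities in the commit message. */
enum {
   REG_SIZE_BYTES     = 32,        /* the REG_SIZE unit */
   XE3_GRF_BYTES      = 512 / 8,   /* one 512-bit GRF = 64 bytes */
   XE3_NUM_GRFS       = 256,
   /* 256 GRFs * 64 bytes / 32 bytes per unit = 512 units */
   XE3_MAX_GRF_SKETCH = XE3_NUM_GRFS * XE3_GRF_BYTES / REG_SIZE_BYTES,
};

_Static_assert(XE3_MAX_GRF_SKETCH == 512, "matches the commit's value");
```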
Francisco Jerez	67cb23a4b1	intel/common/xe2+: Allow SIMD32 PS for all multisample cases. These don't seem to be disallowed by recent hardware anymore. Stop disabling SIMD32 due to hardware restrictions of multisample rasterization, since it should have better performance, and on Xe3+ there may be no shader variant available other than SIMD32. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	935f60c13c	intel/blorp: Specify a subgroup size requirement of 16 for fast clear or repclear shaders. Request a fixed subgroup size for pixel shaders that require it due to the hardware restrictions of fast clears and repeated data clears. This requires plumbing the "is_fast_clear" boolean across several callers since blorp_compile_fs_brw() currently has no information regarding whether the kernel is intended for a fast clear operation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	80b2355b39	intel/brw: Allow specifying a required subgroup size for fragment shaders. On older hardware the "use_rep_send" compile parameter was being implicitly used to request the compilation of the SIMD16 variant of clear pixel shaders that require it due to hardware restrictions. However starting on Gfx12+ this flag is never set since replicated data clears are no longer supported, but BLORP still implicitly relies on the SIMD16 variant being generated even though there's no way for BLORP to explicitly request it. This doesn't cause much of a problem right now since brw_compile_fs() typically generates a SIMD16 kernel unless the SIMD8 kernel spills or SIMD debugging flags are enabled, but it won't work reliably on Xe3+ since we'll start using SIMD32 more aggressively. In order to avoid these issues use the standard required subgroup_size parameter from shader_info to signal that the SIMD16 variant of the shader is needed by the caller. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	a736757275	anv/gfx12.5: Request subgroup size 8 for RT trampoline shader. The 16-wide variant of the trampoline shader doesn't appear to work and would be inadvertently enabled by this series on Gfx12.5. Set the required subgroup size to avoid changing current behavior. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Francisco Jerez	8102500b95	intel/brw/xe3+: Mask subgroup shuffle index to be within valid range to avoid VRT hangs. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
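A sketch of the masking, assuming a power-of-two SIMD width; `mask_shuffle_index()` is a hypothetical helper, not the brw code. Masking keeps the lane index within `[0, width)`; per the commit, leaving it unmasked could hang with VRT on Xe3+. SPIR-V leaves the result of a shuffle with an index at or beyond the subgroup size undefined, so returning any in-range lane is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap a shuffle source lane into [0, simd_width) by masking.
 * Requires simd_width to be a power of two (e.g. 8, 16 or 32). */
static uint32_t mask_shuffle_index(uint32_t index, uint32_t simd_width)
{
   assert(simd_width != 0 && (simd_width & (simd_width - 1)) == 0);
   return index & (simd_width - 1);
}
```

Using a mask instead of a clamp costs a single AND and needs no comparison.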
Francisco Jerez	d2af77aa6b	intel/brw: Use urb_read_length instead of nr_attribute_slots to calculate VS first_non_payload_grf. Makes sure the number of registers reserved for the payload matches the size of the URB read, which prevents the VS shared function from writing past the end of the register file on Xe3 with VRT enabled. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00


13455 commits