fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 20:18:06 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	e3f91fb13c	nir/serialize: fix name no more nir_register Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: M Henning <drawoc@darkrefraction.com> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>	2024-10-30 12:59:11 +00:00
Alyssa Rosenzweig	b8624d5c6b	nir: correct comment Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: M Henning <drawoc@darkrefraction.com> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>	2024-10-30 12:59:11 +00:00
Alyssa Rosenzweig	33299354e0	nir/opt_algebraic: optimize patterns hit with OpenCL These patterns were all found in the AGX quads tessellator, a medium-sized OpenCL kernel. LLVM generates a lot of garbage around booleans which we need to chew through. Though there's nothing AGX or really OpenCL specific here, so some of this could help graphics shaders too. Together, their effect is significant for that kernel instr count & occupancy: before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads after: 2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads No significant changes on GL shaderdb (a single godot shader regressed 1 instruction, 1344->1345). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>	2024-10-30 12:59:10 +00:00
Marek Olšák	ee452129c6	nir: add cull_triangles_, cull_lines_ prefixes to viewport_xy_scale_and_offset for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	2227f5be9d	nir: rename load_cull_small_primitive_precision -> triangle, add line_precision for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	0914e0d02f	nir: rename load_cull_small_primitives -> triangles, add load_cull_small_lines for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Georg Lehmann	d6535f2602	nir/opt_algebraic: create ubfe with non constant mask Foz-DB Navi21: Totals from 278 (0.35% of 79395) affected shaders: MaxWaves: 7444 -> 7448 (+0.05%) Instrs: 316069 -> 314584 (-0.47%); split: -0.47%, +0.00% CodeSize: 1608064 -> 1593204 (-0.92%) VGPRs: 11128 -> 11120 (-0.07%) Latency: 796599 -> 797786 (+0.15%); split: -0.19%, +0.34% InvThroughput: 141195 -> 139472 (-1.22%); split: -1.22%, +0.00% Copies: 28565 -> 29796 (+4.31%); split: -0.15%, +4.46% PreSGPRs: 14335 -> 14336 (+0.01%) VALU: 161342 -> 159426 (-1.19%) SALU: 87794 -> 88305 (+0.58%); split: -0.03%, +0.61% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:10 +00:00
Timur Kristóf	be68aeafdc	nir/opt_algebraic: Add various bitfield extract patterns. v2 (Georg Lehmann): - fixed incorrect imin in ubfe_ubfe - simplified outer_bits of ushr((ubfe, ...), ...) opt - added is_used_once to iand(ushr(), ...) opt to improve stats Foz-DB Navi21: Totals from 3309 (4.18% of 79206) affected shaders: Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03% CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06% Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01% InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01% VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02% SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01% Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09% Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01% PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01% PreVGPRs: 151477 -> 151474 (-0.00%) VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04% SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03% VMEM: 188035 -> 188060 (+0.01%) SMEM: 259632 -> 259630 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>	2024-10-29 10:51:09 +00:00
Georg Lehmann	695d2414cd	nir,radv: optimize shared atomic offsets Foz-DB Navi21: Totals from 87 (0.11% of 79395) affected shaders: Instrs: 140877 -> 140873 (-0.00%) CodeSize: 747760 -> 747164 (-0.08%); split: -0.09%, +0.01% Latency: 4528171 -> 4528162 (-0.00%) InvThroughput: 826358 -> 826349 (-0.00%) Copies: 10888 -> 10884 (-0.04%) VALU: 84634 -> 84630 (-0.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31080>	2024-10-29 09:31:08 +00:00
Rob Clark	7f63fa34da	nir/lower_amul: Fix ASAN error We shouldn't assume the bindings are sparse when we allocate an array indexed on the binding. See, for example: dEQP-GLES31.functional.program_interface_query.buffer_variable.random.55 Fixes: `2e833b16bc` ("nir/lower_amul: Use num_ubos/ssbos instead of recomputing it.") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31611>	2024-10-25 15:38:51 +00:00
Pierre-Eric Pelloux-Prayer	60578df33a	nir: skip offset=0 in nir_io_add_const_offset_to_base When offset=0, the pass was a no-op but was setting the progress flag which could cause infinite loops when this pass is going to be added to gl_nir_opts. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>	2024-10-25 13:36:54 +00:00
Rhys Perry	8efc765a3d	nir/algebraic: fix shfr optimization with zero src2 No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `08903bbe89` ("nir: add mqsad_4x8, shfr and nir_opt_mqsad") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>	2024-10-25 09:59:40 +00:00
Rhys Perry	b2abd3bdba	nir: fix shfr constant folding with zero src2 No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `08903bbe89` ("nir: add mqsad_4x8, shfr and nir_opt_mqsad") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>	2024-10-25 09:59:40 +00:00
Daniel Schürmann	87cb42f953	treewide: don't lower to LCSSA before calling nir_divergence_analysis() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	95ed72922e	nir/divergence: Don't assume that LCSSA phis are not loop-invariant Since we check for loop-invariance, we don't have to unconditionally flag LCSSA phis as divergent in presence of divergent breaks. This ensures consistency, with or without LCSSA form. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c5f142a695	nir/divergence: skip expensive nir_src_is_divergent() check in most cases Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	0eff03d385	nir/divergence: calculate divergence without requiring LCSSA form Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	d34d2f8fa8	nir: consider loop invariance in nir_src_is_divergent() By doing so, this function does not require LCSSA form anymore in order to provide correct results. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	1a55d6c23b	nir/divergence: Introduce and set nir_def::loop_invariant Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c0b3d7a916	nir/divergence: require nir_metadata_block_index This allows for fast checks whether some value is defined inside a loop. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	8d1abd4996	treewide: use nir_src_is_divergent() rather than checking the divergence of the SSA Without LCSSA, divergence between src and def might differ. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c8348139fd	nir: change signature of nir_src_is_divergent() Now, it takes nir_src * instead of nir_src. Also move the implementation to nir_divergence_analysis.c. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	421b42637d	nir: remove nir_update_instr_divergence() This function has obscure limitations. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	ce0a3fe645	nir/opt_uniform_atomics: don't preserve divergence information Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c25c63ebc0	nir/divergence: separately indicate whether loops have divergent continues or breaks bool nir_loop_is_divergent(nir_loop *) replaces the previous loop->divergent indicator. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Georg Lehmann	1f9b82bb2a	nir/opt_algebraic: optimize -0.0 + a Foz-DB Navi21: Totals from 428 (0.54% of 79395) affected shaders: MaxWaves: 8510 -> 8512 (+0.02%) Instrs: 731062 -> 729665 (-0.19%); split: -0.19%, +0.00% CodeSize: 3735788 -> 3728324 (-0.20%); split: -0.20%, +0.00% VGPRs: 27328 -> 27336 (+0.03%); split: -0.03%, +0.06% SpillSGPRs: 315 -> 314 (-0.32%) Latency: 3872986 -> 3873236 (+0.01%); split: -0.08%, +0.09% InvThroughput: 971001 -> 970056 (-0.10%); split: -0.17%, +0.08% VClause: 11954 -> 11956 (+0.02%); split: -0.02%, +0.03% SClause: 17361 -> 17358 (-0.02%) Copies: 59038 -> 59045 (+0.01%); split: -0.22%, +0.24% Branches: 17685 -> 17656 (-0.16%) PreSGPRs: 26103 -> 26102 (-0.00%) PreVGPRs: 23220 -> 23206 (-0.06%) VALU: 515293 -> 513963 (-0.26%); split: -0.26%, +0.00% SALU: 91591 -> 91544 (-0.05%) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31770>	2024-10-23 08:58:34 +00:00
Marek Olšák	0226922384	nir: add nir_gather_tcs_info, new gathering/analysis pass This does shader analysis that is more niche than regular shader info. It's planned to be used by nir_restructure_tcs_flow as discussed here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11910 It's also useful for driver-specific passes. The code for gathering "all_invocations_define_tess_levels" is copied from radeonsi. The rest is new. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31447>	2024-10-23 03:17:16 +00:00
Amber	a3afe22dc9	nir: add pass to lower atomic arithmetic to a loop with cmpxchg. Signed-off-by: Amber Harmonia <amber@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27776>	2024-10-21 21:47:44 +00:00
Mary Guillemard	84d57e1fb1	nir: Move atomic_op_to_alu to common code Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27776>	2024-10-21 21:47:44 +00:00
Marek Olšák	fb6184f89c	nir: add shader_info::tess::tcs_same_invocation_inputs_read(_indirect) We need both the same-invocation usage mask and cross-invocation usage mask. The AMD reason is below. Cross-invocation TCS input access doesn't prevent the same-invocation fast path in AMD hw because it's just a different way to load the same data, and we want to use both paths for the same TCS input based on the load instruction. The fast path can't be used for indirect access, which is gathered separately for same-invocation access. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31645>	2024-10-21 18:53:51 +00:00
Pavel Ondračka	33c8dc4f18	nir/nir_group_loads: reduce chance of max_distance check overflow Helps for the case when max_distance is set to ~0, where the pass would now only create groups of two loads together due to overflow. Found while experimenting with this pass on r300, however the only driver currently affected is i915. With i915 this change gains around 20 shaders in my small shader-db (most notably some GLMark2, Unigine Tropics, Tesseract, Amnesia) at the expense of increased register pressure in few other cases. I'm assuming this is a good deal for such old HW, and this seems like what was intended when the pass was introduced to i915, but anyway this could be tweaked further driver side with a more optimized max_distance value. Only shader-db tested. Relevant i915 shader-db stats (lpt): total tex_indirect in shared programs: 1529 -> 1493 (-2.35%) tex_indirect in affected programs: 96 -> 60 (-37.50%) helped: 29 HURT: 2 total temps in shared programs: 3015 -> 3200 (6.14%) temps in affected programs: 465 -> 650 (39.78%) helped: 1 HURT: 91 GAINED: 20 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: GKraats <vd.kraats@hccnet.nl> Fixes: `33b4eb149e` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31529>	2024-10-18 09:21:22 +00:00
Job Noorman	509606e56d	nir/lower_subgroups: scan/reduce for multiple ballot components lower_scan_reduce only worked when ballot_components equals one. This commit adds support for arbitrary ballot_components. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Job Noorman	58b199f7ed	nir/lower_subgroups: add build_cluster_mask helper This functionality will become more complex in the next commit so separate it into a helper function. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Job Noorman	e0cb4a94a3	nir/lower_subgroups: move up some helper functions build_subgroup_mask and build_ballot_imm_ishl will be needed by other functions higher-up the file. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Georg Lehmann	dbf63a0788	nir: remove nir_op_is_derivative Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	f9d2aad7a3	nir: remove alu ddx/ddy Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	bf0d1a42b4	nir: remove uses_fddx_fddy Unused and the code didn't even do what the comment said. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	cba575f4df	nir: always emit ddx intrinsics Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	1371a8fe2b	nir/opt_move_discards_to_top: handle ddx/ddy intrinsics Fixes: `daa97bb41a` ("amd: switch to derivative intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Marek Olšák	948f94b8c5	nir/opt_varyings: pack TCS inputs with cross-invocation access together Unigine Heaven has a TCS that reads pos.xyz and tescoord.w from all invocations in every invocation. By putting those two in the same vec4, AMD hw can reduce the amount of shared memory that is allocated for those inputs from 2 vec4s to 1 vec4. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>	2024-10-17 03:30:07 +00:00
Marek Olšák	8e93907b7c	nir/opt_varyings: assign locations of no_varying IO for TCS outputs only Skip the code for other shader stages because it doesn't do anything there. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>	2024-10-17 03:30:07 +00:00
Connor Abbott	65c0846537	nir/lower_input_attachments: Handle unscaled input attachments with no index With VK_KHR_dynamic_rendering_local_read we can have input attachments with no index, which normally correspond to depth/stencil attachments, and we have to handle this here when determining whether we need to emit an unscaled fragcoord for FDM. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>	2024-10-17 00:30:44 +00:00
Connor Abbott	4bd506a7f3	spirv: Make the default input attachment index ~0 This will let us know when an input attachment doesn't have an InputAttachmentIndex, which used to be illegal but is now allowed and meaningful with VK_KHR_dynamic_rendering_local_read. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>	2024-10-17 00:30:44 +00:00
Job Noorman	4556b18f51	nir: add shuffle_{xor,up,down}_uniform_ir3 intrinsics These are like shuffle_{xor,up,down} except they expect a dynamically uniform index. This is necessary since the ir3 shfl instruction does not work with a divergent index. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31501>	2024-10-16 22:05:10 +00:00
Danylo Piliaiev	7b09fc98fb	nir/opt_16b_tex_image: Sign extension should matter for texel buffer txf Texel buffers can be arbitrarily large, so the assumption being made in the following comment is wrong: "Zero-extension (u16) and sign-extension (i16) have the same behavior here - txf returns 0 if bit 15 is set because it's out of bounds and the higher bits don't matter." Sign extension should matter for GLSL_SAMPLER_DIM_BUF. This fixes the case of doing texelFetch with u16 offset: uniform itextureBuffer s1; uint16_t offset = some_ssbo.offset; value = texelFetch(s1, offset).x; If the offset exceeds the s16 range, the optimization incorrectly left it as 16b. In spirv the above glsl is translated into: %22 = OpLoad %ushort %21 %23 = OpUConvert %uint %22 %24 = OpBitcast %int %23 %26 = OpImageFetch %v4int %16 %24 Cc: mesa-stable Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31664>	2024-10-16 10:10:00 +00:00
Timothy Arceri	aa7c59e02c	nir/glsl: set deref cast mode for blocks during function inlining More cast fixes, this time for UBO and SSBO, which were missing testing previously. Fixes: `d681cf96fb` ("nir/glsl: set deref cast mode during function inlining") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11587 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31668>	2024-10-16 06:25:57 +00:00
Marek Olšák	0727634443	nir/opt_load_store_vectorize: vectorize load_smem_amd radeonsi+ACO with the new vectorization callback: TOTALS FROM AFFECTED SHADERS (19508/58918) VGPRs: 708672 -> 708864 (0.03 %) Code Size: 31458688 -> 31217160 (-0.77 %) bytes Max Waves: 305960 -> 305952 (-0.00 %) Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	a44e5cfccf	nir/opt_load_store_vectorize: allow a 4-byte hole between 2 loads If there is a 4-byte hole between 2 loads, drivers can now optionally vectorize the loads by including the hole between them, e.g.: 4B load + 4B hole + 8B load --> 16B load All vectorize callbacks already reject all holes, but AMD will want to allow it. radeonsi+ACO with the new vectorization callback: TOTALS FROM AFFECTED SHADERS (25248/58918) VGPRs: 871116 -> 871872 (0.09 %) Spilled SGPRs: 397 -> 407 (2.52 %) Code Size: 43074536 -> 42496352 (-1.34 %) bytes Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	80c156422d	nir/opt_load_store_vectorize: allow overfetching, merge overfetched loads New load merging transformations (first, second), examples: (vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7) (vec1, vec8(read=0x7f)) ==> vec8(read=0xff) - the unused component at the end of vec8 is dropped Not merged: vec8(read=0xfe) + vec1 - unused components at the beginning are kept Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
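
Note on 7b09fc98fb (nir/opt_16b_tex_image) above: the fix depends on zero-extension and sign-extension of a 16-bit offset diverging once bit 15 is set. The following is a minimal sketch of that arithmetic only, not Mesa code; the helper names `extend_u16` and `extend_i16` are hypothetical and exist purely for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: zero-extend a 16-bit offset to 32 bits, which is
 * what the OpUConvert in the commit message does to the ushort value. */
static int32_t extend_u16(uint16_t v)
{
    return (int32_t)(uint32_t)v;
}

/* Hypothetical helper: sign-extend the same 16 bits, i.e. interpret them
 * as a signed 16-bit value. Written with xor/subtract so the result does
 * not rely on implementation-defined narrowing conversions. */
static int32_t extend_i16(uint16_t v)
{
    return (int32_t)(v ^ 0x8000) - 0x8000;
}
```

For an offset such as 40000 (bit 15 set), `extend_u16` yields 40000 while `extend_i16` yields a negative index, so the two only agree for values below 32768. That is why a texel buffer with more than 32767 texels cannot treat the two extensions as interchangeable.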

5656 commits