fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-04 09:48:07 +02:00

Author	SHA1	Message	Date
Pavel Ondračka	33c8dc4f18	nir/nir_group_loads: reduce chance of max_distance check overflow Helps for the case when max_distance is set to ~0, where the pass would now only create groups of two loads together due to overflow. Found while experimenting with this pass on r300, however the only driver currently affected is i915. With i915 this change gains around 20 shaders in my small shader-db (most notably some GLMark2, Unigine Tropics, Tesseract, Amnesia) at the expense of increased register pressure in few other cases. I'm assuming this is a good deal for such old HW, and this seems like what was intended when the pass was introduced to i915, but anyway this could be tweaked further driver side with a more optimized max_distance value. Only shader-db tested. Relevant i915 shader-db stats (lpt): total tex_indirect in shared programs: 1529 -> 1493 (-2.35%) tex_indirect in affected programs: 96 -> 60 (-37.50%) helped: 29 HURT: 2 total temps in shared programs: 3015 -> 3200 (6.14%) temps in affected programs: 465 -> 650 (39.78%) helped: 1 HURT: 91 GAINED: 20 Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: GKraats <vd.kraats@hccnet.nl> Fixes: `33b4eb149e` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31529>	2024-10-18 09:21:22 +00:00
Job Noorman	509606e56d	nir/lower_subgroups: scan/reduce for multiple ballot components lower_scan_reduce only worked when ballot_components equals one. This commit adds support for arbitrary ballot_components. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Job Noorman	58b199f7ed	nir/lower_subgroups: add build_cluster_mask helper This functionality will become more complex in the next commit so separate it into a helper function. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Job Noorman	e0cb4a94a3	nir/lower_subgroups: move up some helper functions build_subgroup_mask and build_ballot_imm_ishl will be needed by other functions higher-up the file. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31587>	2024-10-18 06:57:52 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Georg Lehmann	dbf63a0788	nir: remove nir_op_is_derivative Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	f9d2aad7a3	nir: remove alu ddx/ddy Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	bf0d1a42b4	nir: remove uses_fddx_fddy Unused and the code didn't even do what the comment said. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	cba575f4df	nir: always emit ddx intrinsics Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	1371a8fe2b	nir/opt_move_discards_to_top: handle ddx/ddy intrinsics Fixes: `daa97bb41a` ("amd: switch to derivative intrinsics") Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Marek Olšák	948f94b8c5	nir/opt_varyings: pack TCS inputs with cross-invocation access together Unigine Heaven has a TCS that reads pos.xyz and tescoord.w from all invocations in every invocation. By putting those two in the same vec4, AMD hw can reduce the amount of shared memory that is allocated for those inputs from 2 vec4s to 1 vec4. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>	2024-10-17 03:30:07 +00:00
Marek Olšák	8e93907b7c	nir/opt_varyings: assign locations of no_varying IO for TCS outputs only Skip the code for other shader stages because it doesn't do anything there. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31670>	2024-10-17 03:30:07 +00:00
Connor Abbott	65c0846537	nir/lower_input_attachments: Handle unscaled input attachments with no index With VK_KHR_dynamic_rendering_local_read we can have input attachments with no index, which normally correspond to depth/stencil attachments, and we have to handle this here when determining whether we need to emit an unscaled fragcoord for FDM. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>	2024-10-17 00:30:44 +00:00
Connor Abbott	4bd506a7f3	spirv: Make the default input attachment index ~0 This will let us know when an input attachment doesn't have an InputAttachmentIndex, which used to be illegal but is now allowed and meaningful with VK_KHR_dynamic_rendering_local_read. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31261>	2024-10-17 00:30:44 +00:00
Job Noorman	4556b18f51	nir: add shuffle_{xor,up,down}_uniform_ir3 intrinsics These are like shuffle_{xor,up,down} except they expect a dynamically uniform index. This is necessary since the ir3 shfl instruction does not work with a divergent index. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31501>	2024-10-16 22:05:10 +00:00
Danylo Piliaiev	7b09fc98fb	nir/opt_16b_tex_image: Sign extension should matter for texel buffer txf Texel buffer could be arbitrary large, so the assumption being made in the following comment is wrong: "Zero-extension (u16) and sign-extension (i16) have the same behavior here - txf returns 0 if bit 15 is set because it's out of bounds and the higher bits don't matter." Sign extension should matter for GLSL_SAMPLER_DIM_BUF. This fixes the case of doing texelFetch with u16 offset: uniform itextureBuffer s1; uint16_t offset = some_ssbo.offset; value = texelFetch(s1, offset).x; If the offset is higher than s16 optimization incorrectly left it as 16b. In spirv the above glsl is translated into: %22 = OpLoad %ushort %21 %23 = OpUConvert %uint %22 %24 = OpBitcast %int %23 %26 = OpImageFetch %v4int %16 %24 Cc: mesa-stable Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31664>	2024-10-16 10:10:00 +00:00
Timothy Arceri	aa7c59e02c	nir/glsl: set deref cast mode for blocks during function inlining More cast fixes this time for UBO and SSBO. Which were missing testing previously. Fixes: `d681cf96fb` ("nir/glsl: set deref cast mode during function inlining") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11587 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31668>	2024-10-16 06:25:57 +00:00
Marek Olšák	0727634443	nir/opt_load_store_vectorize: vectorize load_smem_amd radeonsi+ACO with the new vectorization callback: TOTALS FROM AFFECTED SHADERS (19508/58918) VGPRs: 708672 -> 708864 (0.03 %) Code Size: 31458688 -> 31217160 (-0.77 %) bytes Max Waves: 305960 -> 305952 (-0.00 %) Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	a44e5cfccf	nir/opt_load_store_vectorize: allow a 4-byte hole between 2 loads If there is a 4-byte hole between 2 loads, drivers can now optionally vectorize the loads by including the hole between them, e.g.: 4B load + 4B hole + 8B load --> 16B load All vectorize callbacks already reject all holes, but AMD will want to allow it. radeonsi+ACO with the new vectorization callback: TOTALS FROM AFFECTED SHADERS (25248/58918) VGPRs: 871116 -> 871872 (0.09 %) Spilled SGPRs: 397 -> 407 (2.52 %) Code Size: 43074536 -> 42496352 (-1.34 %) bytes Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	80c156422d	nir/opt_load_store_vectorize: allow overfetching, merge overfetched loads New load merging transformations (first, second), examples: (vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7) (vec1, vec8(read=0x7f)) ==> vec8(read=0xff) - the unused component at the end of vec8 is dropped Not merged: vec8(read=0xfe) + vec1 - unused components at the beginning are kept Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	65ace5649b	nir: reject unsupported component counts from all vectorize callbacks If you allow an unsupported component count in the callback for loads, nir_opt_load_store_vectorize will align num_components to the next supported vector size, essentially overfetching. This changes all callbacks to reject it. AMD will enable it in a later commit. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	02923e237d	nir: add hole_size parameter into the vectorize callback It will be used to allow merging loads with a hole between them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	8ce43b7765	nir/opt_load_store_vectorize: add entry::num_components We will represent vec6..vec7, vec9..vec15 loads with 8 and 16 components respectively, so we need to track how many components we really use. This is a prerequisite for optimal merging up to vec16. Example: Step 1: vec4 + vec3 ==> vec7as8 (last component unused) Step 2: vec1 + vec7as8 ==> vec8 (last unused component dropped) Without using the number of components read, the same example would end up doing: Step 1: vec4 + vec3 ==> vec8 Step 2: vec1 + vec8 ==> vec9 (fail) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Alyssa Rosenzweig	e9303c0952	nir: extract round component helper another nir pass will use this. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	64c4d29e65	nir/opt_vectorize_io: fix stack buffer overflow with 16-bit output stores uncovered by unrelated work Fixes: `2514999c9c` - nir: add nir_opt_vectorize_io, vectorizing lowered IO Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31644>	2024-10-15 03:59:17 +00:00
Timothy Arceri	46facf9037	nir/glsl: set cast mode for image during function inlining Fixes: `d681cf96fb` ("nir/glsl: set deref cast mode during function inlining") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11980 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31554>	2024-10-15 03:13:24 +00:00
Konstantin Seurer	70a1453537	nir/print: Fix the alignment of 8-bit definitions Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31612>	2024-10-14 21:21:04 +00:00
Rhys Perry	67ad7359ff	nir/divergence_analysis: disable phi undef optimization by default If the backend does not implement this too, or some other future transform modifiess the phi so that this isn't the case (replace the phi with a bcsel or replace undef with zero), then it will not actually be uniform. This keeps it enabled to some degree for RADV/ACO. fossil-db (navi31): Totals from 76 (0.10% of 79395) affected shaders: Instrs: 195008 -> 195282 (+0.14%) CodeSize: 1012592 -> 1015884 (+0.33%) Latency: 3892826 -> 3898843 (+0.15%); split: -0.00%, +0.15% InvThroughput: 460681 -> 460964 (+0.06%) Copies: 13508 -> 13516 (+0.06%) Branches: 5244 -> 5412 (+3.20%) PreVGPRs: 5092 -> 5096 (+0.08%) VALU: 116177 -> 116197 (+0.02%) SALU: 23449 -> 23785 (+1.43%) fossil-db (navi21): Totals from 76 (0.10% of 79395) affected shaders: Instrs: 164471 -> 164981 (+0.31%) CodeSize: 883988 -> 888420 (+0.50%) Latency: 4074287 -> 4082043 (+0.19%) InvThroughput: 783783 -> 784276 (+0.06%); split: -0.00%, +0.06% Branches: 5262 -> 5430 (+3.19%) PreVGPRs: 5100 -> 5104 (+0.08%) VALU: 116375 -> 116381 (+0.01%) SALU: 23589 -> 23925 (+1.42%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30211>	2024-10-10 14:59:26 +00:00
Alyssa Rosenzweig	6287c8251d	nir: add bounds_agx opcode used to facilitate bounds checking optimization Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31532>	2024-10-05 18:30:11 +00:00
Iago Toral Quiroga	aac1c074cc	nir: make fclamp_pos_mali and fsat_signed_mali opcodes generic V3D can use these too. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>	2024-10-03 09:02:07 +00:00
Faith Ekstrand	62a4fe861a	nir: Add an option to lower quad vote Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31470>	2024-10-02 21:10:32 +00:00
Marek Olšák	f546df95a6	nir/opt_vectorize_io: fix skipped output vectorization if inputs were vectorized Fixes: `2514999c9c` - nir: add nir_opt_vectorize_io, vectorizing lowered IO Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31454>	2024-10-02 20:26:23 +00:00
Job Noorman	4d50504b26	nir/lower_int64: add nir_intrinsic_rotate Can simply be split into 32b ops. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31455>	2024-10-02 06:35:49 +00:00
Job Noorman	e6a5c342da	nir/lower_int64: add nir_intrinsic_read_invocation_cond_ir3 Can simply be split into 32b ops. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31455>	2024-10-02 06:35:49 +00:00
Job Noorman	584b63ecab	nir/load_store_vectorize: fix division by zero Don't use glsl_get_explicit_stride as it may return 0 for vector types, use nir_deref_instr_array_stride instead. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31460>	2024-10-02 05:53:57 +00:00
Rhys Perry	be64454710	nir/tests: test opt_loop_peel_initial_break with derefs in header block Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>	2024-10-01 12:24:22 +00:00
Rhys Perry	0484044b1a	nir/opt_loop: rematerialize header block derefs in their use blocks Otherwise, we could end up with phis of derefs. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `6b4b044739` ("nir/opt_loop: add loop peeling optimization") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>	2024-10-01 12:24:22 +00:00
Gert Wollny	f19f1ec17b	nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64 If ftrunc@64 is lowered by nir_lower_doubles it is turned into a comparable long series of 32 bit operations. If the hardware supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64 to use some combinations with ffloor@64. They can then be turned into a combination of fsub@64 and ffract@64 resulting in less all-over instructions. Fixes: `5218cff34b` nir/algebraic: avoid double lowering of some fp64 operations Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>	2024-09-30 23:51:02 +00:00
Kenneth Graunke	0b34a7aff0	nir: Don't generate single iteration loops to zero-initialize memory If the stride we're adding to our loop counter is larger than the total amount of shared local memory we're trying to initialize, we know the loop will run at most one time. So we can skip emitting a loop. Loop unrolling appears to be unable to detect this currently. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31312>	2024-09-30 05:27:17 +00:00
Georg Lehmann	bb7e8d51b6	nir: delete nir_opt_reuse_constants Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>	2024-09-27 05:19:16 +00:00
Georg Lehmann	60776f87c3	nir/opt_remove_phis: rematerialize constants Foz-DB Navi31: Totals from 749 (0.94% of 79395) affected shaders: Instrs: 1224359 -> 1223722 (-0.05%); split: -0.07%, +0.02% CodeSize: 6468392 -> 6466296 (-0.03%); split: -0.06%, +0.03% Latency: 9764410 -> 9766457 (+0.02%); split: -0.01%, +0.03% InvThroughput: 1017401 -> 1017380 (-0.00%); split: -0.03%, +0.03% VClause: 19902 -> 19873 (-0.15%); split: -0.16%, +0.02% SClause: 38441 -> 38424 (-0.04%); split: -0.05%, +0.01% Copies: 86880 -> 86304 (-0.66%); split: -0.73%, +0.06% Branches: 34206 -> 34159 (-0.14%); split: -0.14%, +0.01% PreSGPRs: 45557 -> 45527 (-0.07%); split: -0.08%, +0.01% PreVGPRs: 32406 -> 32408 (+0.01%) VALU: 671633 -> 671533 (-0.01%); split: -0.02%, +0.01% SALU: 155284 -> 154675 (-0.39%); split: -0.40%, +0.00% VMEM: 27303 -> 27271 (-0.12%) SMEM: 67490 -> 67455 (-0.05%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>	2024-09-27 05:19:16 +00:00
Georg Lehmann	40fc85c15b	nir: make nir_instr_clone usable with load_const and undef Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>	2024-09-27 05:19:16 +00:00
Georg Lehmann	a9f8089240	nir: replace nir_opt_remove_phis_block with a single source version This is what callers actually want, and it simplifies nir_opt_remove_phis because we can assume dominance meta data is valid. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>	2024-09-27 05:19:16 +00:00
Georg Lehmann	41e82b8b8e	nir: sink is_subgroup_invocation_lt_amd Having it closer to the branches means we can eliminate an exec copy. Foz-DB Navi31: Totals from 11615 (14.63% of 79395) affected shaders: Instrs: 6804372 -> 6804903 (+0.01%); split: -0.04%, +0.05% CodeSize: 33684672 -> 33680584 (-0.01%); split: -0.07%, +0.05% VGPRs: 578616 -> 578604 (-0.00%) SpillSGPRs: 1506 -> 1304 (-13.41%) Latency: 29817034 -> 29821320 (+0.01%); split: -0.03%, +0.05% InvThroughput: 3581587 -> 3581217 (-0.01%); split: -0.02%, +0.01% VClause: 124826 -> 124782 (-0.04%); split: -0.04%, +0.00% SClause: 187916 -> 187645 (-0.14%); split: -0.27%, +0.13% Copies: 520969 -> 510027 (-2.10%); split: -2.20%, +0.10% PreSGPRs: 442584 -> 421344 (-4.80%) VALU: 3810755 -> 3810267 (-0.01%); split: -0.01%, +0.00% SALU: 763402 -> 752650 (-1.41%); split: -1.48%, +0.07% Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>	2024-09-26 14:29:14 +00:00
Georg Lehmann	bcfc5c09fa	amd: add offset to is_subgroup_invocation_lt_amd Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>	2024-09-26 14:29:13 +00:00
Marek Olšák	09e64e3682	nir/opt_shrink_vectors: shrink memory loads, not just IO The problem with radeonsi+ACO is that UBO loads from vec4 uniforms using only 1 component always load all 4 components. This fixes that. We are only interested in shrinking UBO and SSBO loads, but I added more intrinsics because why not. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29384>	2024-09-26 03:01:38 +00:00
Timothy Arceri	f6e7520b13	glsl: remove now unused linker code This has all be replaced by a nir based linker implementation. Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>	2024-09-25 09:39:44 +00:00
Timothy Arceri	fe9b93fc1c	nir: handle wildcard array deref Here we add handling of wildcard array derefs when attempting to mark an io as partially used rather than hitting an assert. Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>	2024-09-25 09:39:44 +00:00
Timothy Arceri	6bb6b0e5ad	nir: add nir_intrinsic_deref_implicit_array_length intrinsic This will be used to handle .length() calls on unsized arrays Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>	2024-09-25 09:39:44 +00:00
Timothy Arceri	60937b5286	nir: add implicit_conversion_prohibited field to nir_parameter Will be used in link time validation in following patches. Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>	2024-09-25 09:39:44 +00:00

1 2 3 4 5 ...

5626 commits