fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 22:00:13 +01:00

Author	SHA1	Message	Date
Matt Turner	0a63d629fe	intel/compiler: Use unreachable instead of assert(!"...") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34014>	2025-03-13 20:11:10 +00:00
Lionel Landwerlin	1835bf3520	brw: avoid calling lower_indirect_derefs multiple times Lowering the indirect derefs multiple times leads to very inefficient shaders because of all the control flow inserted. In particular on some DGC tests with mesh shaders, the tests can spin for 1 hour on an i7 and still not complete compilation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33809>	2025-03-09 20:52:01 +00:00
Sagar Ghuge	1bfe2571f5	intel/compiler: Lower sample index into coord for MSRT messages Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32690>	2025-03-07 23:06:14 +00:00
Sagar Ghuge	bea9d79cb9	intel/compiler: Add support for MSAA typed load/store messages Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32690>	2025-03-07 23:06:14 +00:00
Alyssa Rosenzweig	9a58a8257e	treewide: Switch to nir_progress Via the Coccinelle patch at the end of the commit message, followed by sed -ie 's/progress = progress \| /progress \|=/g' $(git grep -l 'progress = prog') ninja -C ~/mesa/build clang-format cd ~/mesa/src/compiler/nir && clang-format -i *.c agxfmt @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -return prog; +return nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -return true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -return false; -} +bool progress = prog_expr; +return nir_progress(progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all); -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? 
(metadata) : nir_metadata_all); +nir_progress(prog, impl, metadata); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -return true; +return nir_progress(true, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); -return false; +return nir_no_progress(impl); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -other_prog \|= prog; +other_prog = other_prog \| nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +nir_progress(prog, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -other_prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog \| nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog \| nir_progress(impl_progress, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -other_prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog \| nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog \| nir_progress(impl_progress, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -} else { 
-nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +nir_progress(impl_progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -prog = true; +prog = nir_progress(true, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} +nir_progress(prog, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); +nir_no_progress(impl); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); +nir_progress(true, impl, metadata); squashme! sed -ie 's/progress = progress \| /progress \|=/g' $(git grep -l 'progress = prog') Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>	2025-02-26 15:19:53 +00:00
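The Coccinelle rules in the commit above all fold one pattern into a single call: preserve the given metadata if the pass made progress, preserve everything otherwise, and return the progress flag. A minimal sketch of that behaviour follows. Every `mock_`/`MOCK_` name is hypothetical and stands in for Mesa's real types and helpers, which carry far more state.

```c
#include <stdbool.h>

/* Hypothetical metadata bits; Mesa's real nir_metadata enum differs. */
#define MOCK_METADATA_BLOCK_INDEX 0x1u
#define MOCK_METADATA_DOMINANCE   0x2u
#define MOCK_METADATA_ALL         (~0u)

/* Hypothetical stand-in for a function implementation. */
typedef struct {
   unsigned valid_metadata; /* metadata currently known to be valid */
} mock_impl;

/* Keep only the metadata bits the pass claims to have preserved. */
void mock_metadata_preserve(mock_impl *impl, unsigned preserved)
{
   impl->valid_metadata &= preserved;
}

/* Mirrors the rewritten pattern: if the pass changed something, only
 * `preserved` survives; if not, everything stays valid. */
bool mock_progress(bool progress, mock_impl *impl, unsigned preserved)
{
   mock_metadata_preserve(impl, progress ? preserved : MOCK_METADATA_ALL);
   return progress;
}

/* Shorthand for the "nothing changed" case. */
bool mock_no_progress(mock_impl *impl)
{
   return mock_progress(false, impl, 0u);
}
```

A pass body would then end with `return mock_progress(progress, impl, MOCK_METADATA_BLOCK_INDEX);` instead of an if/else around two preserve calls.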
Georg Lehmann	f26069fdd9	nir: replace nir_opt_conditional_discard with nir_opt_peephole_select Foz-DB Navi21: Totals from 118 (0.15% of 79377) affected shaders: Instrs: 208001 -> 207355 (-0.31%); split: -0.33%, +0.01% CodeSize: 1080428 -> 1078432 (-0.18%); split: -0.20%, +0.02% SpillSGPRs: 202 -> 211 (+4.46%) Latency: 1923508 -> 1919093 (-0.23%); split: -0.62%, +0.39% InvThroughput: 407475 -> 407081 (-0.10%); split: -0.12%, +0.02% SClause: 7050 -> 7033 (-0.24%); split: -0.31%, +0.07% Copies: 12156 -> 11821 (-2.76%); split: -3.04%, +0.28% PreSGPRs: 8198 -> 8331 (+1.62%); split: -0.02%, +1.65% PreVGPRs: 7628 -> 7528 (-1.31%) VALU: 155747 -> 155657 (-0.06%); split: -0.06%, +0.00% SALU: 18295 -> 17782 (-2.80%); split: -2.98%, +0.18% SMEM: 10521 -> 10519 (-0.02%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>	2025-02-20 21:59:17 +00:00
Georg Lehmann	ca8147edbe	nir/peephole_select: add options struct Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>	2025-02-20 21:59:16 +00:00
Lionel Landwerlin	f19c5f4fcc	brw: use meaningful io locations for system values Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>	2025-02-13 14:36:15 +00:00
Lionel Landwerlin	bae9344baf	brw: port vs input to lower_64bit_to_32_new Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>	2025-02-13 14:36:15 +00:00
Daniel Schürmann	175c06e5cd	intel: switch to nir_metadata_divergence Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30814>	2025-02-13 10:08:43 +00:00
Francisco Jerez	80b2355b39	intel/brw: Allow specifying a required subgroup size for fragment shaders. On older hardware the "use_rep_send" compile parameter was being implicitly used to request the compilation of the SIMD16 variant of clear pixel shaders that require it due to hardware restrictions. However starting on Gfx12+ this flag is never set since replicated data clears are no longer supported, but BLORP still implicitly relies on the SIMD16 variant being generated even though there's no way for BLORP to explicitly request it. This doesn't cause much of a problem right now since brw_compile_fs() typically generates a SIMD16 kernel unless the SIMD8 kernel spills or SIMD debugging flags are enabled, but it won't work reliably on Xe3+ since we'll start using SIMD32 more aggressively. In order to avoid these issues use the standard required subgroup_size parameter from shader_info to signal that the SIMD16 variant of the shader is needed by the caller. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>	2025-01-29 23:39:32 +00:00
Daniel Schürmann	f3be7ce01b	nir/from_ssa: only consider divergence if requested This pass used to unconditionally use divergence information which forced the caller to either call divergence_analysis or ensure that the divergence is properly reset. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>	2025-01-23 01:31:23 +00:00
Kenneth Graunke	21636ff9fa	brw: Align and combine constant-offset UBO loads in NIR The hope here is to replace our backend handling for loading whole cachelines at a time from UBOs into NIR-based handling, which plays nicely with the NIR load/store vectorizer. Rounding down offsets to multiples of 64B allows us to globally CSE UBO loads across basic blocks. This is really useful. However, blindly rounding down the offset to a multiple of 64B can trigger anti-patterns where...a single unaligned memory load could have hit all the necessary data, but rounding it down split it into two loads. By moving this to NIR, we gain more control of the interplay between nir_opt_load_store_vectorize and this rebasing and CSE'ing. The backend can then simply load between nir_def_{first,last}_component_read() and trust that our NIR has the loads blockified appropriately. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
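The trade-off described in the commit above can be illustrated with plain offset arithmetic. This is only a sketch: the helper names are hypothetical, and the 64-byte block size is taken from the commit message rather than from the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 64u /* cacheline-sized block, per the commit message */

/* Round a constant UBO offset down to the start of its 64B block.
 * Loads that share a rounded offset can be CSE'd into one block load. */
uint32_t round_down_to_block(uint32_t offset)
{
   return offset & ~(BLOCK_SIZE - 1u);
}

/* True if the byte range [offset, offset + size) crosses a 64B boundary.
 * Rebasing such a load would split it into two block loads, even though
 * a single unaligned load could have covered it. Assumes size > 0. */
bool load_straddles_block(uint32_t offset, uint32_t size)
{
   return round_down_to_block(offset) !=
          round_down_to_block(offset + size - 1u);
}
```

For instance, loads at offsets 72 and 100 both rebase to 64 and can share one block load, while a 16-byte load at offset 56 straddles the boundary at 64.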
Kenneth Graunke	35f175301d	brw: Fix vectorizer hole_size condition after signedness change Marek recently changed hole_size to be signed, rather than unsigned. A negative hole_size means that the two loads overlap - and thus are prime candidates to be combined. My original hole_size handling was: if hole_size > 4 * (8 - low->num_components) then don't vectorize For non-overlapping loads, this worked: NIR's largest vector is vec16, and if low was already a vec16, combining it with anything would exceed that, so it'd never be considered. That meant low would always be a vec8 or less, so (8 - low->num_components) was a positive number. Now that we see overlapping loads, we can see a vec16 low, vec4 high, and also a negative hole size, giving us fun comparisons like: -16 > 4 * (8 - 16) => -16 > -32 => true, don't vectorize Which is absolutely the wrong thing to do, because the high load's data is entirely included within the former load's data. The idea here was to make sure the second load would be able to pack at least one component into the first's V8 result. But even this isn't the best, because...even if it's simply adjacent, doing one V16 load is more efficient than requesting two back to back V8 loads. So, we just simplify down to a static check: if there's an entire V8 of hole, don't vectorize. This already won't happen because the core pass has max_hole set to 28 bytes (7 32-bit components), but that could change based on the needs of other drivers, so let's be defensive. 
fossil-db results on Alchemist: Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05% Subgroup size: 8092544 -> 8092568 (+0.00%) Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05% Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35% Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29% Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80% Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05% Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01% Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26% Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03% Fixes: `c21bc65ba7` ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>	2025-01-08 00:19:54 +00:00
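The comparison quoted in the commit above is easy to check in isolation. The sketch below contrasts the old condition with a static check of the kind the message describes. Function names are hypothetical, and the exact threshold expression in the new check (`8 * 4` bytes for a whole V8 of 32-bit components) is an assumption based on the wording "an entire V8 of hole", not a copy of the driver code.

```c
#include <stdbool.h>

/* Old condition from the commit message: refuse to vectorize when
 * hole_size > 4 * (8 - low_num_components). */
bool old_rejects(int hole_size, int low_num_components)
{
   return hole_size > 4 * (8 - low_num_components);
}

/* Assumed form of the simplified check: refuse only when the hole is
 * at least one whole V8 (8 components x 4 bytes). */
bool new_rejects(int hole_size)
{
   return hole_size >= 8 * 4;
}
```

With a vec16 low load and a hole of -16 bytes, `old_rejects(-16, 16)` evaluates `-16 > -32` and refuses, whereas `new_rejects(-16)` lets the overlapping loads combine.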
Marek Olšák	c21bc65ba7	nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads A negative hole size means the loads overlap. This will be used by drivers to handle overlapping loads in the callback easily. Reviewed-by: Mel Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>	2025-01-01 00:03:55 +00:00
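As a rough illustration of what a signed hole size conveys, the gap between two loads can be computed from their byte offsets. The function below is a hypothetical sketch, not the pass's actual code, and assumes offsets small enough to fit in an `int`.

```c
/* Gap in bytes between the end of the low load and the start of the
 * high load. Positive: unread bytes between them. Zero: adjacent.
 * Negative: the two loads overlap by that many bytes. */
int hole_size_bytes(unsigned low_offset, unsigned low_size,
                    unsigned high_offset)
{
   return (int)high_offset - (int)(low_offset + low_size);
}
```

Two 12-byte loads at offsets 0 and 16 leave a 4-byte hole, while a 64-byte load at offset 0 followed by a load at offset 48 yields -16, signalling an overlap.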
Caio Oliveira	056b14b882	intel/brw: Move two NIR passes to brw_nir.c Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32799>	2024-12-30 20:18:23 +00:00
Benjamin Lee	becb014d27	nir: treat per-view outputs as arrayed IO This is needed for implementing multiview in panvk, where the address calculation for multiview outputs is not well-represented by lowering to nir_intrinsic_store_output with a single offset. The case where a variable is both per-view and per-{vertex,primitive} is now unsupported. This would come up with drivers implementing NV_mesh_shader or using nir_lower_multiview on geometry, tessellation, or mesh shaders. No drivers currently do either of these. There was some code that attempted to handle the nested per-view case by unwrapping per-view/arrayed types twice, but it's unclear to what extent this actually worked. ANV and Turnip both rely on per-view outputs being assigned a unique driver location for each view, so I've added an option to configure that behavior rather than removing it. Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>	2024-12-09 20:31:49 +00:00
Kenneth Graunke	6fd10a6620	brw: Tune vectorizer conditions to allow overfetching with holes Notably, our convergent block loads were already overfetching - we rounded up to block sizes of 8, 16, 32, or 64(LSC-only). But we did so in the backend, rather than NIR. With recent changes, nir_opt_load_store_vectorizer allows holes of up to 28 bytes (7 components at 4 bytes each). This allows us to detect cases where we did a convergent block load for 1 component (but loaded a whole vec8), then another load for the next vec8, and combine them into a single V16 load. Single component loads aren't the most common, but convergent loads of a vec2 in one group and a vec3 in another are quite common, and it makes no sense to do V8+V8 loads instead of V16. For non-block loads, we allow a max hole of 4 bytes. This allows the common case of XYZ_ + XYZ_ loads (where the last component is unread) to combine into a single larger load. fossil-db results on Lunarlake: Totals: Instrs: 146692608 -> 146246432 (-0.30%); split: -0.33%, +0.02% Subgroup size: 11100528 -> 11100512 (-0.00%) Send messages: 7003425 -> 6862529 (-2.01%); split: -2.01%, +0.00% Cycle count: 22396273274 -> 22523048654 (+0.57%); split: -1.08%, +1.64% Spill count: 67671 -> 67594 (-0.11%); split: -1.59%, +1.48% Fill count: 128999 -> 130223 (+0.95%); split: -1.73%, +2.68% Scratch Memory Size: 5986304 -> 6042624 (+0.94%); split: -1.40%, +2.34% Max live registers: 48898858 -> 48881655 (-0.04%); split: -0.05%, +0.01% Non SSA regs after NIR: 172397792 -> 167577380 (-2.80%); split: -2.80%, +0.00% Totals from 451003 (80.87% of 557667) affected shaders: Instrs: 134111754 -> 133665578 (-0.33%); split: -0.36%, +0.03% Subgroup size: 9039104 -> 9039088 (-0.00%) Send messages: 6127775 -> 5986879 (-2.30%); split: -2.30%, +0.00% Cycle count: 20306336726 -> 20433112106 (+0.62%); split: -1.19%, +1.81% Spill count: 56230 -> 56153 (-0.14%); split: -1.92%, +1.78% Fill count: 112920 -> 114144 (+1.08%); split: -1.97%, +3.06% Scratch Memory 
Size: 3769344 -> 3825664 (+1.49%); split: -2.23%, +3.72% Max live registers: 43750259 -> 43733056 (-0.04%); split: -0.05%, +0.01% Non SSA regs after NIR: 158449343 -> 153628931 (-3.04%); split: -3.04%, +0.00% In particular, sends get cut by 20.85% for Borderlands 3 DX12, 13.82% on Cyberpunk 2077, 10.75% on Strange Brigade, and 10.20% on Red Dead Redemption 2. Yet, spill/fills remain about the same. fossil-db results on Alchemist are similar though not quite as good. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	f88eb48ff2	anv: Don't consider nir_var_mem_global for vectorizer robustness checks nir_opt_load_store_vectorize checks for potential address wrapping when vectorizing two loads ("low" and "high"). It looks for cases where "low" might have a large address, and "high" has a positive offset which, when added together, could trigger integer wraparound. The issue here is that if the large address of "low" was considered out-of-bounds, adding offset could wrap around to a small address, which might actually be in-bounds. Thus, when loaded separately, "low" will fail and trigger robustness out-of-bound-read behavior, but "high" would read correctly. When vectorized, the entire load would fail. This is explicitly tested for with 32-bit SSBO addresses in the Vulkan CTS. However, anv's 64-bit global addresses and VMA handling effectively prevent this case. Addresses 0-4095 are a reserved page so that if people try to use 0 as a NULL pointer, it never maps to a valid BO. That alone guarantees that the above case where "high" gets a small address would never be in-bounds, so we don't need to check for it. In fact, we allocate most user allocations out of high addresses, and have specialized allocation heaps for certain types of GPU data structures in the lower GB of memory. For a load to wrap around and successfully land in the right heap, it would have to load gigabytes. Disabling this allows load vectorization and overfetching in more cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
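The wraparound case from the commit above can be reproduced with unsigned arithmetic. In this sketch the function names are hypothetical; the 32-bit width models the SSBO addresses mentioned in the message, and the 4096-byte reserved page comes from its description of anv's address space.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if adding a positive offset to a 32-bit "low" address wraps
 * around, so that "high" ends up at a smaller address than "low". */
bool high_addr_wraps(uint32_t low_addr, uint32_t offset)
{
   uint32_t high_addr = low_addr + offset; /* wraps modulo 2^32 */
   return high_addr < low_addr;
}

/* Addresses 0-4095 are described as a reserved page that never maps to
 * a valid buffer, so a wrapped address landing here cannot be in-bounds. */
bool lands_in_reserved_page(uint64_t addr)
{
   return addr < 4096u;
}
```

For example, `0xFFFFFFF0 + 0x20` wraps to `0x10`, which falls inside the reserved page.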
Kenneth Graunke	da93b13f8b	brw: Use nir_combined_align in brw_nir_should_vectorize_mem Better than open-coding this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Lionel Landwerlin	bfcb9bf276	brw: rename brw_sometimes to intel_sometimes Moving it to intel_shader_enums.h The plan is to make it visible to OpenCL shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Marek Olšák	25d4943481	nir: make use_interpolated_input_intrinsics a nir_lower_io parameter This will need to be set to true when the GLSL linker lowers IO, which can later be unlowered by st/mesa, and then drivers can lower it again without load_interpolated_input. Therefore, it can't be a global immutable option. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>	2024-11-20 02:45:37 +00:00
Rhys Perry	45c1280d2c	nir_lower_mem_access_bit_sizes: pass access to callback Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Rhys Perry	61752152f7	nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Daniel Schürmann	87cb42f953	treewide: don't lower to LCSSA before calling nir_divergence_analysis() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Kenneth Graunke	834b919f6a	brw: Optimize 16-bit texture fetches later At the point we were calling this, we hadn't necessarily cleaned up derefs via nir_lower_vars_to_ssa, nor movs/vecs via copy propagation, so it wasn't necessarily easy for this pass to see the actual usage of the destination. Moving this later allows us to detect f2f32(txf(...)) and avoid converting it to a 16-bit txf (why convert with ALU instructions when the sampler could do it for us?). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31750>	2024-10-22 01:15:10 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	f5d123b977	brw: delay printf lowering Useful to insert debug traces a bit later in the lowering process (in particular after load/store vectorization). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Marek Olšák	65ace5649b	nir: reject unsupported component counts from all vectorize callbacks If you allow an unsupported component count in the callback for loads, nir_opt_load_store_vectorize will align num_components to the next supported vector size, essentially overfetching. This changes all callbacks to reject it. AMD will enable it in a later commit. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	02923e237d	nir: add hole_size parameter into the vectorize callback It will be used to allow merging loads with a hole between them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Lionel Landwerlin	02b124846f	brw: fix TGM messages to use cmask lsc opcodes This is a restriction for TGM. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b55f7716` ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Lionel Landwerlin	2159e17da0	brw: remove (load\|store)_raw_intel Those are Elk specific intrinsics. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b8f264cfe4` ("intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Ian Romanick	447dae7c13	intel/brw: Use nir_opt_generate_bfi No shader-db changes on any Intel platform. The "regression" in SEND messages occurs because a loop containing a SEND is unrolled. v2: Move after nir_opt_algebraic. Suggested by Georg. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19787034 -> 19785933 (<.01%) instructions in affected programs: 373573 -> 372472 (-0.29%) helped: 541 / HURT: 6 total cycles in shared programs: 906012612 -> 905626304 (-0.04%) cycles in affected programs: 58456516 -> 58070208 (-0.66%) helped: 382 / HURT: 180 fossil-db: Lunar Lake Totals: Instrs: 140671401 -> 140670495 (-0.00%); split: -0.00%, +0.00% Send messages: 12891430822 -> 12891430834 (+0.00%) Loop count: 46905 -> 46904 (-0.00%) Cycle count: 21527511599 -> 21530278999 (+0.01%); split: -0.00%, +0.02% Spill count: 70728 -> 70766 (+0.05%) Fill count: 139397 -> 139254 (-0.10%); split: -0.13%, +0.02% Max live registers: 47512432 -> 47512500 (+0.00%) Totals from 355 (0.06% of 549270) affected shaders: Instrs: 878953 -> 878047 (-0.10%); split: -0.18%, +0.08% Send messages: 19289 -> 19301 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1434664642 -> 1437432042 (+0.19%); split: -0.06%, +0.25% Spill count: 15826 -> 15864 (+0.24%) Fill count: 38454 -> 38311 (-0.37%); split: -0.46%, +0.08% Max live registers: 52530 -> 52598 (+0.13%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 152516575 -> 152516147 (-0.00%); split: -0.00%, +0.00% Send messages: 7491001 -> 7491013 (+0.00%) Loop count: 47588 -> 47587 (-0.00%) Cycle count: 17124433133 -> 17126147156 (+0.01%); split: -0.01%, +0.02% Max live registers: 31854704 -> 31854764 (+0.00%) Totals from 402 (0.06% of 633223) affected shaders: Instrs: 839338 -> 838910 (-0.05%); split: -0.09%, +0.04% Send messages: 20203 -> 20215 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1327042160 -> 1328756183 (+0.13%); split: -0.11%, +0.24% Max live registers: 33237 -> 33297 (+0.18%) Tiger Lake *** Shaders only in 'before' results are ignored: fossil-db/steam-native/wolfenstein_youngblood/b8cefe7f700304c4/fs.32/0 from 1 apps: fossil-db/steam-native/wolfenstein_youngblood Totals: Instrs: 150549467 -> 150548952 (-0.00%); split: -0.00%, +0.00% Send messages: 7495582 -> 7495594 (+0.00%) Loop count: 46605 -> 46604 (-0.00%) Cycle count: 15472381586 -> 15472247085 (-0.00%); split: -0.00%, +0.00% Spill count: 59776 -> 59775 (-0.00%) Fill count: 103475 -> 103464 (-0.01%) Scratch Memory Size: 2384896 -> 2383872 (-0.04%) Max live registers: 31760724 -> 31760787 (+0.00%) Max dispatch width: 5569928 -> 5569912 (-0.00%) Totals from 525 (0.08% of 632443) affected shaders: Instrs: 349074 -> 348559 (-0.15%); split: -0.25%, +0.11% Send messages: 24355 -> 24367 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 187080291 -> 186945790 (-0.07%); split: -0.19%, +0.12% Spill count: 483 -> 482 (-0.21%) Fill count: 1372 -> 1361 (-0.80%) Scratch Memory Size: 22528 -> 21504 (-4.55%) Max live registers: 36705 -> 36768 (+0.17%) Max dispatch width: 6272 -> 6256 (-0.26%) Ice Lake Totals: Instrs: 151804923 -> 151804396 (-0.00%); split: -0.00%, +0.00% Send messages: 7553216 -> 7553228 (+0.00%) Loop count: 46196 -> 46195 (-0.00%) Cycle count: 15099805668 -> 15099533898 (-0.00%); split: -0.00%, +0.00% Fill count: 103978 -> 103979 (+0.00%) Max live registers: 32168254 -> 32168323 (+0.00%) Totals from 
527 (0.08% of 637191) affected shaders: Instrs: 347482 -> 346955 (-0.15%); split: -0.25%, +0.10% Send messages: 24586 -> 24598 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 191147758 -> 190875988 (-0.14%); split: -0.16%, +0.02% Fill count: 1392 -> 1393 (+0.07%) Max live registers: 37379 -> 37448 (+0.18%) Skylake Totals: Instrs: 140981504 -> 140980647 (-0.00%); split: -0.00%, +0.00% Cycle count: 14653477192 -> 14653249734 (-0.00%); split: -0.00%, +0.00% Fill count: 99636 -> 99637 (+0.00%) Max live registers: 31472062 -> 31472126 (+0.00%) Totals from 523 (0.08% of 626432) affected shaders: Instrs: 335551 -> 334694 (-0.26%); split: -0.26%, +0.01% Cycle count: 178047284 -> 177819826 (-0.13%); split: -0.14%, +0.02% Fill count: 1100 -> 1101 (+0.09%) Max live registers: 36734 -> 36798 (+0.17%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>	2024-09-13 00:21:00 +00:00
Kenneth Graunke	b8f264cfe4	intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	8a6903e50d	intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop" This is going to handle more than atomics shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Ian Romanick	13332c236b	intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup I observed some ray tracing shaders where a resource_intel inside a loop was non-uniform, and some code was lowered to account for that. Eventually the loop containing the resource_intel was unrolled, and the resource_intel became uniform. For example, nir_opt_uniform_subgroup can transform something like con loop { con block b5: // preds: b4 b8 con 32 %330 = @read_first_invocation (%329) con 1 %331 = ieq %330, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %330 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } into con loop { con block b5: // preds: b4 b8 con 1 %331 = ieq %329, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %329 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } Notice that %331 is now a tautology. Running brw_nir_optimize again eliminates the loop. v2: Add a comment in the code explaining the rationale. Suggested by Ken. Update the commit message. Suggested by Caio. shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19733448 -> 19733330 (<.01%) instructions in affected programs: 14120 -> 14002 (-0.84%) helped: 32 / HURT: 3 total cycles in shared programs: 916254496 -> 916226288 (<.01%) cycles in affected programs: 2035116 -> 2006908 (-1.39%) helped: 19 / HURT: 13 total spills in shared programs: 5807 -> 5807 (0.00%) spills in affected programs: 26 -> 26 (0.00%) helped: 1 / HURT: 1 total fills in shared programs: 6794 -> 6792 (-0.03%) fills in affected programs: 84 -> 82 (-2.38%) helped: 1 / HURT: 1 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20393084 -> 20392971 (<.01%) instructions in affected programs: 21750 -> 21637 (-0.52%) helped: 31 / HURT: 4 total cycles in shared programs: 880273065 -> 880247818 (<.01%) cycles in affected programs: 2546748 -> 2521501 (-0.99%) helped: 18 / HURT: 9 total spills in shared programs: 4628 -> 4630 (0.04%) spills in affected programs: 287 -> 289 (0.70%) helped: 1 / HURT: 2 total fills in shared programs: 5381 -> 5376 (-0.09%) fills in affected programs: 711 -> 706 (-0.70%) helped: 2 / HURT: 2 LOST: 1 GAINED: 1 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00% Send messages: 7459339 -> 7459396 (+0.00%) Loop count: 49111 -> 47588 (-3.10%) Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01% Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01% Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00% Scratch Memory Size: 4136960 -> 4130816 (-0.15%) Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00% Totals from 672 (0.11% of 630198) affected shaders: Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17% Send messages: 54302 -> 54359 (+0.10%) Loop count: 6124 -> 4601 (-24.87%) Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16% Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08% Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01% Scratch Memory Size: 740352 -> 734208 (-0.83%) Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7685264 -> 7685256 (-0.00%) Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00% Spill count: 61238 -> 61240 (+0.00%) Fill count: 107301 -> 107289 (-0.01%) Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00% Totals from 553 (0.09% of 629912) affected shaders: Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01% Subgroup size: 8648 -> 8640 (-0.09%) Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24% Spill count: 181 -> 183 (+1.10%) Fill count: 440 -> 428 (-2.73%) Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	65eb7ed5fc	intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize Without this, the next commit triggers assertions. v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Jesse Natalie	03655dfda1	compiler, vk: Support subgroup size of 4 Relax the assert and assign it an enum value Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>	2024-08-29 03:30:31 +00:00
Kenneth Graunke	6a292c2699	intel: Fix bad align_offset on global_constant_uniform_block_intel We were specifying align_offset = 64 and align_mul = 64, which is invalid. nir_combined_align() asserts that align_offset < align_mul. Our intention here is to perform cacheline-aligned (64B-aligned) block loads, so we should set align_mul = 64 and can leave align_offset = 0. Fixes: `fbafa9cabd` ("intel/nir: remove load_global_const_block_intel intrinsic") Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>	2024-08-21 20:44:57 +00:00
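The constraint described in the commit above can be sketched roughly as follows, assuming an address is modeled as `k * align_mul + align_offset` with a power-of-two `align_mul`. The helper names `sketch_alignment_is_valid` and `sketch_combined_align` are hypothetical illustrations, not Mesa's API; only the `align_offset < align_mul` rule comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an address is described as
 * addr = k * align_mul + align_offset, so the offset must be
 * strictly smaller than the multiplier. */
static bool
sketch_alignment_is_valid(uint32_t align_mul, uint32_t align_offset)
{
   return align_mul != 0 && align_offset < align_mul;
}

/* Hypothetical helper: largest power of two guaranteed to divide the
 * address, assuming align_mul is a power of two. */
static uint32_t
sketch_combined_align(uint32_t align_mul, uint32_t align_offset)
{
   assert(sketch_alignment_is_valid(align_mul, align_offset));
   if (align_offset == 0)
      return align_mul;
   /* Isolate the lowest set bit of the offset. */
   return align_offset & (~align_offset + 1u);
}
```

Under this model, the pair (align_mul = 64, align_offset = 64) is rejected, while (64, 0) describes the cacheline-aligned block load the commit intends.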
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
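The if-to-bcsel transformation mentioned in the commit above can be illustrated with a minimal C sketch. It is an analogy only: the function names are hypothetical, a plain array stands in for a UBO, and nothing here is NIR or Mesa code. The point is that the flattened form executes both loads unconditionally, which is acceptable only when the loads are safe to perform speculatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branchy form: only one of the two loads executes. */
static float
sketch_load_branchy(bool cond, const float *ubo, size_t a, size_t b)
{
   assert(ubo != NULL);
   if (cond)
      return ubo[a];
   else
      return ubo[b];
}

/* Flattened form: both loads run unconditionally, then a select picks
 * the result. This is only valid if neither load can fault. */
static float
sketch_load_select(bool cond, const float *ubo, size_t a, size_t b)
{
   assert(ubo != NULL);
   float x = ubo[a];
   float y = ubo[b];
   return cond ? x : y;
}
```

Both functions return the same value for in-bounds indices; the second has no control flow, which is what lets later optimizations see through it.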
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Qiang Yu	3151f5ec47	nir: add filter parameter to nir_lower_array_deref_of_vec To be used by later commits to limit the lowering to specific variables. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>	2024-07-03 02:06:56 +00:00
Francisco Jerez	e8007c9325	intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+. Floating-point offsets work fine in combination with the floating-point arithmetic we're about to lower these intrinsics into, and they require fewer instructions than converting to fixed-point and then back. No reason to take the precision/range hit nor the extra instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Alyssa Rosenzweig	da752ed7c1	treewide: use nir_def_replace sometimes Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but it's a start. Coccinelle patch: @@ expression intr, repl; @@ -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(&intr->instr); +nir_def_replace(&intr->def, repl); Coccinelle patch: @@ identifier intr; expression instr, repl; @@ nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); ... -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(instr); +nir_def_replace(&intr->def, repl); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna] Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>	2024-06-21 15:36:56 +00:00
Alyssa Rosenzweig	15257b65c6	treewide: use nir_metadata_control_flow Via Coccinelle patch: @@ @@ -nir_metadata_block_index \| nir_metadata_dominance +nir_metadata_control_flow ...plus some manual fixups for call sites missed by coccinelle. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>	2024-06-17 16:28:14 -04:00


410 commits