fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 15:40:11 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	f88eb48ff2	anv: Don't consider nir_var_mem_global for vectorizer robustness checks nir_opt_load_store_vectorize checks for potential address wrapping when vectorizing two loads ("low" and "high"). It looks for cases where "low" might have a large address, and "high" has a positive offset which, when added together, could trigger integer wraparound. The issue here is that if the large address of "low" was considered out-of-bounds, adding offset could wrap around to a small address, which might actually be in-bounds. Thus, when loaded separately, "low" will fail and trigger robustness out-of-bound-read behavior, but "high" would read correctly. When vectorized, the entire load would fail. This is explicitly tested for with 32-bit SSBO addresses in the Vulkan CTS. However, anv's 64-bit global addresses and VMA handling effectively prevent this case. Addresses 0-4095 are a reserved page so that if people try to use 0 as a NULL pointer, it never maps to a valid BO. That alone guarantees that the above case where "high" gets a small address would never be in-bounds, so we don't need to check for it. In fact, we allocate most user allocations out of high addresses, and have specialized allocation heaps for certain types of GPU data structures in the lower GB of memory. For a load to wrap around and successfully land in the right heap, it would have to load gigabytes. Disabling this allows load vectorization and overfetching in more cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	da93b13f8b	brw: Use nir_combined_align in brw_nir_should_vectorize_mem Better than open-coding this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Lionel Landwerlin	bfcb9bf276	brw: rename brw_sometimes to intel_sometimes Moving it to intel_shader_enums.h The plan is to make it visible to OpenCL shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Marek Olšák	25d4943481	nir: make use_interpolated_input_intrinsics a nir_lower_io parameter This will need to be set to true when the GLSL linker lowers IO, which can later be unlowered by st/mesa, and then drivers can lower it again without load_interpolated_input. Therefore, it can't be a global immutable option. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>	2024-11-20 02:45:37 +00:00
Rhys Perry	45c1280d2c	nir_lower_mem_access_bit_sizes: pass access to callback Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Rhys Perry	61752152f7	nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Daniel Schürmann	87cb42f953	treewide: don't lower to LCSSA before calling nir_divergence_analysis() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Kenneth Graunke	834b919f6a	brw: Optimize 16-bit texture fetches later At the point we were calling this, we hadn't necessarily cleaned up derefs via nir_lower_vars_to_ssa, nor movs/vecs via copy propagation, so it wasn't necessarily easy for this pass to see the actual usage of the destination. Moving this later allows us to detect f2f32(txf(...)) and avoid converting it to a 16-bit txf (why convert with ALU instructions when the sampler could do it for us?). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31750>	2024-10-22 01:15:10 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	f5d123b977	brw: delay printf lowering Useful to insert debug traces a bit later in the lowering process (in particular after load/store vectorization). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Marek Olšák	65ace5649b	nir: reject unsupported component counts from all vectorize callbacks If you allow an unsupported component count in the callback for loads, nir_opt_load_store_vectorize will align num_components to the next supported vector size, essentially overfetching. This changes all callbacks to reject it. AMD will enable it in a later commit. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	02923e237d	nir: add hole_size parameter into the vectorize callback It will be used to allow merging loads with a hole between them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Lionel Landwerlin	02b124846f	brw: fix TGM messages to use cmask lsc opcodes This is a restriction for TGM. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b55f7716` ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Lionel Landwerlin	2159e17da0	brw: remove (load|store)_raw_intel Those are Elk specific intrinsics. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b8f264cfe4` ("intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Ian Romanick	447dae7c13	intel/brw: Use nir_opt_generate_bfi No shader-db changes on any Intel platform. The "regression" in SEND messages occurs because a loop containing a SEND is unrolled. v2: Move after nir_opt_algebraic. Suggested by Georg. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19787034 -> 19785933 (<.01%) instructions in affected programs: 373573 -> 372472 (-0.29%) helped: 541 / HURT: 6 total cycles in shared programs: 906012612 -> 905626304 (-0.04%) cycles in affected programs: 58456516 -> 58070208 (-0.66%) helped: 382 / HURT: 180 fossil-db: Lunar Lake Totals: Instrs: 140671401 -> 140670495 (-0.00%); split: -0.00%, +0.00% Send messages: 12891430822 -> 12891430834 (+0.00%) Loop count: 46905 -> 46904 (-0.00%) Cycle count: 21527511599 -> 21530278999 (+0.01%); split: -0.00%, +0.02% Spill count: 70728 -> 70766 (+0.05%) Fill count: 139397 -> 139254 (-0.10%); split: -0.13%, +0.02% Max live registers: 47512432 -> 47512500 (+0.00%) Totals from 355 (0.06% of 549270) affected shaders: Instrs: 878953 -> 878047 (-0.10%); split: -0.18%, +0.08% Send messages: 19289 -> 19301 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1434664642 -> 1437432042 (+0.19%); split: -0.06%, +0.25% Spill count: 15826 -> 15864 (+0.24%) Fill count: 38454 -> 38311 (-0.37%); split: -0.46%, +0.08% Max live registers: 52530 -> 52598 (+0.13%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 152516575 -> 152516147 (-0.00%); split: -0.00%, +0.00% Send messages: 7491001 -> 7491013 (+0.00%) Loop count: 47588 -> 47587 (-0.00%) Cycle count: 17124433133 -> 17126147156 (+0.01%); split: -0.01%, +0.02% Max live registers: 31854704 -> 31854764 (+0.00%) Totals from 402 (0.06% of 633223) affected shaders: Instrs: 839338 -> 838910 (-0.05%); split: -0.09%, +0.04% Send messages: 20203 -> 20215 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1327042160 -> 1328756183 (+0.13%); split: -0.11%, +0.24% Max live registers: 33237 -> 33297 (+0.18%) Tiger Lake *** Shaders only in 'before' results are ignored: fossil-db/steam-native/wolfenstein_youngblood/b8cefe7f700304c4/fs.32/0 from 1 apps: fossil-db/steam-native/wolfenstein_youngblood Totals: Instrs: 150549467 -> 150548952 (-0.00%); split: -0.00%, +0.00% Send messages: 7495582 -> 7495594 (+0.00%) Loop count: 46605 -> 46604 (-0.00%) Cycle count: 15472381586 -> 15472247085 (-0.00%); split: -0.00%, +0.00% Spill count: 59776 -> 59775 (-0.00%) Fill count: 103475 -> 103464 (-0.01%) Scratch Memory Size: 2384896 -> 2383872 (-0.04%) Max live registers: 31760724 -> 31760787 (+0.00%) Max dispatch width: 5569928 -> 5569912 (-0.00%) Totals from 525 (0.08% of 632443) affected shaders: Instrs: 349074 -> 348559 (-0.15%); split: -0.25%, +0.11% Send messages: 24355 -> 24367 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 187080291 -> 186945790 (-0.07%); split: -0.19%, +0.12% Spill count: 483 -> 482 (-0.21%) Fill count: 1372 -> 1361 (-0.80%) Scratch Memory Size: 22528 -> 21504 (-4.55%) Max live registers: 36705 -> 36768 (+0.17%) Max dispatch width: 6272 -> 6256 (-0.26%) Ice Lake Totals: Instrs: 151804923 -> 151804396 (-0.00%); split: -0.00%, +0.00% Send messages: 7553216 -> 7553228 (+0.00%) Loop count: 46196 -> 46195 (-0.00%) Cycle count: 15099805668 -> 15099533898 (-0.00%); split: -0.00%, +0.00% Fill count: 103978 -> 103979 (+0.00%) Max live registers: 32168254 -> 32168323 (+0.00%) Totals from 
527 (0.08% of 637191) affected shaders: Instrs: 347482 -> 346955 (-0.15%); split: -0.25%, +0.10% Send messages: 24586 -> 24598 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 191147758 -> 190875988 (-0.14%); split: -0.16%, +0.02% Fill count: 1392 -> 1393 (+0.07%) Max live registers: 37379 -> 37448 (+0.18%) Skylake Totals: Instrs: 140981504 -> 140980647 (-0.00%); split: -0.00%, +0.00% Cycle count: 14653477192 -> 14653249734 (-0.00%); split: -0.00%, +0.00% Fill count: 99636 -> 99637 (+0.00%) Max live registers: 31472062 -> 31472126 (+0.00%) Totals from 523 (0.08% of 626432) affected shaders: Instrs: 335551 -> 334694 (-0.26%); split: -0.26%, +0.01% Cycle count: 178047284 -> 177819826 (-0.13%); split: -0.14%, +0.02% Fill count: 1100 -> 1101 (+0.09%) Max live registers: 36734 -> 36798 (+0.17%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>	2024-09-13 00:21:00 +00:00
Kenneth Graunke	b8f264cfe4	intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	8a6903e50d	intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop" This is going to handle more than atomics shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Ian Romanick	13332c236b	intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup I observed some ray tracing shaders where a resource_intel inside a loop was non-uniform, and some code was lowered to account for that. Eventually the loop containing the resource_intel was unrolled, and the resource_intel became uniform. For example, nir_opt_uniform_subgroup can transform something like con loop { con block b5: // preds: b4 b8 con 32 %330 = @read_first_invocation (%329) con 1 %331 = ieq %330, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %330 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } into con loop { con block b5: // preds: b4 b8 con 1 %331 = ieq %329, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %329 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } Notice that %331 is now a tautology. Running brw_nir_optimize again eliminates the loop. v2: Add a comment in the code explaining the rationale. Suggested by Ken. Update the commit message. Suggested by Caio. shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19733448 -> 19733330 (<.01%) instructions in affected programs: 14120 -> 14002 (-0.84%) helped: 32 / HURT: 3 total cycles in shared programs: 916254496 -> 916226288 (<.01%) cycles in affected programs: 2035116 -> 2006908 (-1.39%) helped: 19 / HURT: 13 total spills in shared programs: 5807 -> 5807 (0.00%) spills in affected programs: 26 -> 26 (0.00%) helped: 1 / HURT: 1 total fills in shared programs: 6794 -> 6792 (-0.03%) fills in affected programs: 84 -> 82 (-2.38%) helped: 1 / HURT: 1 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20393084 -> 20392971 (<.01%) instructions in affected programs: 21750 -> 21637 (-0.52%) helped: 31 / HURT: 4 total cycles in shared programs: 880273065 -> 880247818 (<.01%) cycles in affected programs: 2546748 -> 2521501 (-0.99%) helped: 18 / HURT: 9 total spills in shared programs: 4628 -> 4630 (0.04%) spills in affected programs: 287 -> 289 (0.70%) helped: 1 / HURT: 2 total fills in shared programs: 5381 -> 5376 (-0.09%) fills in affected programs: 711 -> 706 (-0.70%) helped: 2 / HURT: 2 LOST: 1 GAINED: 1 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00% Send messages: 7459339 -> 7459396 (+0.00%) Loop count: 49111 -> 47588 (-3.10%) Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01% Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01% Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00% Scratch Memory Size: 4136960 -> 4130816 (-0.15%) Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00% Totals from 672 (0.11% of 630198) affected shaders: Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17% Send messages: 54302 -> 54359 (+0.10%) Loop count: 6124 -> 4601 (-24.87%) Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16% Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08% Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01% Scratch Memory Size: 740352 -> 734208 (-0.83%) Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7685264 -> 7685256 (-0.00%) Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00% Spill count: 61238 -> 61240 (+0.00%) Fill count: 107301 -> 107289 (-0.01%) Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00% Totals from 553 (0.09% of 629912) affected shaders: Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01% Subgroup size: 8648 -> 8640 (-0.09%) Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24% Spill count: 181 -> 183 (+1.10%) Fill count: 440 -> 428 (-2.73%) Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	65eb7ed5fc	intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize Without this, the next commit triggers assertions. v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Jesse Natalie	03655dfda1	compiler, vk: Support subgroup size of 4 Relax the assert and assign it an enum value Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>	2024-08-29 03:30:31 +00:00
Kenneth Graunke	6a292c2699	intel: Fix bad align_offset on global_constant_uniform_block_intel We were specifying align_offset = 64 and align_mul = 64, which is invalid. nir_combined_align() asserts that align_offset < align_mul. Our intention here is to perform cacheline-aligned (64B-aligned) block loads, so we should set align_mul = 64 and can leave align_offset = 0. Fixes: `fbafa9cabd` ("intel/nir: remove load_global_const_block_intel intrinsic") Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>	2024-08-21 20:44:57 +00:00
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Qiang Yu	3151f5ec47	nir: add filter parameter to nir_lower_array_deref_of_vec To be used by later commits to limit the lowering to specific variables. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>	2024-07-03 02:06:56 +00:00
Francisco Jerez	e8007c9325	intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+. Floating-point offsets work fine in combination with the floating-point arithmetic we're about to lower these intrinsics into, and they require fewer instructions than converting to fixed-point and then back. No reason to take the precision/range hit nor the extra instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Alyssa Rosenzweig	da752ed7c1	treewide: use nir_def_replace sometimes Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but it's a start. Coccinelle patch: @@ expression intr, repl; @@ -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(&intr->instr); +nir_def_replace(&intr->def, repl); Coccinelle patch: @@ identifier intr; expression instr, repl; @@ nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); ... -nir_def_rewrite_uses(&intr->def, repl); -nir_instr_remove(instr); +nir_def_replace(&intr->def, repl); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna] Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>	2024-06-21 15:36:56 +00:00
Alyssa Rosenzweig	15257b65c6	treewide: use nir_metadata_control_flow Via Coccinelle patch: @@ @@ -nir_metadata_block_index | nir_metadata_dominance +nir_metadata_control_flow ...plus some manual fixups for call sites missed by coccinelle. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>	2024-06-17 16:28:14 -04:00
Ian Romanick	7b7e5cf5d4	nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers v2: Add some comments explaining some of the nuance of the shift optimizations. Fix a bug in the shift count calculation of the upper 32-bits. Move the @64 from the variable to the opcode. All suggested by Jordan. No shader-db changes on any Intel platform. fossil-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 154507026 -> 154506576 (-0.00%) Cycle count: 17436298868 -> 17436295016 (-0.00%) Max live registers: 32635309 -> 32635297 (-0.00%) Totals from 42 (0.01% of 632575) affected shaders: Instrs: 5616 -> 5166 (-8.01%) Cycle count: 133680 -> 129828 (-2.88%) Max live registers: 1158 -> 1146 (-1.04%) No fossil-db changes on any other Intel platform. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>	2024-05-31 09:13:23 -07:00
Lionel Landwerlin	9a36278475	intel/nir: add printf lowering Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>	2024-05-15 13:13:38 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 
(+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Ian Romanick	24cdbbdaa2	intel/brw: Delete stray nir_opt_dce No shader-db or fossil-db changes on any Intel platform. Fixes: `f76f4be301` ("intel/compiler: move gen5 final pass to actually be final pass") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>	2024-04-04 23:42:27 +00:00
Ian Romanick	6377e8fd29	intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa Per discussion in #10727, removing phis breaks LCSSA form which in turn invalidates divergence analysis. shader-db: All Skylake and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20299612 -> 20299695 (<.01%) instructions in affected programs: 20829 -> 20912 (0.40%) helped: 6 / HURT: 13 total cycles in shared programs: 842149085 -> 842148399 (<.01%) cycles in affected programs: 15146222 -> 15145536 (<.01%) helped: 40 / HURT: 45 fossil-db: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00% Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00% Spill count: 45213 -> 45220 (+0.02%) Fill count: 74166 -> 74184 (+0.02%) Totals from 94 (0.01% of 656116) affected shaders: Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20% Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37% Spill count: 3474 -> 3481 (+0.20%) Fill count: 6713 -> 6731 (+0.27%) Fixes: `6dbb5f1e07` ("intel/fs: rerun divergence analysis prior to convert_from_ssa") Closes: #10727 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>	2024-04-04 23:42:27 +00:00
Dylan Baker	75ede9d9bc	intel/brw: track last successful pass and leave the loop early This is similar to what RADV implements using the NIR_LOOP_PASS helpers. I have not used those helpers for a couple of reasons: 1. They use the pointer to the optimization function, which doesn't work if the same function is called multiple times in one invocation of the loop (fixable) 2. After fixing them, due to Intel's use of sub-expressions, the amount of code added to wrap the shared macro becomes more than simply reimplementing them for the Intel compiler. On most workloads the results are a wash, but on compile-heavy workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw fossil-db runtimes fall by 1-2% on my ICL, with no changes to the compiled shaders. Caio saw closer to 2.5% on TGL. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>	2024-03-21 23:02:32 +00:00
Alyssa Rosenzweig	a6123a80da	nir/opt_shrink_vectors: shrink some intrinsics from start If the backend supports it, intrinsics with a component() are straightforward to shrink from the start. Notably helps vectorized I/O. v2: add an option for this and enable only on grown up backends, because some backends ignore the component() parameter. RADV GFX11: Totals from 921 (1.16% of 79439) affected shaders: Instrs: 616558 -> 615529 (-0.17%); split: -0.30%, +0.14% CodeSize: 3099864 -> 3095632 (-0.14%); split: -0.25%, +0.11% Latency: 2177075 -> 2160966 (-0.74%); split: -0.79%, +0.05% InvThroughput: 299997 -> 298664 (-0.44%); split: -0.47%, +0.02% VClause: 16343 -> 16395 (+0.32%); split: -0.01%, +0.32% SClause: 10715 -> 10714 (-0.01%) Copies: 24736 -> 24701 (-0.14%); split: -0.37%, +0.23% PreVGPRs: 30179 -> 30173 (-0.02%) VALU: 353472 -> 353439 (-0.01%); split: -0.03%, +0.02% SALU: 40323 -> 40322 (-0.00%) VMEM: 25353 -> 25352 (-0.00%) AGX: total instructions in shared programs: 2038217 -> 2038049 (<.01%) instructions in affected programs: 10249 -> 10081 (-1.64%) total alu in shared programs: 1593094 -> 1592939 (<.01%) alu in affected programs: 7145 -> 6990 (-2.17%) total fscib in shared programs: 1589254 -> 1589102 (<.01%) fscib in affected programs: 7217 -> 7065 (-2.11%) total bytes in shared programs: 13975666 -> 13974722 (<.01%) bytes in affected programs: 65942 -> 64998 (-1.43%) total regs in shared programs: 592758 -> 591187 (-0.27%) regs in affected programs: 6936 -> 5365 (-22.65%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28004>	2024-03-12 18:17:17 +00:00
Caio Oliveira	865ef36609	intel/brw: Remove brw_shader.h Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:06 +00:00
Kenneth Graunke	5fbba530cf	intel/brw: Delete compiler->supports_shader_constants True for all drivers using this compiler. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27872>	2024-02-29 18:00:14 +00:00
Caio Oliveira	63a4a4400a	intel/brw: Remove edgeflag_is_last VS parameter Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5a3f65e678	intel/brw: Remove unused attrib workarounds Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	d3e451780b	intel/brw: Inline brw_nir_apply_sampler_key code It doesn't use the prog_key anymore, so just move the nir_lower_tex call pass to the single callsite. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	a1e694a890	intel/brw: Remove Gfx8- code from NIR passes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	7c23b90537	intel/brw: Always use scalar shaders Remove scalar_stage[] array, since now it is always scalar. This removes any usage of vec4 shaders in brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Caio Oliveira	303fd4e935	intel/brw: Move type_size_* functions out of vec4-specific file This will make it easier to delete the vec4 files later. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:37 +00:00
Ian Romanick	535caaf3e0	nir: Optimize uniform iadd, fadd, and ixor reduction operations This adds optimizations for iadd, fadd, and ixor with reduce, inclusive scan, and exclusive scan. NOTE: The fadd and ixor optimizations had no shader-db or fossil-db changes on any Intel platform. NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size and base-local-size.shader_test on DG2 and MTL. This is just changing the code path taken to not use whatever path was not working properly before. This is a subset of the things optimized by ACO. See also https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The min, max, iand, and ior exclusive_scan optimizations are not implemented. Broadwell on shader-db is not happy. I have not investigated. v2: Silence some warnings about discarding const. v3: Rename mbcnt to count_active_invocations. Add a big comment explaining the differences between the two paths. Suggested by Rhys. shader-db: All Gfx9 and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20300384 -> 20299545 (<.01%) instructions in affected programs: 19167 -> 18328 (-4.38%) helped: 35 / HURT: 0 total cycles in shared programs: 842809750 -> 842766381 (<.01%) cycles in affected programs: 2160249 -> 2116880 (-2.01%) helped: 33 / HURT: 2 total spills in shared programs: 4632 -> 4626 (-0.13%) spills in affected programs: 206 -> 200 (-2.91%) helped: 3 / HURT: 0 total fills in shared programs: 5594 -> 5581 (-0.23%) fills in affected programs: 664 -> 651 (-1.96%) helped: 3 / HURT: 1 fossil-db results: All Intel platforms had similar results. 
(Ice Lake shown) Totals: Instrs: 165551893 -> 165513303 (-0.02%) Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00% Spill count: 45258 -> 45204 (-0.12%) Fill count: 74286 -> 74157 (-0.17%) Scratch Memory Size: 2467840 -> 2451456 (-0.66%) Totals from 712 (0.11% of 656120) affected shaders: Instrs: 598931 -> 560341 (-6.44%) Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04% Spill count: 983 -> 929 (-5.49%) Fill count: 2274 -> 2145 (-5.67%) Scratch Memory Size: 52224 -> 35840 (-31.37%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 09:44:11 -08:00
Ian Romanick	c63ea755fe	intel/fs: Use nir_opt_uniform_subgroup shader-db: All Skylake and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20300435 -> 20300384 (<.01%) instructions in affected programs: 303 -> 252 (-16.83%) helped: 2 / HURT: 0 total cycles in shared programs: 842810326 -> 842809750 (<.01%) cycles in affected programs: 8374 -> 7798 (-6.88%) helped: 2 / HURT: 0 fossil-db: All Intel platforms (note below) had similar results. (Ice Lake shown) Instrs: 165559735 -> 165551893 (-0.00%) Cycles: 15133083961 -> 15132539132 (-0.00%); split: -0.00%, +0.00% Spill count: 45262 -> 45258 (-0.01%) Fill count: 74293 -> 74286 (-0.01%) Totals from 854 (0.13% of 656120) affected shaders: Instrs: 3461998 -> 3454156 (-0.23%) Cycles: 154252729 -> 153707900 (-0.35%); split: -0.36%, +0.01% Spill count: 2655 -> 2651 (-0.15%) Fill count: 3881 -> 3874 (-0.18%) DG2 did not see changes in spills or fills. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:38:45 -08:00
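
The early-exit idea described in commit 75ede9d9bc ("track last successful pass and leave the loop early") can be illustrated with a small standalone sketch. This is not the Mesa implementation: `pass_fn`, `make_even`, `strip_twos`, `example_passes`, and `run_until_stable` are hypothetical names invented for illustration, and the "shader" is a single integer. The sketch remembers the slot of the last pass that made progress rather than its function pointer, so the same function could appear in several slots. It assumes every pass is idempotent, meaning that running a pass again directly on its own output makes no further progress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass signature: returns true if the pass changed the state. */
typedef bool (*pass_fn)(int *state);

/* Toy pass: turn an odd value greater than 1 into an even one. */
bool make_even(int *x)
{
    if (*x > 1 && (*x % 2) != 0) {
        *x -= 1;
        return true;
    }
    return false;
}

/* Toy pass: divide out every factor of two. */
bool strip_twos(int *x)
{
    bool progress = false;
    while (*x > 0 && (*x % 2) == 0) {
        *x /= 2;
        progress = true;
    }
    return progress;
}

/* Example pass list; the two passes can enable each other. */
const pass_fn example_passes[] = { make_even, strip_twos };

/*
 * Run the passes in order until nothing changes and return the number
 * of pass invocations. If we arrive back at the slot that last made
 * progress, every other slot has run since then without changing
 * anything, so (assuming idempotent passes) the state is stable and
 * the rest of the iteration can be skipped.
 */
int run_until_stable(int *state, const pass_fn *passes, size_t count)
{
    size_t last_progress = count; /* sentinel: no pass has made progress yet */
    int calls = 0;

    for (;;) {
        bool progress = false;

        for (size_t i = 0; i < count; i++) {
            if (i == last_progress)
                return calls; /* full cycle without progress: leave early */

            calls++;
            if (passes[i](state)) {
                progress = true;
                last_progress = i;
            }
        }

        if (!progress)
            return calls; /* a whole iteration changed nothing */
    }
}
```

A plain fixed-point loop would only stop after a complete iteration in which no pass makes progress. With the slot tracking, the loop stops as soon as it returns to the last slot that changed anything, which skips the trailing passes of the final iteration.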


392 commits