fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 17:50:12 +01:00

Author	SHA1	Message	Date
Rohan Garg	aa9244c8f6	intel/brw: update Xe2 max SIMD message sizes All the non-transpose messages are SIMD 1,2,4,8,16,32 capable (BSpec 57330) Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29212>	2024-05-15 12:02:02 +00:00
Iván Briano	a9f24fb5f1	intel/brw: fix subgroup size of geometry stages for lnl+ Fixes dEQP-VK.subgroups.size_control.allow_varying_subgroup_size and maybe others checking subgroup size. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29177>	2024-05-14 23:13:37 +00:00
Ian Romanick	97e3c6a12a	intel/brw: Use range analysis to optimize fsign shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19674784 -> 19665960 (-0.04%) instructions in affected programs: 933425 -> 924601 (-0.95%) helped: 3656 / HURT: 0 total cycles in shared programs: 810343919 -> 810241030 (-0.01%) cycles in affected programs: 56752034 -> 56649145 (-0.18%) helped: 3032 / HURT: 434 LOST: 11 GAINED: 0 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20315795 -> 20305856 (-0.05%) instructions in affected programs: 979698 -> 969759 (-1.01%) helped: 3845 / HURT: 0 total cycles in shared programs: 830600281 -> 830534694 (<.01%) cycles in affected programs: 45675615 -> 45610028 (-0.14%) helped: 3250 / HURT: 325 total spills in shared programs: 4583 -> 4565 (-0.39%) spills in affected programs: 180 -> 162 (-10.00%) helped: 3 / HURT: 0 total fills in shared programs: 5245 -> 5219 (-0.50%) fills in affected programs: 379 -> 353 (-6.86%) helped: 3 / HURT: 0 LOST: 14 GAINED: 8 fossil-db: All Intel platforms except Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 154024263 -> 154023814 (-0.00%) Cycle count: 17463341602 -> 17461726239 (-0.01%); split: -0.01%, +0.00% Totals from 322 (0.05% of 631440) affected shaders: Instrs: 199933 -> 199484 (-0.22%) Cycle count: 168492537 -> 166877174 (-0.96%); split: -0.96%, +0.00% Tiger Lake Instrs: 149984723 -> 149984287 (-0.00%) Cycle count: 15238596937 -> 15239260415 (+0.00%); split: -0.00%, +0.01% Max dispatch width: 5553408 -> 5553424 (+0.00%) Totals from 318 (0.05% of 631414) affected shaders: Instrs: 179624 -> 179188 (-0.24%) Cycle count: 160724533 -> 161388011 (+0.41%); split: -0.06%, +0.48% Max dispatch width: 3296 -> 3312 (+0.49%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:21 +00:00
Ian Romanick	e578657313	intel/brw: Implement more strictly correct fsign lowering The huge amount of helped shaders is due to the "~" versions of the patterns. shader-db: Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19672345 -> 19662605 (-0.05%) instructions in affected programs: 1147766 -> 1138026 (-0.85%) helped: 2691 / HURT: 1650 total cycles in shared programs: 810323688 -> 810145191 (-0.02%) cycles in affected programs: 68918312 -> 68739815 (-0.26%) helped: 3651 / HURT: 1832 LOST: 29 GAINED: 38 Tiger Lake total instructions in shared programs: 19489619 -> 19479909 (-0.05%) instructions in affected programs: 1124564 -> 1114854 (-0.86%) helped: 2682 / HURT: 1643 total cycles in shared programs: 811468406 -> 811706747 (0.03%) cycles in affected programs: 66397690 -> 66636031 (0.36%) helped: 3692 / HURT: 1775 total spills in shared programs: 3906 -> 3907 (0.03%) spills in affected programs: 16 -> 17 (6.25%) helped: 0 / HURT: 1 total fills in shared programs: 3220 -> 3222 (0.06%) fills in affected programs: 50 -> 52 (4.00%) helped: 0 / HURT: 1 LOST: 33 GAINED: 36 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20317882 -> 20307495 (-0.05%) instructions in affected programs: 1199651 -> 1189264 (-0.87%) helped: 2863 / HURT: 1680 total cycles in shared programs: 830880024 -> 830457927 (-0.05%) cycles in affected programs: 63347102 -> 62925005 (-0.67%) helped: 4118 / HURT: 1622 total spills in shared programs: 4593 -> 4583 (-0.22%) spills in affected programs: 205 -> 195 (-4.88%) helped: 4 / HURT: 0 total fills in shared programs: 5284 -> 5245 (-0.74%) fills in affected programs: 464 -> 425 (-8.41%) helped: 4 / HURT: 0 LOST: 70 GAINED: 33 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 154025275 -> 154022035 (-0.00%); split: -0.00%, +0.00% Cycle count: 17472869499 -> 17463289530 (-0.05%); split: -0.06%, +0.00% Spill count: 141269 -> 141246 (-0.02%); split: -0.02%, +0.00% Fill count: 265342 -> 265159 (-0.07%); split: -0.11%, +0.04% Max live registers: 32597829 -> 32597986 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5536776 -> 5537048 (+0.00%) Totals from 1590 (0.25% of 631423) affected shaders: Instrs: 1146532 -> 1143292 (-0.28%); split: -0.44%, +0.16% Cycle count: 1230843330 -> 1221263361 (-0.78%); split: -0.83%, +0.05% Spill count: 15832 -> 15809 (-0.15%); split: -0.19%, +0.04% Fill count: 36071 -> 35888 (-0.51%); split: -0.79%, +0.29% Max live registers: 93529 -> 93686 (+0.17%); split: -0.00%, +0.17% Max dispatch width: 15168 -> 15440 (+1.79%) Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149564084 -> 149562467 (-0.00%); split: -0.00%, +0.00% Cycle count: 15151701515 -> 15158290114 (+0.04%); split: -0.00%, +0.04% Max live registers: 32249443 -> 32249620 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5540536 -> 5540488 (-0.00%) Totals from 1605 (0.25% of 630303) affected shaders: Instrs: 584950 -> 583333 (-0.28%); split: -0.49%, +0.21% Cycle count: 160926321 -> 167514920 (+4.09%); split: -0.05%, +4.14% Max live registers: 90851 -> 91028 (+0.19%); split: -0.00%, +0.20% Max dispatch width: 15440 -> 15392 (-0.31%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	864268ff0d	intel/brw: Algebraic optimizations for CSEL No shader-db or fossil-db changes on any Intel platform. In this MR, the only benefit of these changes is to convert some "-a > 0" CSEL comparisons to "a < 0" for improved readability. v2: Add integer CSEL support v3: Use fs_inst::resize_sources and brw_type_is_sint. Both suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	033405cd4b	intel/brw: Combine constants and constant propagation for CSEL No shader-db or fossil-db changes on any Intel platform. This ends up being helpful in "intel/brw: Use range analysis to optimize fsign." v2: Add integer CSEL support v3: Massive simplification (-20 lines!) of constant propagation logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm. Noticed by Ken. v4: While MAD can mix F and HF sources on some platforms, CSEL cannot. Found by skqp on TGL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	504b742b83	intel/brw: Update CSEL source type validation Gfx9 can only have F, but newer GPUs can have F, HF, D, or W. The source and destination types must still match in size. v2: Simplify the float vs integer logic. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	3f151c03af	intel/brw: Handle fsign optimization in a NIR algebraic pass This is a lot less code, and it makes it easier to experiment with other pattern-based optimizations in the future. The results here are nearly identical to the results I got from Ken's "intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not particularly good. In this commit and in Ken's, all of the shader-db shaders hurt for spills and fills are from Deus Ex Mankind Divided. Each shader has a bunch of texture instructions with a single fsign between the blocks. With the dependency on the flag removed, the scheduler puts all of the texture instructions at the start... and there are a LOT of them. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19647060 -> 19650207 (0.02%) instructions in affected programs: 734718 -> 737865 (0.43%) helped: 382 / HURT: 1984 total cycles in shared programs: 823238442 -> 822785913 (-0.05%) cycles in affected programs: 426901157 -> 426448628 (-0.11%) helped: 3408 / HURT: 3671 total spills in shared programs: 3887 -> 3891 (0.10%) spills in affected programs: 256 -> 260 (1.56%) helped: 0 / HURT: 4 total fills in shared programs: 3236 -> 3306 (2.16%) fills in affected programs: 882 -> 952 (7.94%) helped: 0 / HURT: 12 LOST: 37 GAINED: 34 fossil-db: DG2 and Meteor Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00% Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04% Spill count: 142078 -> 142090 (+0.01%) Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01% Max live registers: 32593578 -> 32593858 (+0.00%) Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01% Totals from 5867 (0.93% of 631350) affected shaders: Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09% Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39% Spill count: 26411 -> 26423 (+0.05%) Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04% Max live registers: 431561 -> 431841 (+0.06%) Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63% Tiger Lake Totals: Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00% Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03% Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01% Max live registers: 32249201 -> 32249464 (+0.00%) Max dispatch width: 5540608 -> 5540584 (-0.00%) Totals from 5862 (0.93% of 630309) affected shaders: Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10% Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72% Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08% Max live registers: 424560 -> 424823 (+0.06%) Max dispatch width: 50304 -> 50280 (-0.05%) Ice Lake Totals: Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00% Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00% Spill count: 60087 -> 60090 (+0.00%) Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00% Max live registers: 32605215 -> 32605495 (+0.00%) Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00% Totals from 5882 (0.93% of 634934) affected shaders: Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10% Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05% Spill count: 10278 -> 10281 
(+0.03%) Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01% Max live registers: 424184 -> 424464 (+0.07%) Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18% Skylake Totals: Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00% Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00% Spill count: 58176 -> 58172 (-0.01%) Fill count: 95877 -> 95796 (-0.08%) Max live registers: 31924594 -> 31924874 (+0.00%) Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00% Totals from 5789 (0.93% of 625512) affected shaders: Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10% Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05% Spill count: 9248 -> 9244 (-0.04%) Fill count: 19677 -> 19596 (-0.41%) Max live registers: 415340 -> 415620 (+0.07%) Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	cd343fb9ac	intel/brw: Add support for fcsel opcodes Don't enable nir_opt_algebraic to generate these opcodes yet. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	d51ad9f4e0	intel/brw: Use fs_inst::resize_sources in brw_fs_opt_algebraic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	11c6b6c102	intel/elk: Remove dsign optimization This bit from the comment should have been a big red flag: There are currently zero instances of fsign(double(x))IMM in shader-db or any test suite, so it is hard to care at this time. The implementation of that path was incorrect. The XOR instructions should be predicated like the OR instruction in the non-multiplication path. As a result, dsign(zero_value) x will not produce the correct result. Instead of fixing this code that is never exercised by anything, replace it with the simple lowering in NIR. Ironically, the vec4 implementation is correct. The odds of encountering an application that is performance limited by dsign performance in vertex processing stages on Ivy Bridge or Haswell is infinitesimal. No shader-db changes on any Intel platform. v2: Delete 's' in emit_fsign as it is now unused. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Ian Romanick	ded8690336	intel/brw: Remove dsign optimization This bit from the comment should have been a big red flag: There are currently zero instances of fsign(double(x))IMM in shader-db or any test suite, so it is hard to care at this time. The implementation of that path was incorrect. The XOR instructions should be predicated like the OR instruction in the non-multiplication path. As a result, dsign(zero_value) x will not produce the correct result. Instead of fixing this code that is never exercised by anything, replace it with the simple lowering in NIR. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>	2024-05-14 01:28:20 +00:00
Caio Oliveira	b8dbd64267	intel/brw: Fix commas when dumping instructions Some commas were being skipped, according to history as an attempt to elide BAD_FILEs, but we still print them, so be consistent. Also for instructions without any sources, the trailing comma was always being printed. Fix that too. Example of instruction output before the change halt_target(8) (null):UD, send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD(null):UD NoMask and after it halt_target(8) (null):UD send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD, (null):UD NoMask Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>	2024-05-11 02:17:57 +00:00
Caio Oliveira	c9fe20fdf1	intel/brw: Use `vNN` instead of `vgrfNN` when printing instructions Reduce the noise in the shader dump output. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>	2024-05-11 02:17:56 +00:00
Caio Oliveira	3a081106b0	intel/brw: Hide register pressure information in dumps It was the default to show register pressure for each instruction, but it gets in the way of cleaner diffs before/after an optimization pass. Add INTEL_DEBUG=reg-pressure option to show it again. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>	2024-05-11 02:17:56 +00:00
Caio Oliveira	866b1245e9	intel/brw: Don't print IP as part of the dump The sequential IPs cause noise when diffing before/after a pass that either adds or removes instructions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>	2024-05-11 02:17:56 +00:00
Lionel Landwerlin	fd47f90d37	brw: drop dependency on libintel_common Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11136 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29128>	2024-05-11 01:52:01 +00:00
Lionel Landwerlin	d1c01e256d	brw: add more condition for reducing sampler simdness Running KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupColor test with Zink on Anv we run into an assert : assert(inst->mlen <= MAX_SAMPLER_MESSAGE_SIZE * reg_unit(devinfo)); Turns out we've not covered all the cases in the SIMD lowering. It's a bit of a shame to have both files reproduce the same logic. Will try to think of a better way to extract the layout of a send message but that'll be a much bigger rework. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29118>	2024-05-10 19:40:00 +00:00
Sagar Ghuge	69fc7ee622	intel/disasm: Fix cache load/store disassembly for URB messages Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28868>	2024-05-09 19:45:18 +00:00
Faith Ekstrand	9d5b4a4ffd	intel/kernel: Use the new capabilities struct Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Iván Briano <ivan.briano@intel.com> Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28905>	2024-05-09 01:14:23 +00:00
Sagar Ghuge	e32828f5fc	intel/compiler: Fix destination type for CMP/CMPN For CMP/CMPN, use src0 type if destination is null otherwise get the src0 type register with destination register size. This fixes dEQP-VK.glsl.builtin_var.frontfacing.* tests cases on Xe2+. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28679>	2024-05-06 21:46:18 +00:00
Ian Romanick	0fa17962d6	intel/elk: Fix optimize_extract_to_float for i2f of unsigned extract Fixes fs-uint-to-float-of-extract-int8.shader_test and fs-uint-to-float-of-extract-int16.shader_test added by piglit!883. v2: Expand the comment explaining the potential problem. Suggested by Caio. Fixes: `e6022281f2` ("intel/elk: Rename files to use elk prefix") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Ian Romanick	fc2360167c	intel/brw: Avoid optimize_extract_to_float when it will just be undone later v2: Add bspec quotation. Suggested by Caio. With a better understanding of the restriction, only apply on DG2 and newer platforms. shader-db: DG2 and Meteor Lake had similar results. (DG2 shown) total instructions in shared programs: 19659363 -> 19659360 (<.01%) instructions in affected programs: 2484 -> 2481 (-0.12%) helped: 6 / HURT: 1 total cycles in shared programs: 823445738 -> 823432524 (<.01%) cycles in affected programs: 2619836 -> 2606622 (-0.50%) helped: 48 / HURT: 63 fossil-db: DG2 and Meteor Lake had similar results. (DG2 shown) Totals: Instrs: 154015863 -> 153987806 (-0.02%); split: -0.02%, +0.00% Cycle count: 17552172994 -> 17562047866 (+0.06%); split: -0.13%, +0.19% Spill count: 142124 -> 141544 (-0.41%); split: -0.54%, +0.13% Fill count: 266803 -> 266046 (-0.28%); split: -0.38%, +0.09% Scratch Memory Size: 10266624 -> 10271744 (+0.05%); split: -0.02%, +0.07% Max live registers: 32592428 -> 32592393 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5535944 -> 5535912 (-0.00%); split: +0.00%, -0.00% Totals from 41887 (6.63% of 631367) affected shaders: Instrs: 32971032 -> 32942975 (-0.09%); split: -0.10%, +0.01% Cycle count: 3892086217 -> 3901961089 (+0.25%); split: -0.60%, +0.85% Spill count: 105669 -> 105089 (-0.55%); split: -0.72%, +0.18% Fill count: 206459 -> 205702 (-0.37%); split: -0.49%, +0.12% Scratch Memory Size: 7766016 -> 7771136 (+0.07%); split: -0.03%, +0.09% Max live registers: 3230515 -> 3230480 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 337232 -> 337200 (-0.01%); split: +0.00%, -0.01% No shader-db or fossil-db changes on any earlier Intel platforms. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Ian Romanick	bf5d82654a	intel/brw: Fix optimize_extract_to_float for i2f of unsigned extract Fixes fs-uint-to-float-of-extract-int8.shader_test and fs-uint-to-float-of-extract-int16.shader_test added by piglit!883. No shader-db or fossil-db changes on any Intel platform. v2: Expand the comment explaining the potential problem. Suggested by Caio. Fixes: `29ce110be6` ("i965/fs: Remove extract virtual opcodes.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>	2024-05-03 15:01:43 -07:00
Kenneth Graunke	8d983b3425	intel/nir: Set src_type on TCS quads workaround store_output We weren't setting this and now it's validated, causing assert failures. Fixes: `1632948a76` ("nir: validate src_type of store_output intrinsics, require bit_size >= 16") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11107 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29027>	2024-05-02 13:58:21 -07:00
Kenneth Graunke	84139470a5	intel/brw: Use VEC for emit_unzip() Helps make SIMD-split code more SSA-friendly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:54 -07:00
Kenneth Graunke	1b54b4fad5	intel/brw: Use VEC for NIR vec*() sources This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:50 -07:00
Kenneth Graunke	d4563747d9	intel/brw: Use VEC for output stores This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:49 -07:00
Kenneth Graunke	f0c29c9b71	intel/brw: Use VEC for FS outputs This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:49 -07:00
Kenneth Graunke	cbe7a13f2b	intel/brw: Use VEC for TCS/TES/GS input/output loads This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:48 -07:00
Kenneth Graunke	a94e1bd0ac	intel/brw: Use VEC for gl_FragCoord This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:47 -07:00
Kenneth Graunke	d0a24496fd	intel/brw: Use VEC for load_const This writes the whole destination register in a single builder call. Eventually, VEC will write the whole destination register in one go, allowing better visibility into how it is defined. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:45 -07:00
Kenneth Graunke	3c867bf2c7	intel/brw: Add a new VEC() helper. This gathers a number of sources into a contiguous vector register. Eventually, the plan is that it will use a MOV for a single source, or LOAD_PAYLOAD for multiple sources. For now, it emits a series of MOVs to allow us to rewrite a bunch of existing code to use the new helper, then change them all over at once later. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:42 -07:00
Kenneth Graunke	c194df565a	intel/brw: Don't include unnecessary undefined values in texture results When emitting a sampler message, we allocate a temporary destination large enough to hold 4 values (or 5 for sparse). This is the maximum size needed to hold any result. However, we shrink the size written by the sampler message to skip writing any trailing components that NIR tells us are never read. So we may not write the entire temporary. The NIR texture instruction has a destination VGRF which is sized assuming that all components are present. We issue a LOAD_PAYLOAD instruction to copy our sampler result temporary to the NIR destination. When we reduce the response length of the sampler messages, then some of these temporary components have undefined values. The correct way to indicate that is by using a BAD_FILE source. Unfortunately, we were naively reading offsets of the temporary that were never written, but are still part of a larger VGRF. This complicates things. For example, sampling and only using RGB (not RGBA) was producing this: txl_logical(8) (written: 3) vgrf3+0.0:F, ... undef(8) (written: 4) vgrf4:UD load_payload(8) (written: 4) vgrf4:F, vgrf3+0.0:F, vgrf3+1.0:F, vgrf3+2.0:F, vgrf3+3.0:F The last source, vgrf3+3.0:F, is undefined, and should be BAD_FILE. Doing so allows VGRF splitting and other optimizations to work better. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:41 -07:00
Kenneth Graunke	e42914529a	intel/brw: Support CSE on more ops This has no changes in shader-db or fossil-db, surprisingly, but at least CSEL will be useful shortly. Presumably the others may matter somewhere. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:40 -07:00
Kenneth Graunke	ed3e4c16dc	intel/brw: Do not create empty basic blocks when removing instructions If there's only a single instruction in a basic block, then removing it would create an empty block. We seem to have trouble representing those as there are no instructions with an IP inside the block; several places mess up connections. While most blocks end in control flow instructions (which are rarely eliminated), ones preceding a DO instruction may end in an ordinary instruction. This makes such blocks tricky to merge with adjacent blocks - they may be between loops. Any optimization pass may find such an instruction and want to eliminate it, and most of them are unprepared to perform such CFG link surgery. Nor do we want to make every pass aware of this issue. To work around this, we simply replace an instruction with a NOP when removing it from a block containing only that instruction, leaving the block in place. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:39 -07:00
Kenneth Graunke	391da3610c	intel/brw: Print W/UW immediates correctly We were printing 24w as 0x180018d which not only scarily shows the wrong type, but also the replicated format of the word. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>	2024-04-30 17:16:33 -07:00
Kenneth Graunke	674e89953f	intel/brw: Use new builder helpers that allocate a VGRF destination With the previous commit, we now have new builder helpers that will allocate a temporary destination for us. So we can eliminate a lot of the temporary naming and declarations, and build up expressions. In a number of cases here, the code was confusingly mixing D-type addresses with UD-immediates, or expecting a UD destination. But the underlying values should always be positive anyway. To accommodate the type inference restriction that the base types must match, we switch these over to be purely UD calculations. It's cleaner to do so anyway. Compared to the old code, this may in some cases allocate additional temporary registers for subexpressions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	4c2c49f7bc	intel/brw: Add builder helpers that allocate temporary destinations In many cases, we calculate an expression by generating a series of instructions. We'd either overwrite the same register repeatedly, or call vgrf(BRW_TYPE_X) repeatedly to allocate temporaries for each intermediate step. In many cases, we overwrote the same register simply because allocating and naming temporaries for each step was annoying. This commit adds new builder helpers that will allocate a temporary destination for you, using simple type inference: unary operations use the source type, and binary operations require a matching base type and return the largest of the two types. The helpers return the destination register, allowing us to write in an expression-tree style, chaining together builder operations to produce whole values. Sort of like nir_builder. We still optionally will write out the fs_inst pointer in case the caller wants to do things like set predicates or saturation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	319ba85e10	intel/brw: Add builder helpers for math functions Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	cf8ed9925f	intel/brw: Make a helper for finding the largest of two types Some instructions can operate on mixed types. Typically this is something like a binary operation with UD and UW sources resulting in a UD destination. In order to make it easier to find the result type of such operations, let's make a type helper that returns the larger of the two types (but requires the base type to match). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Kenneth Graunke	f5473e6edd	intel/brw: Don't use inst return value when it isn't needed We just want to emit an instruction, but we don't need to do anything further with it, so we don't need to store the resulting inst pointer anywhere. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>	2024-04-29 07:51:45 +00:00
Lionel Landwerlin	b80dd22d57	intel/brw: add min_sample_shading value in wm_prog_data Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>	2024-04-26 05:13:03 +00:00
Lionel Landwerlin	bdfa25dc77	intel/fs: decouple alphaToCoverage from per sample dispatch Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>	2024-04-26 05:13:03 +00:00
Lionel Landwerlin	1bbe2d9833	intel/brw: fixup wm_prog_data_barycentric_modes() Always select sample barycentric when persample dispatch is unknown at compile time and let the payload adjustments feed the expected value based on dispatch. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>	2024-04-26 05:13:02 +00:00
Kenneth Graunke	df6cfb4dd0	intel/brw: Rename brw_reg_type_to_hw_type to brw_type_encode And similarly brw_hw_type_to_reg_type to brw_type_decode. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	9205f6ff51	intel/brw: Combine a1/a16 3src type decoding functions Align16 is only used on Gfx9, while Align1 is used on Gfx11+. We can decode both kinds of encodings in the same function with a simple devinfo check. One snag is that the align16 encodings didn't have a separate exec_type field, but we can just pass 0. This lets us have a single function named brw_type_decode_for_3src, which is much less of a mouthful. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	28034aac34	intel/brw: Combine a1/a16 3src type encoding functions Align16 is only used on Gfx9, while Align1 is used on Gfx11+. We can handle both encodings in the same function with a simple devinfo check, and give that function a simple name like brw_type_encode_for_3src. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	c22f44ff07	intel/brw: Replace brw_reg_type_from_bit_size by brw_type_with_size Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
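
The type-inference rule described in 4c2c49f7bc and cf8ed9925f (binary operations require a matching base type and yield the larger of the two types) can be illustrated with a small standalone sketch. The names `Base`, `Type`, and `larger_type`, and the type constants, are hypothetical stand-ins invented for this illustration; they are not Mesa's actual `brw_reg_type` definitions or helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not Mesa's brw_reg_type.
enum class Base : uint8_t { UInt, SInt, Float };

struct Type {
    Base base;      // signedness / float-ness
    unsigned bits;  // element size in bits: 8, 16, 32 or 64
};

constexpr bool operator==(Type a, Type b)
{
    return a.base == b.base && a.bits == b.bits;
}

// A few example types, named after the usual register type suffixes.
constexpr Type UW{Base::UInt, 16};
constexpr Type UD{Base::UInt, 32};
constexpr Type HF{Base::Float, 16};
constexpr Type F{Base::Float, 32};

// Result type of a mixed-size binary operation: the wider operand wins.
// As in the commit message, the base types are required to match.
inline Type larger_type(Type a, Type b)
{
    assert(a.base == b.base && "base types must match");
    return a.bits >= b.bits ? a : b;
}
```

Under these assumptions, `larger_type(UD, UW)` yields `UD`, which matches the "UD and UW sources resulting in a UD destination" case mentioned in cf8ed9925f, while mixing `UD` with `F` trips the assertion.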

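
The W/UW printing fix in 391da3610c refers to the "replicated format of the word": the value 24 showed up as 0x180018, i.e. the same 16-bit value in both halves of a 32-bit immediate. A minimal sketch of that encoding, and of printing it back as a word, follows. `replicate_word` and `print_w_imm` are hypothetical helper names for this illustration, not functions from the Mesa source, and the output format is an assumption based on the "24w" notation in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: place the same 16-bit value in both halves of
// a 32-bit immediate, as the commit message describes.
constexpr uint32_t replicate_word(uint16_t w)
{
    return (static_cast<uint32_t>(w) << 16) | w;
}

// Hypothetical helper: print a replicated signed-word immediate by
// looking only at the low 16 bits and appending the 'w' type suffix.
inline std::string print_w_imm(uint32_t raw)
{
    const int16_t value = static_cast<int16_t>(raw & 0xffffu);
    return std::to_string(value) + "w";
}
```

With this sketch, `replicate_word(24)` is `0x00180018`, and `print_w_imm(0x00180018)` gives `24w` rather than the raw dword.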

3464 commits