fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 07:00:12 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	aa494cbacf	brw: align spilling offsets to physical register sizes In commit `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") we aligned the width of scratch messages to physical register sizes (32B prior to Xe2, 64B for Xe2+). But our spilling offsets are computed using the register allocations sizes which are in units of 32B. That means on Xe2, you can end up spilling a virtual register allocated at 32B (which we use for surface state computations with exec_all) and then the spilling of that register will be emitted in SIMD16, having the upper 8 lanes overwriting the next spilled register. We could potentially limit spills to SIMD8 messages on Xe2 (only writing 32B of data), but we're also unlikely to have all 32B virtual register spilled next to one another. And if not tightly packed, we would have 64B registers stored on 2 different cachelines which sounds inefficient. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") Backport-to: 24.2 Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>	2024-09-04 23:05:31 +00:00
Caio Oliveira	74be809237	compiler: Allow derivative_group to be used for all stages in shader_info These will now also be used by stages that have workgroups. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>	2024-09-03 20:03:18 +00:00
Caio Oliveira	3f6b5ea27a	intel/brw: Use linear walk when shader requires DERIVATIVE_GROUP_LINEAR Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30955>	2024-08-30 20:24:42 +00:00
Caio Oliveira	e4f090d3a6	intel/brw: Remove special treatment for 2-src in emit() helper For Gfx9+ no 2-src instructions need sources to be fixed up. Special treatment remains for 3-src instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30911>	2024-08-30 04:33:47 +00:00
Ian Romanick	73f365e208	intel/brw: load_offset cannot be constant on this path Literally inside an if-statement (about 26 lines before this hunk) that checks for !nir_src_is_const(instr->src[1]). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	fef175de09	intel/brw: Enable constant propagation for a couple more logical sends This prevents some regressions later in the MR. Once load_const operations are marked as is_scalar, they will cease to get the automatic constant propagation that occurs in try_rebuild_source. No shader-db or fossil-db changes on any Intel platform. v2: Slightly relax source restrictions on SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment explaining the restriction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	c6a8b382fd	intel/brw: Relax is_partial_write check in cmod propagation The is_partial_write check is too strict because it tests two separate things. It tests whether or not the instruction always writes a value (i.e., is it predicated), and it tests whether or not the instruction writes a complete register. This latter check is problematic as it prevents cmod propagation in SIMD1, and it prevents cmod propagation in SIMD8 when the destination size is 16 bits. This check is unnecessary. Cmod propagation already checks that the region written and region read overlap. It also already checks that the execution sizes of the instructions match. Further restriction based on the specific parts of the register written only generates false negatives. v2: Relax all of the calls to is_partial_write. Suggested by Caio. No shader-db changes on any Intel platform. fossil-db: Meteor Lake Totals: Instrs: 151505520 -> 151502923 (-0.00%); split: -0.00%, +0.00% Cycle count: 17201385104 -> 17194901423 (-0.04%); split: -0.06%, +0.02% Spill count: 80827 -> 80837 (+0.01%) Fill count: 152693 -> 152692 (-0.00%); split: -0.01%, +0.01% Totals from 346 (0.05% of 630198) affected shaders: Instrs: 1257205 -> 1254608 (-0.21%); split: -0.21%, +0.00% Cycle count: 5532845647 -> 5526361966 (-0.12%); split: -0.18%, +0.06% Spill count: 32903 -> 32913 (+0.03%) Fill count: 64338 -> 64337 (-0.00%); split: -0.03%, +0.03% DG2 Totals: Instrs: 151531440 -> 151528055 (-0.00%); split: -0.00%, +0.00% Cycle count: 17200238927 -> 17197996676 (-0.01%); split: -0.03%, +0.02% Spill count: 81003 -> 80971 (-0.04%); split: -0.04%, +0.00% Fill count: 152975 -> 152912 (-0.04%); split: -0.05%, +0.01% Totals from 346 (0.05% of 630198) affected shaders: Instrs: 1260363 -> 1256978 (-0.27%); split: -0.27%, +0.00% Cycle count: 5532019670 -> 5529777419 (-0.04%); split: -0.09%, +0.05% Spill count: 33046 -> 33014 (-0.10%); split: -0.11%, +0.01% Fill count: 64581 -> 64518 (-0.10%); split: -0.13%, +0.03% Tiger Lake 
and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 149972324 -> 149972289 (-0.00%) Cycle count: 15566495293 -> 15565151171 (-0.01%); split: -0.01%, +0.00% Totals from 16 (0.00% of 629912) affected shaders: Instrs: 351194 -> 351159 (-0.01%) Cycle count: 3922227030 -> 3920882908 (-0.03%); split: -0.04%, +0.00% Skylake Totals: Instrs: 140787999 -> 140787983 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665614947 -> 14665515855 (-0.00%); split: -0.00%, +0.00% Spill count: 58500 -> 58501 (+0.00%) Fill count: 102097 -> 102100 (+0.00%) Totals from 16 (0.00% of 625685) affected shaders: Instrs: 343560 -> 343544 (-0.00%); split: -0.01%, +0.01% Cycle count: 3354997898 -> 3354898806 (-0.00%); split: -0.01%, +0.01% Spill count: 16864 -> 16865 (+0.01%) Fill count: 27479 -> 27482 (+0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	13332c236b	intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup I observed some ray tracing shaders where a resource_intel inside a loop was non-uniform, and some code was lowered to account for that. Eventually the loop containing the resource_intel was unrolled, and the resource_intel became uniform. For example, nir_opt_uniform_subgroup can transform something like con loop { con block b5: // preds: b4 b8 con 32 %330 = @read_first_invocation (%329) con 1 %331 = ieq %330, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %330 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } into con loop { con block b5: // preds: b4 b8 con 1 %331 = ieq %329, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %329 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } Notice that %331 is now a tautology. Running brw_nir_optimize again eliminates the loop. v2: Add a comment in the code explaining the rationale. Suggested by Ken. Update the commit message. Suggested by Caio. shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19733448 -> 19733330 (<.01%) instructions in affected programs: 14120 -> 14002 (-0.84%) helped: 32 / HURT: 3 total cycles in shared programs: 916254496 -> 916226288 (<.01%) cycles in affected programs: 2035116 -> 2006908 (-1.39%) helped: 19 / HURT: 13 total spills in shared programs: 5807 -> 5807 (0.00%) spills in affected programs: 26 -> 26 (0.00%) helped: 1 / HURT: 1 total fills in shared programs: 6794 -> 6792 (-0.03%) fills in affected programs: 84 -> 82 (-2.38%) helped: 1 / HURT: 1 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20393084 -> 20392971 (<.01%) instructions in affected programs: 21750 -> 21637 (-0.52%) helped: 31 / HURT: 4 total cycles in shared programs: 880273065 -> 880247818 (<.01%) cycles in affected programs: 2546748 -> 2521501 (-0.99%) helped: 18 / HURT: 9 total spills in shared programs: 4628 -> 4630 (0.04%) spills in affected programs: 287 -> 289 (0.70%) helped: 1 / HURT: 2 total fills in shared programs: 5381 -> 5376 (-0.09%) fills in affected programs: 711 -> 706 (-0.70%) helped: 2 / HURT: 2 LOST: 1 GAINED: 1 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00% Send messages: 7459339 -> 7459396 (+0.00%) Loop count: 49111 -> 47588 (-3.10%) Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01% Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01% Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00% Scratch Memory Size: 4136960 -> 4130816 (-0.15%) Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00% Totals from 672 (0.11% of 630198) affected shaders: Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17% Send messages: 54302 -> 54359 (+0.10%) Loop count: 6124 -> 4601 (-24.87%) Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16% Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08% Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01% Scratch Memory Size: 740352 -> 734208 (-0.83%) Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7685264 -> 7685256 (-0.00%) Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00% Spill count: 61238 -> 61240 (+0.00%) Fill count: 107301 -> 107289 (-0.01%) Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00% Totals from 553 (0.09% of 629912) affected shaders: Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01% Subgroup size: 8648 -> 8640 (-0.09%) Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24% Spill count: 181 -> 183 (+1.10%) Fill count: 440 -> 428 (-2.73%) Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	65eb7ed5fc	intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize Without this, the next commit triggers assertions. v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	572e00dd66	intel/brw: Copy prop from raw integer moves with mismatched types The specific pattern from the unit test was observed in ray tracing trampoline shaders. v2: Refactor the is_raw_move tests out to a utility function. Suggested by Ken. v3: Fix a regression caused by being too picky about source modifiers. This was introduced somewhere between when I did initial shader-db runs and v2. v4: Fix typo in comment. Noticed by Caio. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19734086 -> 19733997 (<.01%) instructions in affected programs: 135388 -> 135299 (-0.07%) helped: 76 / HURT: 2 total cycles in shared programs: 916290451 -> 916264968 (<.01%) cycles in affected programs: 41046002 -> 41020519 (-0.06%) helped: 32 / HURT: 29 fossil-db: Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 151531355 -> 151513669 (-0.01%); split: -0.01%, +0.00% Cycle count: 17209372399 -> 17208178205 (-0.01%); split: -0.01%, +0.00% Max live registers: 32016490 -> 32016493 (+0.00%) Totals from 17361 (2.75% of 630198) affected shaders: Instrs: 2642048 -> 2624362 (-0.67%); split: -0.67%, +0.00% Cycle count: 79803066 -> 78608872 (-1.50%); split: -1.75%, +0.25% Max live registers: 421668 -> 421671 (+0.00%) Tiger Lake and Ice Lake had similar results. 
(Tiger Lake shown) Totals: Instrs: 149995644 -> 149977326 (-0.01%); split: -0.01%, +0.00% Cycle count: 15567293770 -> 15566524840 (-0.00%); split: -0.02%, +0.01% Spill count: 61241 -> 61238 (-0.00%) Fill count: 107304 -> 107301 (-0.00%) Max live registers: 31993109 -> 31993112 (+0.00%) Totals from 17813 (2.83% of 629912) affected shaders: Instrs: 3738236 -> 3719918 (-0.49%); split: -0.49%, +0.00% Cycle count: 4251157049 -> 4250388119 (-0.02%); split: -0.06%, +0.04% Spill count: 28268 -> 28265 (-0.01%) Fill count: 50377 -> 50374 (-0.01%) Max live registers: 470648 -> 470651 (+0.00%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Jesse Natalie	03655dfda1	compiler, vk: Support subgroup size of 4 Relax the assert and assign it an enum value Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>	2024-08-29 03:30:31 +00:00
Kenneth Graunke	da395e6985	intel/brw: Fix extract_imm for subregion reads of 64-bit immediates We could be trying to extract a D/UD from a Q/UQ, for example. We were ignoring the top 32-bits, which is incorrect. Fixes: `580e1c592d` ("intel/brw: Introduce a new SSA-based copy propagation pass") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>	2024-08-28 12:33:26 -07:00
Kenneth Graunke	51c85e0363	intel/brw: Drop misguided sign extension attempts in extract_imm() This function never expands a type - it only narrows it. As such, we don't need to ever sign extend to fill additional new bits. I think this code was left over from earlier versions of my optimization pass that was buggy and trying to handle cases it should not have. Fixes: `580e1c592d` ("intel/brw: Introduce a new SSA-based copy propagation pass") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>	2024-08-28 12:33:26 -07:00
Caio Oliveira	695f5314d6	intel/brw: Simplify fs_inst annotation When INTEL_DEBUG=ann is also set, the disassembler would annotate the output with either a string or the string version of a NIR instruction. This was done by keeping two pointers (but only using one at a time). Change the code to print the instruction into a string instead of keeping its pointer around (peg the string to the shader). That way, only one pointer is needed for annotations. Because that serialization is not free, only do that when the environment variable is set. Since we are here, move the annotation string field to the end, moving it to the least commonly used cacheline. Further packing might allow the entire fs_inst to fit in two cachelines. For release builds, don't even add the debug annotation to the struct. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>	2024-08-28 03:59:50 +00:00
Caio Oliveira	ec15cdfa2a	intel/brw: Pack brw_reg struct The alignment required for the second union (has 64-bit size) causes a hole between the first and second union. Move the remaining data there. In 64-bit build, shrinks brw_reg from 24 bytes to 16 bytes. And by consequence, shrinks fs_inst from 200 bytes to 160 bytes, making it use one less cacheline. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>	2024-08-28 03:59:50 +00:00
Lionel Landwerlin	e97b968aeb	brw: add a comment what Gfx12.5 URB fences Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30849>	2024-08-27 13:38:14 +00:00
Lionel Landwerlin	93fba40389	brw: switch mesh/task URB fence prior to EOT to GPU Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30849>	2024-08-27 13:38:14 +00:00
Kenneth Graunke	437bda3013	intel/brw: Get rid of the lsc_msg_desc_wcmask helper The LOAD/STORE opcodes take a vector size, while the LOAD/STORE_CMASK opcodes take a channel mask. The two are mutually exclusive. So we can just have the lsc_msg_desc() helper take one or the other in the same parameter. This more closely matches the actual descriptor. We couldn't do this until the previous commit, since we were previously relying on the lsc_msg_desc() function to calculate a cmask out of the number of vector components. But now we don't need it to do that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30632>	2024-08-27 09:25:59 +00:00
Kenneth Graunke	55f193a105	intel/brw: Switch from LSC CMASK opcodes to regular LOAD/STORE The LOAD/STORE opcodes take a vector size (number of components), while the LOAD/STORE_CMASK opcodes take a channel mask. For some reason, we were passing a number of channels to lsc_msg_desc(), then using it to construct a channel mask with all channels enabled, and always using the CMASK message variants. Considering we don't actually want to mask off any channels, we should probably just use the regular LOAD/STORE opcodes, as they're more flexible anyway. One exception is that typed messages on Xe2 apparently only support LOAD_CMASK/STORE_CMASK and not regular LOAD/STORE. So we keep using those there. (Thanks to Sagar Ghuge for catching this!) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30632>	2024-08-27 09:25:58 +00:00
Sviatoslav Peleshko	09122e2be0	brw,elk: Fix opening flags on dumping shader binaries Truncation is needed for overwriting correctly in cases when old file is bigger than the one we want to dump (e.g. when the old one was edited inplace). Also, creation permissions are way too broad. Fixes: `4f41c44d` ("intel/compiler: Add variable to dump binaries of all compiled shaders") Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30581>	2024-08-27 08:26:08 +00:00
Caio Oliveira	31dfb04fd3	intel/brw: Remove long register file names The long names were originally meant to map to the HW encoding but nowadays the actual encoding values depend on gfx version, whether instruction is 3src, etc. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	6bdf2de4d2	intel/brw: Remove unused ARF values and helpers These were used by old Gfx versions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	72b687abb4	intel/brw: Make BAD_FILE the zero value for brw_reg_file Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	e8f921678a	intel/brw: Explicitly map brw_reg_file into hardware values For now this is a no-op, but will be useful when changing the enum to values that don't match the hardware. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	e7179232c9	intel/brw: Move encoding of Gfx11 3-src inside the inst helpers Create specific helper for register file encoding and handle it there. Use ad-hoc structs to let the macro take optional named arguments. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	d31c8bfb6f	intel/brw: Remove more uses of variable length arrays In these cases there's a clear bound we can use. In C++ this is a compiler extension and not compatible with zero initializing a regular struct -- which will happen in a later change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	86c20e2910	intel/brw: Use a helper for common VEC pattern In the helper, instead of using the Variable Length Array, use a fixed size array to NIR_MAX_VEC_COMPONENTS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	abc535a3b4	intel/brw: Remove unused variable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:13 +00:00
Kenneth Graunke	b97e10208c	intel/brw: Add a file parameter to idom_tree::dump() The other dump methods in this file also take a file parameter, defaulting to stderr. Dumping dot files to stdout is probably not what anybody really wanted. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Kenneth Graunke	bb4f05005e	intel/brw: Print blocks in brw_print_instructions_to_file() Useful when examining the control flow graph. For some reason, we printed this for the final assembly but not the IR. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Kenneth Graunke	2d73e42333	intel/brw: Fix OOB reads when printing instructions post-reg-alloc Post-register allocation, but before brw_fs_lower_vgrfs_to_fixed_grfs, we have registers with the VGRF file but they are actually fixed GRFs. brw_print_instructions_to_file() was seeing VGRFs and trying to access their size, but using bogus register numbers that could be out-of-bound. Detect when we're post-RA and avoid doing this. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>	2024-08-22 22:54:45 +00:00
Lionel Landwerlin	d9406658ed	brw: remove unused prog_data field Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>	2024-08-22 19:44:40 +00:00
Kenneth Graunke	6a292c2699	intel: Fix bad align_offset on global_constant_uniform_block_intel We were specifying align_offset = 64 and align_mul = 64, which is invalid. nir_combined_align() asserts that align_offset < align_mul. Our intention here is to perform cacheline-aligned (64B-aligned) block loads, so we should set align_mul = 64 and can leave align_offset = 0. Fixes: `fbafa9cabd` ("intel/nir: remove load_global_const_block_intel intrinsic") Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>	2024-08-21 20:44:57 +00:00
Ian Romanick	c96ceb50d0	intel/brw/xe2: Allow int64 conversions As far as I can tell from looking at the Bspec, MOV between integers of all sizes appears to be supported. shader-db: total instructions in shared programs: 17480631 -> 17480535 (<.01%) instructions in affected programs: 26284 -> 26188 (-0.37%) helped: 21 / HURT: 13 total cycles in shared programs: 897601907 -> 897664293 (<.01%) cycles in affected programs: 10929664 -> 10992050 (0.57%) helped: 48 / HURT: 45 fossil-db: Totals: Instrs: 140686824 -> 140686155 (-0.00%); split: -0.00%, +0.00% Cycle count: 21525129188 -> 21524717729 (-0.00%); split: -0.01%, +0.00% Spill count: 70778 -> 70776 (-0.00%) Fill count: 139172 -> 139168 (-0.00%) Max live registers: 47513859 -> 47513795 (-0.00%) Totals from 612 (0.11% of 549272) affected shaders: Instrs: 964441 -> 963772 (-0.07%); split: -0.09%, +0.02% Cycle count: 1215564312 -> 1215152853 (-0.03%); split: -0.09%, +0.06% Spill count: 16172 -> 16170 (-0.01%) Fill count: 37962 -> 37958 (-0.01%) Max live registers: 70749 -> 70685 (-0.09%) Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30700>	2024-08-21 20:16:00 +00:00
Francisco Jerez	71ca8529c5	intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations. We were currently emitting logical atomic instructions with a packed destination region for sub-dword LSC atomics, along the lines of: > untyped_atomic_logical(32) dst<1>:HF, ... However, these instructions use an LSC data size D16U32, which means that the 16b data on the return payload is expanded to 32b by the LSC shared function, so we were lying to the compiler about the location of the individual channels on the return payload, its execution masking, etc. This is why the hacks that manually set the 'inst->size_written' of the instruction were required. In some cases this worked, but any non-trivial manipulation of the instruction destination by lowering or optimization passes could have led to corruption, as has been reproduced in deqp-vk during lower_simd_width() for shaders that use 16-bit atomics in SIMD32 dispatch mode. Note that LSC sub-dword reads aren't affected by this because they use raw UD destinations and specify the actual bit size of the operation datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't work for atomic operations since that immediate specifies the atomic opcode. Instead, have the logical operation implement the behavior of 16-bit destinations correctly instead of silently replacing the 16-bit region with an inconsistent 32-bit region -- This is done by emitting the MOV instructions used to pack the data from the UD temporary into the packed destination from the lower_logical_sends() pass instead of from the NIR translation pass. Fixes: `43169dbbe5` ("intel/compiler: Support 16 bit float ops") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>	2024-08-21 02:33:12 +00:00
Kenneth Graunke	d22d6d814d	intel/brw: Fix Xe2+ SWSB encoding/decoding for DPAS instructions SBID SET can only be used on SEND, SENDC, or DPAS instructions. The existing code was handling SET for SEND/SENDC, but was using the wrong encoding for DPAS. Add a new case to handle that and make it clear that the existing code is only for SEND/SENDC. While here, rewrite the encoder to use 2-bit binary immediates shifted up into the mode [9:8] field, rather than pre-shifted hex values. This matches the documentation better and is a little easier to follow. On the decode side, we were incorrectly decoding MATH instructions. Because they're marked is_unordered, we were hitting the SEND/SENDC decoding, which is incorrect for MATH. Fixes 22 cooperative matrix tests on Lunar Lake. Huge thanks to Paulo Zanoni for bisecting failures to one of my commits, then analyzing shaders and experimenting to discover that the failure was really an unrelated bug, just being provoked by different choices of registers. His work narrowing the problem down made it much easier to discover and fix this bug. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>	2024-08-20 19:09:37 +00:00
Kenneth Graunke	89f9a6e10b	intel/brw: Pass opcode to brw_swsb_encode/decode We're going to need to handle encoding/decoding differently for DPAS vs. SEND/SENDC vs. other instructions. Pass the opcode so we can figure out the encodings for each type of instruction. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>	2024-08-20 19:09:37 +00:00
Caio Oliveira	40f77b6936	intel/brw: Avoid modifying the shader in assign_curb_setup if not needed If there are no uniforms to push, don't emit the AND or invalidate the shader analysis. This affects only compute shaders. Not a significant impact since lots of shaders end up pushing uniforms. Fossil-db numbers (restricted to compute pipelines only) for DG2 ``` Totals: Instrs: 3071016 -> 3070894 (-0.00%) Cycle count: 8320268863 -> 8320264519 (-0.00%) Totals from 122 (2.70% of 4520) affected shaders: Instrs: 10675 -> 10553 (-1.14%) Cycle count: 2060003 -> 2055659 (-0.21%) ``` Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>	2024-08-17 16:25:01 -07:00
Sagar Ghuge	c4f2a8d984	intel/compiler: Fix indirect offset in GS input read for Xe2+ Make sure to take new GRF size into consideration and adjust the indirect offset according to new size so that when we do the indirect load with address register, we load right values. This helps pass the following tests: - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.geom - dEQP-VK.ray_query.geometry_shader. Backport-to: 24.2 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>	2024-08-16 18:40:13 +00:00
Ian Romanick	c8038643b8	intel/brw: Make ifind_msb SSA friendly No shader-db changes on any Intel platform. v2: Use negate(tmp) instead of creating a new temporary. Suggested by Ken. fossil-db: Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00% Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06% Totals from 40 (0.01% of 633223) affected shaders: Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00% Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24% Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00% Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11% Spill count: 59795 -> 59794 (-0.00%) Fill count: 103513 -> 103509 (-0.00%) Totals from 40 (0.01% of 632445) affected shaders: Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00% Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43% Spill count: 16896 -> 16895 (-0.01%) Fill count: 27819 -> 27815 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Ian Romanick	e9c151fde6	intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00% Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00% Spill count: 78571 -> 78525 (-0.06%) Fill count: 148178 -> 148132 (-0.03%) Totals from 210 (0.03% of 633223) affected shaders: Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08% Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00% Spill count: 15632 -> 15586 (-0.29%) Fill count: 26241 -> 26195 (-0.18%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Caio Oliveira	6267585778	intel/brw: Also return the size of the assembled shader Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>	2024-08-14 03:03:46 +00:00
Sagar Ghuge	83c2524124	intel/compiler: Adjust trace ray control field on Xe2 Bspec 64643: Structure_TraceRayPayload::Trace Ray Control Bit field moved from 9-8 to 10-8 on Xe2. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
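The field move described in the commit above (bits 9-8 before Xe2, bits 10-8 on Xe2) can be sketched as plain bit packing. This is a minimal illustration, not the Mesa implementation: the function name `set_trace_ray_control`, the `is_xe2` flag, and the idea of packing into a single 32-bit dword are all assumptions made for the example; only the bit ranges come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: write the Trace Ray Control value into a dword.
 * Per the commit message, the field occupies bits 9:8 before Xe2 and
 * bits 10:8 on Xe2, so only the field width changes; the low bit stays 8.
 */
static uint32_t
set_trace_ray_control(uint32_t dword, uint32_t ctrl, bool is_xe2)
{
   const unsigned low_bit = 8;
   const unsigned width = is_xe2 ? 3 : 2;
   const uint32_t max_value = (1u << width) - 1u;

   /* The value must fit in the field for the selected generation. */
   assert(ctrl <= max_value);

   /* Clear the old field contents, then OR in the new value. */
   const uint32_t mask = max_value << low_bit;
   return (dword & ~mask) | (ctrl << low_bit);
}
```

With the wider field, a value such as 0x4 is only representable when `is_xe2` is true; on earlier generations the assertion in this sketch would reject it.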
Sagar Ghuge	c3c62e493f	intel/compiler: Ray query requires write-back register Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." If we don't pass the write-back register, the message somehow steps over the R0 register and can corrupt scratch space accesses, which could potentially lead to a GPU hang. This can be seen when running under a simulator trace. send.rta (16\|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig	5f437aa24d	elk: fix compute shader derivatives derivatives are not fs only so move to be with the rest of subgroup ops. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>	2024-08-13 12:19:30 +00:00
Lionel Landwerlin	aaff191356	brw/rt: fix ray_object_(direction\|origin) for closest-hit shaders When closest hit shader is called, the BVH object level brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using is the ray origin/direction and apply the transform of the current instance. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ba7d459a3` ("intel/rt: Implement the new ray-tracing system values") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>	2024-08-13 10:28:50 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Ian Romanick	f5815a003e	intel/brw: Use def analysis for simple cases of saturate propagation I had hoped this would improve compilation performance too. I tried several different long running fossils, and there was no difference. Fossil-db results are all over the place from platform to platform. All of the Tiger Lake shaders hurt for spills and fills are fragment shaders in rdr2. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19734088 -> 19733645 (<.01%) instructions in affected programs: 71200 -> 70757 (-0.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1 helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48% 95% mean confidence interval for instructions value: -2.69 -2.07 95% mean confidence interval for instructions %-change: -0.93% -0.72% Instructions are helped. total cycles in shared programs: 916290473 -> 916180971 (-0.01%) cycles in affected programs: 3403719 -> 3294217 (-3.22%) helped: 89 HURT: 88 helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10 helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46% HURT stats (abs) min: 1 max: 8750 x̄: 195.98 x̃: 7 HURT stats (rel) min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19% 95% mean confidence interval for cycles value: -1199.88 -37.43 95% mean confidence interval for cycles %-change: -0.66% 0.56% Inconclusive result (%-change mean confidence interval includes 0). fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151458346 -> 151457413 (-0.00%) Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00% Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5500560 -> 5500384 (-0.00%) Totals from 479 (0.08% of 628970) affected shaders: Instrs: 398836 -> 397903 (-0.23%) Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29% Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92% Max dispatch width: 4392 -> 4216 (-4.01%) Tiger Lake Totals: Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00% Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00% Spill count: 61241 -> 61251 (+0.02%) Fill count: 107304 -> 107314 (+0.01%) Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5517568 -> 5517248 (-0.01%) Totals from 486 (0.08% of 628673) affected shaders: Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01% Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51% Spill count: 1302 -> 1312 (+0.77%) Fill count: 3746 -> 3756 (+0.27%) Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99% Max dispatch width: 4576 -> 4256 (-6.99%) Ice Lake Totals: Instrs: 151348422 -> 151347463 (-0.00%) Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00% Fill count: 108114 -> 108111 (-0.00%) Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5611288 -> 5611256 (-0.00%) Totals from 483 (0.08% of 634352) affected shaders: Instrs: 393333 -> 392374 (-0.24%) Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22% Fill count: 3654 -> 3651 (-0.08%) Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92% Max dispatch width: 4312 -> 4280 (-0.74%) Skylake Totals: Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00% Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00% Max live registers: 31757558 -> 31757725 
(+0.00%) Max dispatch width: 5470040 -> 5469920 (-0.00%) Totals from 3542 (0.57% of 624449) affected shaders: Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00% Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12% Max live registers: 304531 -> 304698 (+0.05%) Max dispatch width: 31016 -> 30896 (-0.39%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:05 -07:00
Ian Romanick	adcce2bba4	intel/brw: Small code refactor in brw_fs_opt_saturate_propagation This bit of code will have a second use in the next commit. v2: Fix some broken indentation. Noticed by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:03 -07:00


3713 commits