fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 21:50:12 +01:00

Author	SHA1	Message	Date
Ian Romanick	c2ac7fa77b	brw/cmod: Allow integer CMP to ADD propagation only for Z and NZ No shader-db changes on any Intel platform. v2: Add a note about integer types in the saturate handling path. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 210743769 -> 210743727 (-0.00%) Cycle count: 30377699060 -> 30377700318 (+0.00%); split: -0.00%, +0.00% Totals from 36 (0.01% of 706776) affected shaders: Instrs: 17032 -> 16990 (-0.25%) Cycle count: 291716 -> 292974 (+0.43%); split: -0.01%, +0.44% Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	e26270249b	brw/cmod: Don't propagate from CMP to possible Inf + (-Inf) Most of the churn in this commit is changing unit tests that were testing things that are now invalid. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17122204 -> 17122669 (<.01%) instructions in affected programs: 120669 -> 121134 (0.39%) helped: 0 / HURT: 124 total cycles in shared programs: 895602370 -> 895613210 (<.01%) cycles in affected programs: 17868974 -> 17879814 (0.06%) helped: 35 / HURT: 85 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 210736518 -> 210743769 (+0.00%) Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00% Max live registers: 66056852 -> 66056966 (+0.00%) Totals from 1505 (0.21% of 706776) affected shaders: Instrs: 1890151 -> 1897402 (+0.38%) Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04% Max live registers: 256821 -> 256935 (+0.04%) Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `020b0055e7` ("i965/fs: Propagate conditional modifiers from compares to adds") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	0dab520a19	brw/cmod: Fix some errors when propagating from CMP to ADD.SAT When I originally wrote that code, I didn't understand what a jerk NaN can be. v2: Remove the brw_type_is_uint stuff. This function is currently only called for float types. In a later commit, integer types will be supported but only for NZ and Z conditions. Noticed by Matt. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17122197 -> 17122204 (<.01%) instructions in affected programs: 1691 -> 1698 (0.41%) helped: 0 / HURT: 4 total cycles in shared programs: 895602484 -> 895602370 (<.01%) cycles in affected programs: 912964 -> 912850 (-0.01%) helped: 2 / HURT: 2 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 210736388 -> 210736518 (+0.00%) Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00% Totals from 130 (0.02% of 706776) affected shaders: Instrs: 169911 -> 170041 (+0.08%) Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02% Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `020b0055e7` ("i965/fs: Propagate conditional modifiers from compares to adds") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	8f0fd0e66e	brw/cmod: Remove special handling of NOT The previous commit converts any NOT that might have been affected by this path into a simple MOV. Those MOVs are handled by other paths. No shader-db or fossil-db changes on any Intel platform. v2: Fix a bad squash. Changes that were accidentally in this commit were supposed to be in the previous commit. Noticed by Ivan. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	08fe7988d7	brw/algebraic: Convert some NOT to MOV On Xe platforms, many fragment shaders have patterns like: asr(8) g21<2>W g1.2<0,1,0>W 15D ... mov(8) g11<1>UW g21<16,8,2>UW ... not.nz.f0.0(8) null<1>D g11<8,8,1>W Converting the NOT.NZ to MOV.Z enables copy propagation to eliminate the original MOV. Then cmod propagation is able to eliminate the NOT-converted-to-MOV. It might be possible to cover this case by adding more opcodes to the list NOT can propagate to. The next commit will show that just converting to MOV is a better approach anyway. v2: Fix a bad squash. Changes that were supposed to be in this commit were accidentally in the next commit. Noticed by Ivan. shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 20069804 -> 20065167 (-0.02%) instructions in affected programs: 592450 -> 587813 (-0.78%) helped: 2300 / HURT: 0 total cycles in shared programs: 884534032 -> 884496201 (<.01%) cycles in affected programs: 13064194 -> 13026363 (-0.29%) helped: 1285 / HURT: 790 LOST: 18 GAINED: 15 fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 234506495 -> 234468664 (-0.02%) Cycle count: 24444825202 -> 24445710703 (+0.00%); split: -0.01%, +0.01% Max live registers: 42349793 -> 42349789 (-0.00%) Max dispatch width: 7131344 -> 7131744 (+0.01%); split: +0.05%, -0.04% Totals from 16673 (2.07% of 805781) affected shaders: Instrs: 6497669 -> 6459838 (-0.58%) Cycle count: 435877770 -> 436763271 (+0.20%); split: -0.54%, +0.74% Max live registers: 1122972 -> 1122968 (-0.00%) Max dispatch width: 151528 -> 151928 (+0.26%); split: +2.19%, -1.92% No shader-db or fossil-db changes on any other Intel platforms. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	9ce869aef5	brw/cmod: Delete some stale comment text Stale like the mummified remains of Ötzi, The Iceman. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>	2025-04-28 19:44:23 +00:00
Ian Romanick	12a022cf45	brw/algebraic: Greatly simplify brw_opt_constant_fold_instruction brw_opt_constant_fold_instruction can either do nothing or replace the instruction with a MOV of an immediate value. Previously each opcode case would perform this replacement, and code at the bottom of the function would verify the results. It is much simpler if each opcode case calculates a result in a brw_reg, and code at the bottom of the function performs the replacement. There are two outlier cases that cannot use this pattern: MAD and BROADCAST. These cases simply return directly from the switch-statement after performing the replacement. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34707>	2025-04-28 18:33:42 +00:00
Lionel Landwerlin	1f6cca0800	intel: fixup a few debugging option checks Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ad328bc58d` ("intel: Switch uint64_t intel_debug to a bitset") Reviewed-by: Michael Cheng <michael.cheng@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34667>	2025-04-23 18:47:42 +00:00
Michael Cheng	ad328bc58d	intel: Switch uint64_t intel_debug to a bitset We are reaching our limit of adding flags to intel_debug (approaching 64 flags). Switch intel_debug to a bitset, which gives us almost "unlimited" bits to use in the future. v2(Michael Cheng): Fixed a few ci errors Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Casey Bowman <casey.g.bowman@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34596>	2025-04-22 23:09:26 +00:00
Sagar Ghuge	36433e932b	intel/rt: Update BVH instance leaf load for Xe3+ Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Sagar Ghuge	5cd0f4ba2f	intel/compiler: Update MemRay data structure to 64-bit Rework: (Kevin) - Fix miss_shader_index offset - Handle hit group index Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Kevin Chuang	7b526de18f	intel/compiler/rt: Calculate barycentrics on demand This commit moves the calculation of tri_bary out of brw_nir_rt_load_mem_hit_from_addr(), and only do the calculation on demand, since unorm_float_convert can be expensive. We do this for both Xe1/2 and Xe3+ for consistency. Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Sagar Ghuge	afc23dffa4	intel/compiler: Update MemHit data structure to 64-bit version Rework (Kevin): - Fix inst leaf ptr - Handle 24bit unorm barycentric coord Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Kevin Chuang	40fb95d51a	intel/compiler: Use 24bits for hit_kind on Xe3+ For Xe3+, the upper 8 bits of the second dword of a potential hit are used to store hitGroupIndex0, which is stuffed by the HW. This hitGroupIndex0 will later be used by the HW again to reconstruct the whole hitGroupIndex when the driver issues a TRACE_RAY_COMMIT. We were corrupting this hitGroupIndex0 at the driver by setting the whole dword to hit_kind, which will cause the HW to read a wrong hitGroupIndex and therefore invoke a wrong closest hit shader. The behavior can be seen in dEQP-VK.ray_tracing_pipeline.pipeline_no_null_shaders_flag.gpu.boxes.\* and dEQP-VK.ray_tracing_pipeline.pipeline_library.configurations.\* This commit changes the driver to only use the lower 24 bits to store the hit_kind, and leave the upper 8 bits as is. Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Sagar Ghuge	64fd66407b	intel/compiler: Pass around intel_device_info parameter in helper This will help us to handle code path separately for Xe3+ for updated 64bit memory data structure for RT. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Sagar Ghuge	6deb1950a4	anv: Update RT dispatch globals to use 64bit data structure Rework (Kevin) - Fix Hit/Miss/Resume shader group table value Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33047>	2025-04-21 20:10:45 +00:00
Sushma Venkatesh Reddy	4084527876	intel/compiler: Always run opt_algebraic after descriptor_lowering This change ensures that `brw_opt_algebraic` is always executed after `brw_lower_send_descriptors` in `brw_opt.cpp`. By doing so, redundant logical operations are optimized, resulting in cleaner and more compact assembly output. fossil-db results on LNL: - Totals: - Instructions: 215857290 -> 215857028 (-0.00%) - Cycle count: 32008929636 -> 32008935384 (+0.00%); split: -0.00%, +0.00% - Max live registers: 66940643 -> 66940557 (-0.00%) - Affected shaders (104 out of 713963): - Instructions: 31090 -> 30828 (-0.84%) - Cycle count: 5955908 -> 5961656 (+0.10%); split: -0.16%, +0.26% - Max live registers: 10888 -> 10802 (-0.79%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34615>	2025-04-19 07:05:54 +00:00
Rohan Garg	9b477eea19	intel/compiler: use a immediate when doing the shift We can pass immediates to SHL and don't need to allocate a separate register here. Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34604>	2025-04-18 10:08:22 +00:00
Caio Oliveira	d5ad798140	spirv, radv, intel: Add NIR intrinsic for cmat conversion A cooperative matrix conversion operation was represented in NIR by the cmat_unary_op intrinsic with an nir_alu_op as extra parameter, that was already lowered to a specific conversion operation based on the matrix types. Instead of that, add a new intrinsic `cmat_convert` that is specific for that conversion. In addition to the src/dst matrix descriptions already available, also include the signedness information in the intrinsic (reuse nir_cmat_signed for that). This is needed because different Convert operations define different interpretations for integers, regardless their original type. In this patch, both radv and intel were changed to use the same logic that was previously used to pick the lowered ALU op. This change will help represent cmat conversions involving BFloat16, because it avoids having to create new NIR ALU ops for all the combinations involving BFloat16. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>	2025-04-16 23:13:36 +00:00
Ian Romanick	e783930b10	elk/algebraic: Don't optimize float SEL.CMOD to MOV Floating point SEL.CMOD may flush denorms to zero. We don't have enough information at this point in compilation to know whether or not it is safe to remove that. Integer SEL or SEL without a conditional modifier is just a fancy MOV. Those are always safe to eliminate. See also `3f782cdd25`. Fixes: `fab92fa1cb` ("i965/fs: Optimize SEL with the same sources into a MOV.") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Ian Romanick	f4ede9c10a	elk/algebraic: Clear condition modifier on optimized SEL instruction The condition modifier on SEL means something completely different than it means on MOV. On MOV it means to modify the flags based on the value written to the destination. On SEL it means to compare the sources using that mode and pick the result (i.e., as min() or max()) without modifying the flags. The resulting MOV should not have a condition modifier for the same reason it (already) doesn't have a predicate. This bug was found by inspection, so I added a unit test. Fixes: `fab92fa1cb` ("i965/fs: Optimize SEL with the same sources into a MOV.") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Ian Romanick	6a19d8915f	brw/algebraic: Don't optimize float SEL.CMOD to MOV Floating point SEL.CMOD may flush denorms to zero. We don't have enough information at this point in compilation to know whether or not it is safe to remove that. Integer SEL or SEL without a conditional modifier is just a fancy MOV. Those are always safe to eliminate. See also `3f782cdd25`. Fixes: `fab92fa1cb` ("i965/fs: Optimize SEL with the same sources into a MOV.") No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 209903490 -> 209903492 (+0.00%) Cycle count: 30546025224 -> 30546021980 (-0.00%); split: -0.00%, +0.00% Max live registers: 65516231 -> 65516235 (+0.00%) Totals from 2 (0.00% of 706657) affected shaders: Instrs: 3197 -> 3199 (+0.06%) Cycle count: 361650 -> 358406 (-0.90%); split: -10.05%, +9.15% Max live registers: 300 -> 304 (+1.33%) Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Ian Romanick	07dc1d4043	brw/algebraic: Clear condition modifier on optimized SEL instruction The condition modifier on SEL means something completely different than it means on MOV. On MOV it means to modify the flags based on the value written to the destination. On SEL it means to compare the sources using that mode and pick the result (i.e., as min() or max()) without modifying the flags. The resulting MOV should not have a condition modifier for the same reason it (already) doesn't have a predicate. This bug was found by inspection, so I added a unit test. No shader-db or fossil-db changes on any Intel platform. Fixes: `fab92fa1cb` ("i965/fs: Optimize SEL with the same sources into a MOV.") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>	2025-04-15 23:59:31 +00:00
Caio Oliveira	fbe5d559bd	brw: Update EU validation to allow packed BF mixed with packed F Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	d1dd088ede	brw: Allow DPAS with BF on Gfx125 MTL doesn't support, but both ACM and ARL-H do. Fixes: `e384ccde28` ("brw: Expand EU validation for DPAS") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Caio Oliveira	adfab666a4	intel: Add intel_device_info::has_systolic Gfx125+ has systolic, with exception for MTL and some ARL variants. Update code and tests to use it. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>	2025-04-14 18:23:43 +00:00
Kenneth Graunke	eb1ec9cf8e	brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs() This allows us to create temporary VGRFs that are larger than MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not be split on the initial pass, because we may need LOAD_PAYLOAD lowering, copy propagation, and so on to occur first. So we allow registers to exceed that size initially. The "Register allocation relies on split_virtual_grfs()" assertion in brw_reg_allocate.cpp still asserts that all VGRFs which reach the register allocator have been properly split. One case where this is useful is for vectorizing convergent block loads. We create temporaries to splat the SIMD1 values out to SIMD(N), which can lead to some very large temporaries. However, copy propagation and so on ultimately eliminate these and they'll get split down to proper sizes or elided entirely in the end. (Note: both this and the prior commits from this merge request are needed to close the linked issue.) Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	a45583f078	brw: Use live->max_vgrf_size in pre-RA scheduling Post-RA scheduling doesn't use liveness analysis, so we continue using MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use live->max_vgrf_size. This helps get us to a place where we can emit arbitrarily large VGRFs early on in compilation, but which will be split and cleaned up prior to register allocation. It may also allocate smaller arrays in practice since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things we actually could need to allocate. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	4b27b5895c	brw: Use live->max_vgrf_size in register coalescing We already require liveness, so just use the actual maximum size we saw instead of a hardcoded pessimal size. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Kenneth Graunke	ea468412f6	brw: Track the largest VGRF size in liveness analysis We're already looking at this data to calculate the per-component vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the largest VGRF size while we're here. This will allow passes to size arrays based on the actual size needed, rather than hardcoding some fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than necessary, because e.g. vec5 sparse sampling results aren't used. Not hardcoding this means we can also temporarily handle very large VGRFs which we know will be split eventually, without having to increase the maximum which is ultimately used for RA classes. Cc: mesa-stable Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>	2025-04-11 20:34:51 +00:00
Lionel Landwerlin	06ad9a25e5	brw: fix Wa_22013689345 emission 2 problems : - not detecting null destination correctly - applied too late using SHADER_OPCODE_MEMORY_FENCE, when lowering already happened Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34319>	2025-04-10 16:44:28 +00:00
Ian Romanick	cb69d019cf	brw/nir: Use offset() for all uses of offs in emit_pixel_interpolater_alu_at_offset This is necessary to appropriately uniformize the first component access of a convergent vector. Without this, this is produced: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f This is the correct code: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+2.0<0>:F, 0.5f Without `38b58e286f`, the code generated was more incorrect, but happened to work for this test case: load_payload(16) %18:D, 0d, 0d NoMask group0 add(32) %21:F, %18+0.0<0>:F, 0.5f add(32) %22:F, %18+0.4<0>:F, 0.5f Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `38b58e286f` ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset") Closes: #12969 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>	2025-04-09 22:21:18 +00:00
Caio Oliveira	7457c4ecfd	brw: Make brw_range use half-open ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	6509f8139d	brw: Use brw_range::last() to explicit get the last valid IP This is a preparation to change what is stored in brw_range::end. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	596bbb2c95	brw: Use brw_range to store Vars ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	0b4a3c0ff6	brw: Use brw_range to store VGRF ranges Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	e644b42e59	brw: Use brw_range when operating with live ranges Makes the intention of some comparisons clearer by using the named helper functions. Add commentary when the straightforward range is not the one used, e.g. VGRF interference. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	f56a5cf1eb	brw: Use brw_range in IP ranges analysis Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:49 +00:00
Caio Oliveira	fb50461220	brw: Add brw_range struct Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Caio Oliveira	8d9155e34d	brw: Clean up saturate propagation after non-defs version removal Remove now unused analysis and no need to walk blocks in reverse after the non-defs version of the pass was removed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Caio Oliveira	cfc4067b0e	brw: Add a few basic tests for register coalesce Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>	2025-04-09 19:06:48 +00:00
Lionel Landwerlin	19e4dda9a2	brw: fix shuffle with scalar/uniform index The fixes commit isn't actually the source of the bug but likely the biggest enabler because it creates scalar values that more easily end up in the shuffle operations. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1b24612c57` ("brw/nir: Treat load_*_uniform_block_intel as convergent") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12927 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12688 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12570 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12905 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12734 Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34393>	2025-04-08 20:14:11 +00:00
Felix DeGrood	7a3de9e877	intel/brw: support for dumping shader line numbers Add support for dumping shader asm containing instruction line numbers matching offsets within instruction state pool buffer. Offsets should match values collected from eu stall sampling. This is required to match eu stall data with individual shader instructions. Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>	2025-04-08 19:39:53 +00:00
Faith Ekstrand	436f175187	intel/compiler: Use nir_split_conversions() Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34266>	2025-04-07 17:45:21 -05:00
Caio Oliveira	bf9ad36f2d	brw: Properly handle cooperative matrices created with constants Expand constant sources to cover the region read by DPAS, and also use NULL register as accumulator when possible. Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>	2025-04-07 14:27:43 -07:00
Ian Romanick	f33faa4648	brw/nir: Allow b2f(not(X)) optimization on Gfx12.5+ Since there are no type conversions, no restrictions are violated. No shader-db or fossil-db changes on any Gfx12 or older Intel platforms. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 16956077 -> 16944933 (-0.07%) instructions in affected programs: 1957573 -> 1946429 (-0.57%) helped: 4629 / HURT: 35 total cycles in shared programs: 915668518 -> 915684808 (<.01%) cycles in affected programs: 341925598 -> 341941888 (<.01%) helped: 3040 / HURT: 1305 helped stats (abs) min: 2 max: 23034 x̄: 205.36 x̃: 16 helped stats (rel) min: <.01% max: 41.21% x̄: 1.28% x̃: 0.48% HURT stats (abs) min: 2 max: 68820 x̄: 490.88 x̃: 22 HURT stats (rel) min: <.01% max: 103.69% x̄: 2.29% x̃: 0.37% 95% mean confidence interval for cycles value: -50.28 57.78 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 40 GAINED: 42 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 209828027 -> 209790349 (-0.02%); split: -0.03%, +0.01% Cycle count: 30504938008 -> 30514045408 (+0.03%); split: -0.06%, +0.09% Spill count: 512182 -> 512168 (-0.00%) Fill count: 623432 -> 623426 (-0.00%); split: -0.00%, +0.00% Max live registers: 65465029 -> 65464959 (-0.00%) Totals from 57895 (8.19% of 706589) affected shaders: Instrs: 50144907 -> 50107229 (-0.08%); split: -0.11%, +0.03% Cycle count: 7549692606 -> 7558800006 (+0.12%); split: -0.25%, +0.37% Spill count: 58834 -> 58820 (-0.02%) Fill count: 102324 -> 102318 (-0.01%); split: -0.01%, +0.01% Max live registers: 9129045 -> 9128975 (-0.00%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	853ead2073	brw/nir: Optimize b2f(not(X)) using logical operations instead of arithmetic Funny story... this is how regular b2f was implemented before Curro implemented the `MOV dst:F -src:D` method 9 years ago (see `3ee2daf23d`). Eliminating the type conversion in the arithmetic operation enables the next commit. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	3d23496fd9	brw/copy: Copy prop -X into Y&1 This commit prevents code quality regressions in the next commit. Without this, some fragment shaders in Batman: Arkham Origins have code like: shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V ... and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW ... add(8) g56<1>D -g52<8,8,1>D 1D transformed to shr(8) g51<1>UW g1.28<1,8,0>UB 0x76543210V ... and(8) g52<1>UD ~g51<8,8,1>UW 0x0001UW ... mov(8) g56<1>D -g52<8,8,1>D ... and(8) g57<1>UD ~g56<8,8,1>D 0x00000001UD Propagating through the negation allows the added MOV to be deleted. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 16968020 -> 16968019 (<.01%) instructions in affected programs: 281 -> 280 (-0.36%) helped: 1 / HURT: 0 total cycles in shared programs: 914598850 -> 914598832 (<.01%) cycles in affected programs: 5398 -> 5380 (-0.33%) helped: 1 / HURT: 0 A single Blender vertex shader was affected. fossil-db: Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown) Totals: Instrs: 209894650 -> 209894651 (+0.00%) Cycle count: 30545958586 -> 30545952860 (-0.00%) Totals from 2 (0.00% of 706657) affected shaders: Instrs: 3582 -> 3583 (+0.03%) Cycle count: 1875100 -> 1869374 (-0.31%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Subgroup size: 9906400 -> 9906416 (+0.00%) Totals from 2 (0.00% of 805770) affected shaders: Subgroup size: 16 -> 32 (+100.00%) Two compute shaders in Hogwarts Legacy were affected. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	e82464e6e0	brw/copy: Refactor source modifier type checking This simplifies the next commit. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>	2025-04-07 17:42:05 +00:00
Ian Romanick	dee49f4206	brw/algebraic: Optimize derivative of convergent value This is mostly defensive. If a convergent value ever ended up as a source of a DDX or DDY, the eu_emit code will ignore the stride. This will result in bad code being generated. No shader-db or fossil-db changes on any Intel platform. v2: DDX and DDY will always be float, but brw_imm_for_type only works with integer types. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Ken Fixes: `d5d7ae22ae` ("brw/nir: Fix up handling of sources that might be convergent vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>	2025-04-07 17:16:34 +00:00
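
Note on e26270249b ("Don't propagate from CMP to possible Inf + (-Inf)"): the hazard named in that commit title can be reproduced with ordinary IEEE 754 arithmetic. The C sketch below is not Mesa code; the helper names cmp_eq and add_is_zero are hypothetical and only model "evaluate the condition on the compare" versus "evaluate the same condition on the result of a + (-b)". For finite inputs the two agree on equality, but when both operands are the same infinity the compare is true while the sum is NaN, which is not zero.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: the condition as a compare would evaluate it. */
static bool cmp_eq(float a, float b)
{
    return a == b;
}

/* Hypothetical helper: the same condition taken from the result of the
 * add a + (-b), i.e. "is the sum zero?". */
static bool add_is_zero(float a, float b)
{
    float sum = a + (-b);
    return sum == 0.0f;
}

/* Returns true when both ways of evaluating the condition agree. */
static bool conditions_agree(float a, float b)
{
    return cmp_eq(a, b) == add_is_zero(a, b);
}
```

With these helpers, conditions_agree(1.0f, 1.0f) and conditions_agree(1.0f, 2.0f) hold, while conditions_agree(INFINITY, INFINITY) does not, because INFINITY + (-INFINITY) is NaN.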

4434 commits