fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 13:30:12 +01:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	06e8335e11	intel/brw: Delete the brw_fs_opt_peephole_select() pass Now that we can handle load_ubo in NIR's peephole select pass, the backend pass isn't really useful anymore. fossil-db results on Alchemist show almost no impact: Totals: Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00% Cycles: 12633748945 -> 12633760459 (+0.00%) Totals from 261 (0.04% of 630008) affected shaders: Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14% Cycles: 23947172 -> 23958686 (+0.05%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Iván Briano	f8553f56ac	intel/rt: fix terminateOnFirstHit handling If TraceRay() is called with the TerminateOnFirstHit flag, we need to terminate the ray on the first confirmed intersection. This is handled by the lowering of accept_ray_intersection and it's working fine for the case of multiple instances of the intersection shader being called. But if the shader calls reportIntersection() more than once, we were handling them all and accepting the closest one regardless of the flag. Check for the flag on every confirmed intersection and, if set, accept it right there. The subsequent lowering will take care of handling the ray termination if necessary. Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>	2024-08-05 21:43:36 +00:00
Caio Oliveira	ba3fd5dc57	intel/brw: Don't retype load_subgroup_invocation result to signed The values are small unsigned integers, so their signed representation will be the same -- the sign conversion is not needed. As a result the extra MOV can be elided by the optimizations. Fossil-db results for DG2 ``` Totals: Instrs: 151779000 -> 151761591 (-0.01%) Cycle count: 12743968649 -> 12742826024 (-0.01%); split: -0.01%, +0.00% Max live registers: 31834993 -> 31834996 (+0.00%) Totals from 17018 (2.70% of 631450) affected shaders: Instrs: 2381740 -> 2364331 (-0.73%) Cycle count: 76798588 -> 75655963 (-1.49%); split: -1.70%, +0.22% Max live registers: 378921 -> 378924 (+0.00%) ``` and TGL ``` Totals: Instrs: 149812033 -> 149794080 (-0.01%); split: -0.01%, +0.00% Cycle count: 11534727002 -> 11534929834 (+0.00%); split: -0.01%, +0.01% Spill count: 42510 -> 42511 (+0.00%); split: -0.00%, +0.01% Fill count: 75100 -> 75101 (+0.00%); split: -0.00%, +0.00% Max live registers: 31727318 -> 31727321 (+0.00%) Totals from 17421 (2.76% of 630458) affected shaders: Instrs: 3092614 -> 3074661 (-0.58%); split: -0.58%, +0.00% Cycle count: 286061417 -> 286264249 (+0.07%); split: -0.32%, +0.39% Spill count: 11538 -> 11539 (+0.01%); split: -0.02%, +0.03% Fill count: 21359 -> 21360 (+0.00%); split: -0.01%, +0.01% Max live registers: 418954 -> 418957 (+0.00%) ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30508>	2024-08-05 18:05:45 +00:00
Jordan Justen	58469620d3	intel/brw/validate: Convert access mask to be grf based Our validation code doesn't need to know which bytes are accessed. It only needs to know which grfs were accessed by an element. This also helps to easily handle the Xe2 register size change. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	e62606b2ec	intel/brw/validate: Update dst grf crossing check for Xe2 Rework: * Update grf_size_shift calculation (s-b Ken) Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	f2800deacb	intel/brw/validate: Simplify grf span validation check by not using a mask Previously this check would create a mask of the bytes used in the grf, and then shift the mask. This worked well when there were 32 bytes in the register because a 64-bit uint64_t could easily detect that bytes were used in the next register. (The next register was the high 32-bits of the `access_mask` variable.) With Xe2, the register size becomes 64 bytes, meaning this strategy doesn't work. Instead of a mask, we can just check to see if more than 1 grf is used during each loop iteration. (Suggested by Ken.) This will make it easier to extend for Xe2 in a follow on commit. Verified this with dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize on Xe2, which otherwise would cause the program to fail to validate because it assumed a grf was 32 bytes. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Kenneth Graunke	8bca7e520c	intel/brw: Only force g0's liveness to be the whole program if spilling We don't actually need to extend g0's live range to the EOT message generally - most messages that end a shader are headerless. The main implicit use of g0 is for constructing scratch headers. With the last two patches, we now consider scratch access that may exist in the IR and already extend the liveness appropriately. There is one remaining problem: spilling. The register allocator will create new scratch messages when spilling a register, which need to create scratch headers, which need g0. So, every new spill or fill might extend the live range of g0, which would create new interference, altering the graph. This can be problematic. However, when compiling SIMD16 or SIMD32 fragment shaders, we don't allow spilling anyway. So, why not allow the use of g0? Also, when trying various scheduling modes, we first try allocation without spilling. If it works, great, if not, we try a (hopefully) less aggressive schedule, and only allow spilling on the lowest-pressure schedule. So, even for regular SIMD8 shaders, we can potentially gain the use of g0 on the first few tries at scheduling+allocation. Once we try to allocate with spilling, we go back to reserving g0 for the entire program, so that we can construct scratch headers at any point. We could possibly do better here, but this is simple and reliable with some benefit. Thanks to Ian Romanick for suggesting I try this approach. 
fossil-db on Alchemist shows some more spill/fill improvements: Totals: Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00% Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47% Spill count: 52891 -> 52471 (-0.79%) Fill count: 101599 -> 100818 (-0.77%) Scratch Memory Size: 3292160 -> 3197952 (-2.86%) Totals from 416541 (66.59% of 625484) affected shaders: Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01% Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67% Spill count: 420 -> 0 (-inf%) Fill count: 781 -> 0 (-inf%) Scratch Memory Size: 94208 -> 0 (-inf%) Witcher 3 shows a 33% reduction in scratch memory size, for example. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:34 -07:00
Kenneth Graunke	4ca4b064cf	intel/brw: Record g0 as live for sends with send_ex_desc_scratch set brw_send_indirect_split_message() implicitly reads g0 to construct the extended message descriptor for certain send messages when this is set. Record that liveness explicitly. Thanks to Francisco Jerez for reminding me about this use of g0. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:32 -07:00
Kenneth Graunke	9200fb966c	intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 The generator code for emitting legacy scratch headers was implicitly using g0 as a source. But the IR wasn't indicating any usage of g0, which means the liveness isn't properly tracked at the IR level. It works because we reserve g0 as permanently live for the whole program. In order to stop doing that, we need to record it properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:31 -07:00
Kenneth Graunke	545f20419f	intel/brw: Delete fs_reg_alloc::discard_interference_graph() Unused since commit `50519598ff`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:28 -07:00
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00
Sushma Venkatesh Reddy	ddd9e043dc	intel/brw: Move get_nir_def() higher to avoid UNDEF While extending our backend to handle 16-bit sampler return payloads, we found that in piglit's arb_texture_view-rendering-formats, the SIMD8 FS was missing the sampling operation altogether. This was because we were first emitting the texturing instruction, and then calling get_nir_def(), which adds an UNDEF instruction when the destination is smaller than 32-bit. So the texturing was dead code eliminated. Fix this by calling get_nir_def() earlier. Thank you to Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00
Caio Oliveira	52be72e676	intel: Let compiler set indirect_ubos_use_sampler This option is used for Gfx < 12, elk already set it to true, so set it in brw and change the drivers to not set it anymore. Because of the dual-compiler support in Iris, the helper function there had to change to consult the right compiler value instead. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30393>	2024-07-31 19:26:20 +00:00
Ian Romanick	fdb6afe71e	intel/elk: Fix undefined left shift of negative value in elk_texture_offset Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:18:08 -07:00
Ian Romanick	f3f4a057b9	intel/elk: Fix undefined left shift of large UW value in elk_imm_uw Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:18:06 -07:00
Ian Romanick	0e5ac7d6b0	intel/elk: Fix undefined left shift of negative value in update_uip_jip v2: Add comment and assertion to explain why the shift is safe. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:18:04 -07:00
Ian Romanick	c2dda8c8e7	intel/elk: Fix undefined shift by 64 of uint64_t in elk_compute_first_urb_slot_required Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:18:01 -07:00
Ian Romanick	e6669467b8	intel/brw: Fix undefined left shift of negative value in brw_texture_offset When -fsanitize=shift is used, many instances of the following are produced: src/intel/compiler/brw_fs_nir.cpp:114:30: runtime error: left shift of negative value -1 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:17:59 -07:00
Ian Romanick	4f24c2707f	intel/brw: Fix undefined left shift of large UW value in brw_imm_uw When -fsanitize=shift is used, 'ninja test' would fail in several Intel assembly tests (mul.asm and and.asm) with: src/intel/compiler/brw_reg.h:703:22: runtime error: left shift of 65532 by 16 places cannot be represented in type 'int' Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:17:56 -07:00
Ian Romanick	abb7c012ff	intel/brw: Fix undefined left shift of negative value in update_uip_jip When -fsanitize=shift is used, many instances of the following are produced: src/intel/compiler/brw_eu_compact.c:2244:50: runtime error: left shift of negative value -306 v2: Add comment and assertion to explain why the shift is safe. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:17:53 -07:00
Ian Romanick	228e049db6	intel/brw: Fix undefined shift by 64 of uint64_t in brw_compute_first_urb_slot_required When -fsanitize=shift is used, many instances of the following are produced: src/intel/compiler/brw_compiler.h:1661:44: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int' I think this is an actual bug. It should check the sentinel value, but the sentinel value is 64. The shift by 64 is treated as a shift by 0. The varying 0 is explicitly filtered by the rest of the if-test. How does this work? Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>	2024-07-26 17:17:15 -07:00
Sushma Venkatesh Reddy	455deacbce	intel/brw: Fix DEBUG_OPTIMIZER Due to a recent regression, adding INTEL_DEBUG=optimizer is dumping shader optimization pass details to console rather than to respective files. Thank you, Kenneth W Graunke for helping me figure this out. Fixes: `17b7e49089` ("intel/brw: Move out of fs_visitor and rename print instructions") Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30389>	2024-07-26 22:22:58 +00:00
Caio Oliveira	23b0798551	intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	a5cc8c4807	intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	8a39231e4f	intel/brw: Move calculate_cfg out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	b98930c770	intel/brw: Move regalloc and scheduling functions out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	5cb1f46fd1	intel/brw: Remove workgroup_size() helper from fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	17b7e49089	intel/brw: Move out of fs_visitor and rename print instructions They use the brw_print prefix now. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	bb7f2db5a2	intel/brw: Move printing functions to its own file Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	cdbee4156e	intel/brw: Reduce scope of some MESH specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	67ead4edff	intel/brw: Reduce scope of some TES specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	f9ddf51b70	intel/brw: Reduce scope of some TCS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	47b9dc9070	intel/brw: Reduce scope of some GS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	28858b3ad1	intel/brw: Reduce scope of some FS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	a8b4b9dd51	intel/brw: Reduce scope of some VS specific functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	fdb029fe1b	intel/brw: Move and reduce scope of run_*() functions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	c92b8a802e	intel/brw: Move remaining compile stages to their own files Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Matt Turner	a3714b55f4	intel/elk: Use REG_CLASS_COUNT Fixes: `d44462c08d` ("intel/elk: Fork Gfx8- compiler by copying existing code") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>	2024-07-25 14:55:09 +00:00
Matt Turner	5e24c21625	intel/brw: Use REG_CLASS_COUNT Fixes: `5d87f41a54` ("intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>	2024-07-25 14:55:09 +00:00
Matt Turner	aae82061af	intel/clc: Free disk_cache Fixes: `c15bf88f01` ("intel: Add a little OpenCL C compiler binary") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>	2024-07-24 20:46:28 +00:00
Matt Turner	1574372de4	intel/clc: Free parsed_spirv_data This declaration shadowed a variable by the same type and name in an outer scope. That variable is passed to clc_free_parsed_spirv(). Fixes: `4fd7495c69` ("intel/clc: add ability to output NIR") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>	2024-07-24 20:46:28 +00:00
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Kenneth Graunke	c429d5025e	intel/brw: Don't force g1's live range to be the entire program The idea here was that pixel shader framebuffer writes used the g0 and g1 thread payload register values to construct the message header. However, most messages are headerless and don't use either. There's a 2012-era comment that the simulator at one point had a bug where certain headerless messages would incorrectly take the values from the g0/g1 register contents rather than using sideband. But, that was likely fixed eons ago. So we really don't need to do this. Furthermore, there are many more shader stages these days: - VS: r1 contains output URB handles - TCS: r1 contains ICP handles - TES: r1 contains gl_TessCoord.x (r4 contains output URB handles) - GS: r1 contains output URB handles - CS: r1 contains LocalID.X on DG2+ but nothing on older hardware - Task/Mesh: r1 contains LocalID.X - BS: r1 contains bindless stack handles Vertex and geometry aren't likely to benefit here because r1 is needed for their output messages, which are also what terminate the shader. TES will definitely benefit because we were making a value pointlessly live for the whole program. Same for TCS, to a lesser extent. Compute prior to DG2 was the worst, as g1 literally has no meaningful content, so there is no point to keeping it live. 
fossil-db on Alchemist shows substantial spill/fill improvements: Totals: Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01% Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72% Subgroup size: 7518608 -> 7518632 (+0.00%) Send messages: 7341727 -> 7341762 (+0.00%) Spill count: 54633 -> 52575 (-3.77%) Fill count: 104694 -> 100680 (-3.83%) Scratch Memory Size: 3375104 -> 3287040 (-2.61%) Totals from 301172 (48.21% of 624670) affected shaders: Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01% Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94% Subgroup size: 4492512 -> 4492536 (+0.00%) Send messages: 4399737 -> 4399772 (+0.00%) Spill count: 20034 -> 17976 (-10.27%) Fill count: 41530 -> 37516 (-9.67%) Scratch Memory Size: 1522688 -> 1434624 (-5.78%) Assassin's Creed Odyssey in particular has 20% fewer fills. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>	2024-07-23 02:26:52 +00:00
Caio Oliveira	8ba8e33c39	intel/brw: Simplify @file annotations Doxygen documentation says > If the file name is omitted (i.e. the line after \file is left > blank) then the documentation block that contains the \file command will > belong to the file it is located in. so we can omit the filename itself when using the annotation. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>	2024-07-22 22:48:03 +00:00
José Roberto de Souza	de5d767f9a	intel/brw: Add a maximum scratch size restriction Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch restriction: RENDER_SURFACE_STATE::Surface Pitch For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is: [63,262143] -> [64B, 256KB] The pitch of the surface is the scratch size per thread and the surface should be large enough to accommodate every physical thread. So here adding a new field to intel_device_info, setting it in intel_device_info_init_common() so even offline tools can have it set. And finally adding a check to fail shader compilation if needed scratch is larger than supported. This issue can be reproduced in debug builds when running dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer platforms. Ref: BSpec 43862 (r52666) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>	2024-07-22 18:17:38 +00:00
Francisco Jerez	b98eebbcb2	intel/brw: Implement null push constant workaround. This implements an undocumented workaround for a hardware bug that affects draw calls with a pixel shader that has 0 push constant cycles when TBIMR is enabled, which has been seen to lead to a hang with Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has been reported as HSDES#22020184996 which is still pending a resolution by the hardware team. However since this workaround found empirically has been confirmed to fix the issue reliably and it's relatively harmless it seems worth checking in already even though no final W/A number is available nor has the W/A json file been updated. To avoid the issue we simply pad the push constant payload to be at least 1 register. This is enabled via a brw_wm_prog_key since the driver needs to be in agreement with the compiler on whether the dummy push constant cycle is present, and it can be avoided in cases where the driver knows that TBIMR will be disabled (e.g. for BLORP). Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728 Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399 Fixes: `57decad976` ("intel/xehp: Enable TBIMR by default.") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>	2024-07-20 01:13:19 +00:00
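
Several of the commits above (e6669467b8, 4f24c2707f, 228e049db6 and their elk counterparts) fix shifts reported by -fsanitize=shift: a left shift of a negative value, a left shift of a 16-bit value that overflows 'int', and a shift of a uint64_t by 64. The following is a minimal sketch of one common way to make each pattern well-defined in C. It is not the Mesa code; the helper names, the 4-bit field width and the mask semantics are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the low 4 bits of a signed offset into a
 * field starting at bit `shift`. Shifting a negative int left is
 * undefined, so convert to unsigned and mask before shifting. */
static uint32_t pack_s4(int offset, unsigned shift)
{
    assert(shift <= 28); /* the 4-bit field must fit in 32 bits */
    return ((uint32_t)offset & 0xFu) << shift;
}

/* Hypothetical helper: replicate a 16-bit value into both halves of a
 * dword. A uint16_t promotes to int, and (int)65532 << 16 does not fit
 * in int, so widen to uint32_t before shifting. */
static uint32_t replicate_uw(uint16_t uw)
{
    return (uint32_t)uw | ((uint32_t)uw << 16);
}

/* Hypothetical helper: mask of all bits below position n, for n in
 * [0, 64]. Shifting a 64-bit value by 64 is undefined, so n == 64 is
 * handled as a separate case instead of being shifted. */
static uint64_t mask_below(unsigned n)
{
    assert(n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```

For instance, replicate_uw(65532) uses the same operand as the brw_imm_uw sanitizer report quoted above, and mask_below(64) covers a sentinel value of 64 without ever executing a shift by the full type width.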


3754 commits