Commit graph

340 commits

Author SHA1 Message Date
Sviatoslav Peleshko
2a4efe21c5 intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11928
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31746>
2024-10-23 15:02:27 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Caio Oliveira
9537b62759 intel/brw: Add SHADER_OPCODE_REDUCE
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
bf9456753d intel/brw: Validate some instructions exist only up until some phases
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
affa7567c2 intel/brw: Add phases to backend
The general idea is to be able to validate that certain instructions
were lowered and certain restrictions were already handled.  Passes can
now assert their expectations, i.e. whether a pass is meant to run after
certain lowerings or not.

The actual phases are an initial stab; as we re-organize the passes,
we may remove or add phases.

This commit just adds the phase steps; later commits will make use of
them.
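
A rough illustration of the idea, with invented phase names and types
rather than the ones this commit actually adds:

   #include <cassert>

   enum brw_shader_phase {            /* hypothetical phase names */
      PHASE_INITIAL,
      PHASE_AFTER_NIR,
      PHASE_AFTER_OPT_LOOP,
      PHASE_AFTER_LOWERING,
   };

   struct shader {
      brw_shader_phase phase = PHASE_INITIAL;
   };

   static void some_late_pass(shader &s)
   {
      /* This pass only makes sense once lowering has happened. */
      assert(s.phase >= PHASE_AFTER_LOWERING);
      /* ... actual work ... */
   }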

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
2811cb2923 intel: Add statistic for Non SSA registers after NIR to BRW
This is going to be useful while we convert the NIR-to-BRW translation
to produce SSA definitions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
6db7d1af16 intel/compiler: Rename shader_stats structs
Add the `brw_` and `elk_` prefixes to the structs to avoid a compilation
failure when building with LTO ("violates the C++ One Definition Rule")
if the structs diverge.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Kenneth Graunke
c19e5a0a75 intel/brw: Replace predicated break optimization with a simple peephole
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.

For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it.  If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF.  Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.

For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE.  If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
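
A minimal sketch of the break/continue peephole, using invented stand-in
types rather than the real backend IR:

   /* Illustrative only: a doubly-linked list of instructions, no CFG yet. */
   struct inst {
      enum opcode_t { IF, ENDIF, BREAK, CONTINUE, OTHER } op;
      int predicate;                /* 0 = not predicated */
      bool predicate_inverse;
      inst *prev, *next;
   };

   static void unlink_inst(inst *i)
   {
      i->prev->next = i->next;
      i->next->prev = i->prev;
   }

   /* Called right after emitting an ENDIF. */
   static void opt_predicated_jump(inst *endif_inst)
   {
      inst *jump = endif_inst->prev;
      if (jump->op != inst::BREAK && jump->op != inst::CONTINUE)
         return;
      inst *if_inst = jump->prev;
      if (if_inst->op != inst::IF)
         return;
      /* Move the predicate from the IF onto the jump, drop IF/ENDIF. */
      jump->predicate = if_inst->predicate;
      jump->predicate_inverse = if_inst->predicate_inverse;
      unlink_inst(if_inst);
      unlink_inst(endif_inst);
   }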

There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks.  So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.

However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions).  In affected
shaders, it was -0.08% for instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
fad63d6483 intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass
With the select peephole gone, this no longer does much of anything.

No instruction changes in fossil-db on Alchemist.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
06e8335e11 intel/brw: Delete the brw_fs_opt_peephole_select() pass
Now that we can handle load_ubo in NIR's peephole select pass, the
backend pass isn't really useful anymore.

fossil-db results on Alchemist show almost no impact:

   Totals:
   Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00%
   Cycles: 12633748945 -> 12633760459 (+0.00%)

   Totals from 261 (0.04% of 630008) affected shaders:
   Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14%
   Cycles: 23947172 -> 23958686 (+0.05%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
8bca7e520c intel/brw: Only force g0's liveness to be the whole program if spilling
We don't actually need to extend g0's live range to the EOT message
generally - most messages that end a shader are headerless.  The main
implicit use of g0 is for constructing scratch headers.  With the last
two patches, we now consider scratch access that may exist in the IR
and already extend the liveness appropriately.

There is one remaining problem: spilling.  The register allocator will
create new scratch messages when spilling a register, which need to
create scratch headers, which need g0.  So, every new spill or fill
might extend the live range of g0, which would create new interference,
altering the graph.  This can be problematic.

However, when compiling SIMD16 or SIMD32 fragment shaders, we don't
allow spilling anyway.  So, why not allow the use of g0?  Also, when trying
various scheduling modes, we first try allocation without spilling.
If it works, great, if not, we try a (hopefully) less aggressive
schedule, and only allow spilling on the lowest-pressure schedule.

So, even for regular SIMD8 shaders, we can potentially gain the use
of g0 on the first few tries at scheduling+allocation.

Once we try to allocate with spilling, we go back to reserving g0
for the entire program, so that we can construct scratch headers at
any point.  We could possibly do better here, but this is simple and
reliable with some benefit.
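
Sketched as plain C++ with invented names (not the real allocator
interface), the decision reduces to pinning g0 only when this allocation
attempt may spill:

   struct live_range { int start, end; };

   /* Only spill/fill code implicitly needs g0 for scratch headers, so pin
    * it for the whole program only when spilling is allowed; otherwise keep
    * the range computed from actual IR uses (e.g. SCRATCH_HEADER sources). */
   static live_range g0_live_range(bool allow_spilling, int num_ips,
                                   live_range from_ir_uses)
   {
      if (allow_spilling)
         return { 0, num_ips };      /* reserved for the entire program */
      return from_ir_uses;
   }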

Thanks to Ian Romanick for suggesting I try this approach.

fossil-db on Alchemist shows some more spill/fill improvements:

   Totals:
   Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47%
   Spill count: 52891 -> 52471 (-0.79%)
   Fill count: 101599 -> 100818 (-0.77%)
   Scratch Memory Size: 3292160 -> 3197952 (-2.86%)

   Totals from 416541 (66.59% of 625484) affected shaders:
   Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01%
   Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67%
   Spill count: 420 -> 0 (-inf%)
   Fill count: 781 -> 0 (-inf%)
   Scratch Memory Size: 94208 -> 0 (-inf%)

Witcher 3 shows a 33% reduction in scratch memory size, for example.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:34 -07:00
Kenneth Graunke
9200fb966c intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0
The generator code for emitting legacy scratch headers was implicitly
using g0 as a source.  But the IR wasn't indicating any usage of g0,
which means the liveness isn't properly tracked at the IR level.

It works because we reserve g0 as permanently live for the whole
program.  In order to stop doing that, we need to record it properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:31 -07:00
Caio Oliveira
23b0798551 intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a5cc8c4807 intel/brw: Move VARYING_PULL_CONSTANT_LOAD from fs_visitor to fs_builder
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
8a39231e4f intel/brw: Move calculate_cfg out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
b98930c770 intel/brw: Move regalloc and scheduling functions out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
5cb1f46fd1 intel/brw: Remove workgroup_size() helper from fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
17b7e49089 intel/brw: Move out of fs_visitor and rename print instructions
They use the brw_print prefix now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
cdbee4156e intel/brw: Reduce scope of some MESH specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
67ead4edff intel/brw: Reduce scope of some TES specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
f9ddf51b70 intel/brw: Reduce scope of some TCS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
47b9dc9070 intel/brw: Reduce scope of some GS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
28858b3ad1 intel/brw: Reduce scope of some FS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
a8b4b9dd51 intel/brw: Reduce scope of some VS specific functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
fdb029fe1b intel/brw: Move and reduce scope of run_*() functions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
d00329e821 intel/brw: Replace some fs_reg constructors with functions
Create three helper functions for ATTR, UNIFORM and VGRF creation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:18 +00:00
Sagar Ghuge
99ce8b5a07 intel/compiler: Add indirect mov lowering pass
Indirect addressing (vx1 and vxh) is not supported with the UB/B data type
for src0, so we need to change the data type for both dest and src0.

This fixes following tests cases on Xe2+
 - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16*
 - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32*

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
2024-07-01 19:06:31 +00:00
Kenneth Graunke
1e69ec3b8d intel/brw: Add a lower_csel pass and allow building it for all types
We can do CSEL on F, HF, *W, and *D on Gfx11+.  Gfx9 can only do F.

We can lower unsupported types to CMP+CSEL, allowing us to use CSEL
in the IR and not worry about the limitations.

Rework: (Sagar)
- Update validation pass for CSEL

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
2024-07-01 19:06:31 +00:00
Francisco Jerez
95eec5a0dd intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel offset.
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_offset that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.

That's okay since we can get all the primitive setup information
needed for interpolation at an arbitrary coordinate: We use the X/Y
offset relative to the "X/Y Start" coordinates from the thread payload in
order to evaluate the plane equations also provided in the thread
payload for each barycentric coordinate of each polygon.  The
evaluation of the barycentric plane equations (and the RHW plane
equation for perspective-correct interpolation) uses the accumulator
and MAD/MAC for ALU efficiency, but that means we need to manually
split instructions to fit the width of the accumulator.  The division
and scaling for perspective-correct interpolation is also now done in
the shader if necessary.
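
As a rough scalar model of the math (plain C++; the plane layout
value = a*x + b*y + c and the convention that the barycentric planes
interpolate b/w while the RHW plane interpolates 1/w are assumptions of
this sketch - the real pass emits MAD/MAC through the accumulator and
splits instructions to its width):

   struct plane { float a, b, c; };      /* value(x, y) = a*x + b*y + c */

   /* dx/dy are the per-channel offsets from the payload's X/Y Start. */
   static void bary_at_offset(const plane bary[2], const plane &rhw,
                              float dx, float dy, bool perspective,
                              float out[2])
   {
      float b0 = bary[0].a * dx + bary[0].b * dy + bary[0].c;
      float b1 = bary[1].a * dx + bary[1].b * dy + bary[1].c;
      if (perspective) {
         /* Rescale by the interpolated RHW plane for perspective
          * correctness (the division/scaling now done in the shader). */
         float w = 1.0f / (rhw.a * dx + rhw.b * dy + rhw.c);
         b0 *= w;
         b1 *= w;
      }
      out[0] = b0;
      out[1] = b1;
   }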

Note that even though this is only immediately useful on Xe2+, the
thread payload numbers are filled out for older platforms, and the EU
restrictions of previous Xe platforms are taken into account, mostly
for the purposes of testing and performance evaluation.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
2024-06-27 00:18:00 +00:00
Caio Oliveira
b59ea3d63f intel/brw: Print SWSB information when dumping instructions
These were previously only shown as part of the disassembly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29738>
2024-06-23 08:09:56 -07:00
Kenneth Graunke
580e1c592d intel/brw: Introduce a new SSA-based copy propagation pass
(Quite a few of the restrictions here are ported from the old pass.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
9690bd369d intel/brw: Delete old local common subexpression elimination pass
We no longer use this older pass, so there's no need to keep it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
234c45c929 intel/brw: Write a new global CSE pass that works on defs
This has a number of advantages compared to the pass I wrote years ago:

- It can easily perform either Global CSE or block-local CSE, without
  needing to roll any dataflow analysis, thanks to SSA def analysis.
  This global CSE is able to detect and coalesce memory loads across
  blocks.  Although it may increase spilling a little, the reduction
  in memory loads seems to more than compensate.

- Because SSA guarantees that values are never written more than once,
  the new CSE pass can directly reuse an existing value.  The old pass
  emitted copies at the point where it discovered a value because it
  had no idea whether it'd be mutated later.  This led it to generate
  a ton of trash for copy propagation to clean up later, and also a
  nasty fragility where CSE, register coalescing, and copy propagation
  could all fight one another by generating and cleaning up copies,
  leading to infinite optimization loops unless we were really careful.
  Generating less trash improves our CPU efficiency.

- It uses hash tables like nir_instr_set and nir_opt_cse, instead of
  linearly walking lists and comparing each element.  This is much more
  CPU efficient.

- It doesn't use liveness analysis, which is one of the most expensive
  analysis passes that we have.  Def analysis is cheaper.

In addition to CSE'ing SSA values, we continue to handle flag writes,
as this is a huge source of CSE'able values.  These remain block local.
However, we can simply track the last flag write, rather than creating
entire sets of instruction entries like the old pass.  Much simpler.
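
A toy version of the hashing idea, using standard C++ containers and an
invented key layout rather than the real backend instruction hashing:

   #include <cstddef>
   #include <functional>
   #include <unordered_map>
   #include <vector>

   struct value_key {
      int opcode;
      std::vector<int> srcs;            /* SSA def numbers of the sources */
      bool operator==(const value_key &o) const {
         return opcode == o.opcode && srcs == o.srcs;
      }
   };

   struct value_key_hash {
      std::size_t operator()(const value_key &k) const {
         std::size_t h = std::hash<int>()(k.opcode);
         for (int s : k.srcs)
            h = h * 31u + std::hash<int>()(s);
         return h;
      }
   };

   /* On a hit, later uses of 'def' are remapped to the earlier def; no copy
    * instruction is emitted, since SSA values are never rewritten. */
   static void cse_instruction(const value_key &key, int def,
                               std::unordered_map<value_key, int,
                                                  value_key_hash> &seen,
                               std::unordered_map<int, int> &remap)
   {
      auto it = seen.find(key);
      if (it != seen.end())
         remap[def] = it->second;
      else
         seen.emplace(key, def);
   }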

The only real downside to this pass is that, because the backend is
currently only partially SSA, it has limited visibility and isn't able
to see all values.  However, the results appear to be good enough that
the new pass can effectively replace the old pass in almost all cases.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
2b30b3bbd4 intel/brw: Print defs in dump_instructions
Like NIR, we print SSA defs as %1, %2, and so on.  The number here is
the VGRF number.  VGRFs that don't correspond to an SSA def remain
printed as vgrf1, vgrf2, and so on.

This makes it much easier to see what values are SSA and which aren't.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Caio Oliveira
08da7edc0e intel/brw: Track the number of uses of each def in def_analysis
Even without a full use list, simply tracking the number of uses will
let us tell "this is the only use of the def" or "we've just replaced
all uses of a def".  It's inexpensive to calculate and will be useful.
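
As a trivial sketch in plain C++ (not the actual analysis structures),
counting uses is a single pass over every instruction's sources:

   #include <vector>

   /* insts: for each instruction, the def numbers of its sources
    * (-1 when a source is not an SSA def). */
   static std::vector<unsigned>
   count_def_uses(const std::vector<std::vector<int>> &insts,
                  unsigned num_defs)
   {
      std::vector<unsigned> uses(num_defs, 0);
      for (const auto &srcs : insts)
         for (int d : srcs)
            if (d >= 0)
               uses[d]++;
      return uses;
   }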

(rebased by Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
0d144821f0 intel/brw: Add a new def analysis pass
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
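
Roughly, and with the qualifying conditions assumed for illustration
rather than taken from the pass itself, a VGRF counts as a def when it
has exactly one unpredicated write covering the whole register:

   /* Per-VGRF facts gathered in one walk over the instructions
    * (illustrative fields only; the real pass tracks more). */
   struct vgrf_writes {
      unsigned count = 0;
      bool any_predicated = false;
      bool any_partial = false;
   };

   static bool is_ssa_def(const vgrf_writes &w)
   {
      return w.count == 1 && !w.any_predicated && !w.any_partial;
   }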

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
84219892ad intel/brw: Make gl_SubgroupInvocation lane index loading SSA
Our code to initialize gl_SubgroupInvocation uses multiple instructions,
some of which are partial writes.  This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.

To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation.  (We also expand
it to UD for SIMD8 so there are no partial-write issues.)  We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Lionel Landwerlin
1bbe2d9833 intel/brw: fixup wm_prog_data_barycentric_modes()
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:02 +00:00
Kenneth Graunke
f523bfcf90 intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Caio Oliveira
ff89e83178 intel/brw: Lower VGRFs to FIXED_GRFs earlier
Moves the lowering of VGRFs into FIXED_GRFs from code generation to
(almost) right after register allocation.

This will (1) let later passes not worry about VGRFs (and what they mean
in a post-reg-alloc phase) and (2) make it easier to add certain types of
post-reg-alloc validation using the backend IR.

Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
2024-04-23 23:17:57 +00:00
Caio Oliveira
13093ceb3c intel/brw: Move validate out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
2024-04-22 13:38:41 -07:00
Kenneth Graunke
d5b8cec7a2 intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN.  The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
12b0e03bd2 intel/brw: Use SHADER_OPCODE_SEND for coherent framebuffer reads
We already have a logical opcode and lower to what is basically a send
instruction.  We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
217d56e9b1 intel/brw: Delete fs_visitor::vgrf helper
Just use fs_builder::vgrf instead of the older glsl_type-based one.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Francisco Jerez
6427f16074 intel/brw/gfx12: Setup PS thread payload registers required for ALU-based pixel interpolation.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>
2024-03-20 15:46:44 -07:00
Kenneth Graunke
ea423aba1b intel/brw: Split out 64-bit lowering from algebraic optimizations
We don't necessarily want to split up MOVs for 64-bit addresses into
2x 32-bit MOVs right away, as this makes things like copy propagating
the whole address around harder.  We should do this late, once, while
still doing other algebraic optimizations earlier.
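
The lowering itself just splits the quadword into dword halves; a plain
C++ analogue of the 2x 32-bit MOVs (not the IR pass):

   #include <cstdint>

   /* Moving a 64-bit value is equivalent to moving its low and high dwords. */
   static void mov64_as_2x32(uint64_t src, uint32_t dst[2])
   {
      dst[0] = (uint32_t)(src & 0xffffffffu);   /* low dword  */
      dst[1] = (uint32_t)(src >> 32);           /* high dword */
   }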

fossil-db results for Alchemist show tiny improvements:

   Totals:
   Instrs: 161310502 -> 161310436 (-0.00%); split: -0.00%, +0.00%
   Cycles: 14370605606 -> 14370605159 (-0.00%); split: -0.00%, +0.00%

   Totals from 33 (0.01% of 652298) affected shaders:
   Instrs: 15053 -> 14987 (-0.44%); split: -0.64%, +0.20%
   Cycles: 196947 -> 196500 (-0.23%); split: -0.25%, +0.02%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
2024-03-20 01:04:17 -07:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Rohan Garg
73d98848fa intel/compiler: Xe2+ can do URB load/store with a byte offset
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate at byte-offset granularity.

This should fix the geometry shader test cases where we have more than
32 vertices, since previously we were failing to write the correct
control data bits because of an incorrect write mask.

Shader-db results for Xe2:

total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.

total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.

total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
2024-03-01 16:11:30 +00:00