fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 09:10:11 +01:00

Author	SHA1	Message	Date
Caio Oliveira	a0ea2a656f	intel/brw: Enable EU validation and compaction tests for Xe2 A few EU validation tests had to be updated to account for larger GRF, extra supported types for 3-src instructions and the lack of AccWrEnable in Xe2. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31299>	2024-10-01 16:03:35 -07:00
Caio Oliveira	8b1c5425a9	intel/brw: Update DPAS validation tests for Xe2 The main change is that in Xe2 DPAS instruction requires SIMD16. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31299>	2024-10-01 16:03:35 -07:00
Caio Oliveira	b4acc3fc42	intel/brw: Remove Gfx8- from test_eu_validate.c These tests only run for Gfx9+. Acked-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31272>	2024-10-01 21:16:54 +00:00
Sviatoslav Peleshko	57344052b6	intel/brw: Don't apply discard_if condition opt if it can change results We can't just always negate the alu instruction's cmod, because negating it can produce different results when the argument is NaN float. We can still do that if the condition is == or !=. Fixes: `0ba9497e` ("intel/fs: Improve discard_if code generation") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11800 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31042>	2024-09-27 11:52:27 +00:00
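The NaN problem described in the commit above can be reproduced with plain floating-point comparisons. The following is a minimal sketch, not Mesa code; the function names are hypothetical and only illustrate why negating a conditional modifier is safe for `==`/`!=` but not for ordered comparisons.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helpers, not taken from the Mesa sources.
// Negating the *result* of an ordered comparison...
bool not_less(float a, float b)      { return !(a < b); }
// ...is not the same as negating the *condition* when NaN is involved,
// because every ordered comparison with NaN evaluates to false.
bool greater_equal(float a, float b) { return a >= b; }

// For equality the two forms always agree, even with NaN operands.
bool not_equal_via_negation(float a, float b) { return !(a == b); }
bool not_equal(float a, float b)              { return a != b; }
```

With a NaN argument, `not_less` returns true while `greater_equal` returns false, which is the kind of mismatch the commit avoids.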
Caio Oliveira	93c3780bc1	intel/brw: Skip per-primitive inputs when computing flat input mask The per-primitive inputs have their own separate section in the FS thread payload, and are not considered when setting the mask in 3DSTATE_SBE's ConstantInterpolationEnable. This is also consistent with what is done for brw_interp_reg(). Fixes - dEQP-VK.mesh_shader.ext.misc.clip_geom_provoking_last - dEQP-VK.mesh_shader.ext.misc.clip_geom_and_task_shader_provoking_last Backport-to: 24.2 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11844 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31417>	2024-09-27 08:15:18 +00:00
Caio Oliveira	2455e2765a	intel/brw: Add DUMP flag to brw_assemble Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31305>	2024-09-27 02:46:28 +00:00
Caio Oliveira	28ef0de250	intel/brw: Add SWSB MATH pipe to assembler Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31336>	2024-09-26 20:40:28 +00:00
Caio Oliveira	d12950539c	intel/brw: Consider pipe when comparing SWSB in tests When tests were added, there was a single pipe (float), so there wasn't a pipe to compare in `operator==`. Add it there now and adjust expectations accordingly. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31335>	2024-09-25 19:32:31 +00:00
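A comparison that ignores the pipe treats two different scoreboard annotations as equal. The sketch below assumes a simplified, hypothetical `Swsb` struct (the field names are illustrative, not the actual Mesa definition) and shows an `operator==` that includes the pipe.

```cpp
#include <cassert>

// Hypothetical stand-ins for the software scoreboard types.
enum class Pipe { None, Float, Int, Long, Math };

struct Swsb {
   unsigned regdist; // in-order register distance
   Pipe pipe;        // pipe the distance applies to
   unsigned sbid;    // out-of-order scoreboard token
};

// Two annotations are equal only if the pipe matches as well.
inline bool operator==(const Swsb &a, const Swsb &b)
{
   return a.regdist == b.regdist && a.pipe == b.pipe && a.sbid == b.sbid;
}
```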
Lionel Landwerlin	2193d87277	brw: remove EOT handling from sampler messages Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>	2024-09-25 10:22:40 +00:00
Lionel Landwerlin	2ed4af057a	brw: fix mask componentation for 16-bit sampler returns We can't use register counts since 16-bit sampler loads in SIMD8 will only write back half a GRF. Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com> Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>	2024-09-25 10:22:40 +00:00
Lionel Landwerlin	eeb5f6e8c8	brw: make sampler message emission more generic We can generalize the simd8-16bits case by just rounding to a physical register. We also take the opportunity to limit the register allocation to a single physical GRF for the residency data. Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com> Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>	2024-09-25 10:22:40 +00:00
Sagar Ghuge	7e48cbb029	intel: uncached L1 to fix memory barrier issue in RT shader In the RT shader, if there's an executeCallableEXT() in between, even though the called shader does nothing, the instructions before and after the executeCallableEXT() are not properly synced. Patch fixes: - dEQP-VK.ray_tracing_pipeline.memguarantee.inside.rgen - dEQP-VK.ray_tracing_pipeline.memguarantee.inside.chit - dEQP-VK.ray_tracing_pipeline.memguarantee.inside.miss - dEQP-VK.ray_tracing_pipeline.memguarantee.inside.call Thanks to Kevin for finding out there is a load/store issue. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31201>	2024-09-24 14:33:11 +00:00
Rohan Garg	56adf42110	intel/brw: lower math op regions for Xe2+ This helps fix: - dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_3.tan_frag - dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.tan_frag Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31218>	2024-09-24 09:58:28 +00:00
Caio Oliveira	e1b74407bb	intel/brw: Only validate GRF boundary crossing restriction for GRFs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31294>	2024-09-24 03:39:05 +00:00
Kenneth Graunke	878ae9708a	intel/brw: Don't include sync.nop in INTEL_DEBUG instruction counts In an earlier commit, I made us stop counting sync.nops in the shader statistics we use for shader-db (brw_debug_log_message) and fossil-db (stats->instructions = ...). However, I missed adjusting the printout for INTEL_DEBUG. Fixes: `1497f4e0c2` ("intel/fs: Don't include sync.nop in instruction count statistics") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31311>	2024-09-24 03:12:32 +00:00
Lionel Landwerlin	35ea8b6cd2	brw: disable null_rt only if color output does not affect other outputs We found out that some HW changes on Xe2 make the HW avoid reading the blend state if we're using the null_rt bit in the extended descriptor. Since the alpha_to_coverage bit resides in the blend state, that state is ignored and writes are going through to the depth/stencil buffers. Disable null_rt in the color outputs if the color outputs can affect other outputs (through alpha_to_coverage & omask). Fixes tests in this pattern on Xe2 : dEQP-VK.pipeline.*.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Backport-to: 24.2 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>	2024-09-23 15:56:02 +00:00
Lionel Landwerlin	b45ce7d43e	brw: move null_rt control up a layer We'll want to tune this setting based on other parameters. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Backport-to: 24.2 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>	2024-09-23 15:56:02 +00:00
Lionel Landwerlin	9b42215e0d	iris: ensure null render target for specific cases Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>	2024-09-23 15:56:02 +00:00
Lionel Landwerlin	ed64eccab0	brw: fix virtual register splitting to not go below physical register size Otherwise we can end up generating invalid assembly not following destination/source alignment requirements. Fixes the following tests: dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_4.tan_frag dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.tan_frag dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.tan_frag dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_3.tan_frag Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Backport-to: 24.2 Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31206>	2024-09-18 23:26:34 +00:00
Lionel Landwerlin	45377dc5c4	brw: fix vecN rebuilds When loading a 64bit address from the push constants, we'll load a vec2, so we need to allocate 2 GRFs and MOV each component. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11831 Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
Lionel Landwerlin	c16b27f66f	brw: use a builder of the size of the physical register for uniforms Should avoid any partial write nonsense on Xe2+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
Lionel Landwerlin	02b124846f	brw: fix TGM messages to use cmask lsc opcodes This is a restriction for TGM. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b55f7716` ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Lionel Landwerlin	2159e17da0	brw: remove (load|store)_raw_intel Those are Elk specific intrinsics. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b8f264cfe4` ("intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>	2024-09-17 09:28:58 +00:00
Dylan Baker	3f3cb1e2fa	intel/elk: delete copy constructor and copy-assignment-operator To keep the rule-of-three. This points out that the implicit copy operations would be dangerous when there is an explicit constructor and destructor, since the class is holding un-managed memory. Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29667>	2024-09-16 20:31:45 +00:00
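The rule-of-three reasoning in the commit above can be illustrated with a small class that owns raw memory. This is a sketch under assumptions: `RawBuffer` is a hypothetical class, not one from the Elk compiler.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

// Hypothetical class holding unmanaged memory. Because it has an explicit
// constructor and destructor, an implicit copy would lead to a double free,
// so both copy operations are deleted.
class RawBuffer {
public:
   explicit RawBuffer(std::size_t size) : data(std::malloc(size)) {}
   ~RawBuffer() { std::free(data); }

   RawBuffer(const RawBuffer &) = delete;
   RawBuffer &operator=(const RawBuffer &) = delete;

   bool valid() const { return data != nullptr; }

private:
   void *data;
};

// Accidental copies are now rejected at compile time.
static_assert(!std::is_copy_constructible<RawBuffer>::value,
              "RawBuffer must not be copy-constructible");
static_assert(!std::is_copy_assignable<RawBuffer>::value,
              "RawBuffer must not be copy-assignable");
```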
Rohan Garg	daea7e1651	intel/compiler: use the correct cache enum for loads and stores Fixes: `74efde7` ('intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*()') Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30742>	2024-09-16 15:18:31 +00:00
Rohan Garg	b99fd944e8	intel/compiler: version can never be above 11 due to the previous check Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30742>	2024-09-16 15:18:31 +00:00
Ian Romanick	447dae7c13	intel/brw: Use nir_opt_generate_bfi No shader-db changes on any Intel platform. The "regression" in SEND messages occurs because a loop containing a SEND is unrolled. v2: Move after nir_opt_algebraic. Suggested by Georg. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19787034 -> 19785933 (<.01%) instructions in affected programs: 373573 -> 372472 (-0.29%) helped: 541 / HURT: 6 total cycles in shared programs: 906012612 -> 905626304 (-0.04%) cycles in affected programs: 58456516 -> 58070208 (-0.66%) helped: 382 / HURT: 180 fossil-db: Lunar Lake Totals: Instrs: 140671401 -> 140670495 (-0.00%); split: -0.00%, +0.00% Send messages: 12891430822 -> 12891430834 (+0.00%) Loop count: 46905 -> 46904 (-0.00%) Cycle count: 21527511599 -> 21530278999 (+0.01%); split: -0.00%, +0.02% Spill count: 70728 -> 70766 (+0.05%) Fill count: 139397 -> 139254 (-0.10%); split: -0.13%, +0.02% Max live registers: 47512432 -> 47512500 (+0.00%) Totals from 355 (0.06% of 549270) affected shaders: Instrs: 878953 -> 878047 (-0.10%); split: -0.18%, +0.08% Send messages: 19289 -> 19301 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1434664642 -> 1437432042 (+0.19%); split: -0.06%, +0.25% Spill count: 15826 -> 15864 (+0.24%) Fill count: 38454 -> 38311 (-0.37%); split: -0.46%, +0.08% Max live registers: 52530 -> 52598 (+0.13%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 152516575 -> 152516147 (-0.00%); split: -0.00%, +0.00% Send messages: 7491001 -> 7491013 (+0.00%) Loop count: 47588 -> 47587 (-0.00%) Cycle count: 17124433133 -> 17126147156 (+0.01%); split: -0.01%, +0.02% Max live registers: 31854704 -> 31854764 (+0.00%) Totals from 402 (0.06% of 633223) affected shaders: Instrs: 839338 -> 838910 (-0.05%); split: -0.09%, +0.04% Send messages: 20203 -> 20215 (+0.06%) Loop count: 1243 -> 1242 (-0.08%) Cycle count: 1327042160 -> 1328756183 (+0.13%); split: -0.11%, +0.24% Max live registers: 33237 -> 33297 (+0.18%) Tiger Lake *** Shaders only in 'before' results are ignored: fossil-db/steam-native/wolfenstein_youngblood/b8cefe7f700304c4/fs.32/0 from 1 apps: fossil-db/steam-native/wolfenstein_youngblood Totals: Instrs: 150549467 -> 150548952 (-0.00%); split: -0.00%, +0.00% Send messages: 7495582 -> 7495594 (+0.00%) Loop count: 46605 -> 46604 (-0.00%) Cycle count: 15472381586 -> 15472247085 (-0.00%); split: -0.00%, +0.00% Spill count: 59776 -> 59775 (-0.00%) Fill count: 103475 -> 103464 (-0.01%) Scratch Memory Size: 2384896 -> 2383872 (-0.04%) Max live registers: 31760724 -> 31760787 (+0.00%) Max dispatch width: 5569928 -> 5569912 (-0.00%) Totals from 525 (0.08% of 632443) affected shaders: Instrs: 349074 -> 348559 (-0.15%); split: -0.25%, +0.11% Send messages: 24355 -> 24367 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 187080291 -> 186945790 (-0.07%); split: -0.19%, +0.12% Spill count: 483 -> 482 (-0.21%) Fill count: 1372 -> 1361 (-0.80%) Scratch Memory Size: 22528 -> 21504 (-4.55%) Max live registers: 36705 -> 36768 (+0.17%) Max dispatch width: 6272 -> 6256 (-0.26%) Ice Lake Totals: Instrs: 151804923 -> 151804396 (-0.00%); split: -0.00%, +0.00% Send messages: 7553216 -> 7553228 (+0.00%) Loop count: 46196 -> 46195 (-0.00%) Cycle count: 15099805668 -> 15099533898 (-0.00%); split: -0.00%, +0.00% Fill count: 103978 -> 103979 (+0.00%) Max live registers: 32168254 -> 32168323 (+0.00%) Totals from 
527 (0.08% of 637191) affected shaders: Instrs: 347482 -> 346955 (-0.15%); split: -0.25%, +0.10% Send messages: 24586 -> 24598 (+0.05%) Loop count: 849 -> 848 (-0.12%) Cycle count: 191147758 -> 190875988 (-0.14%); split: -0.16%, +0.02% Fill count: 1392 -> 1393 (+0.07%) Max live registers: 37379 -> 37448 (+0.18%) Skylake Totals: Instrs: 140981504 -> 140980647 (-0.00%); split: -0.00%, +0.00% Cycle count: 14653477192 -> 14653249734 (-0.00%); split: -0.00%, +0.00% Fill count: 99636 -> 99637 (+0.00%) Max live registers: 31472062 -> 31472126 (+0.00%) Totals from 523 (0.08% of 626432) affected shaders: Instrs: 335551 -> 334694 (-0.26%); split: -0.26%, +0.01% Cycle count: 178047284 -> 177819826 (-0.13%); split: -0.14%, +0.02% Fill count: 1100 -> 1101 (+0.09%) Max live registers: 36734 -> 36798 (+0.17%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>	2024-09-13 00:21:00 +00:00
Kenneth Graunke	02482604e5	intel/brw: Delete old-style surface and A64 message opcodes These have now been replaced by the MEMORY_*_LOGICAL opcodes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	7090578c35	intel/brw: Switch load_ubo_uniform_block_intel over to memory intrinsics While there are many cases that turn into the *_PULL_CONSTANT_LOAD ops or push constants, this one piece was emitting surface block loads. Switch it over to use the new intrinsics to delete a bunch of code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	b55f77161d	intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes We introduce a new fs_nir_emit_memory_access() helper that can handle image, bindless image, SSBO, shared, global, and scratch memory, and handles loads, stores, atomics, and block loads. It translates each of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics. As a result, we delete a lot of similar surface access emitter code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	3ba97176d6	intel/brw: Switch load_num_workgroups to the new memory intrinsic A simple case we handle directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	dc4770b005	intel/brw: Lower MEMORY_OPCODE_*_LOGICAL to HDC messages This is more complicated. We map the MEMORY_*_LOGICAL opcodes to the older HDC messages: typed and untyped surface read/write/atomic (whether float or integer), DWord and Byte scattered messages, OWord block, and both A64, BTI, and stateless messages. - MEMORY_MODE_* is used to select stateless-scratch, typed, or untyped. - MEMORY_FLAG_TRANSPOSE is used to select block access. - MEMORY_BINDING_TYPE = FLAT and 64-bit address size selects A64. - Alignment and data type size select between byte/dword scattered or surface messages. While we may not be able to handle the full generality of message possibilities, we can handle everything we generate currently. The plan here is to assert/validate that we don't generate MEMORY_*_LOGICAL ops on HDC-based platforms which can't support those particular messages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	3255c9cc49	intel/brw: Lower MEMORY_OPCODE_*_LOGICAL to LSC messages This is pretty straightforward, as the new MEMORY_*_LOGICAL opcodes are designed to match the new LSC's capabilities. The main part is constructing the message payload. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	a82e8b1c6b	intel/brw: Pretty-print memory logical opcodes The new MEMORY_*_LOGICAL intrinsics have a lot of control sources with a bunch of LSC_* enums (opcode, memory type, address type, address and data sizes), as well as flags, coordinate components vs. components... they unfortunately are nigh-unreadable with the default printing since there's just a string of unreadable UD immediates in some order. To fix this, we add some basic pretty-printing. If a control source is simply an enum whose value communicates the entire purpose, we print it. If it has a numeric value (i.e. alignment, or data), we add a label. For example: memory_store(16) (null):UD store shared flat addr: %2:UD coord_comps:1u align:16u d32 comps:2u data0: %3:UD memory_store(16) (null):UD store typed bti:%2+0.0<0>:UD addr: %3+0.0:D coord_comps:2u align:0u d32 comps:4u data0: %4:UD This makes them much easier to read. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	2c67729386	intel/brw: Expose functions to convert LSC enums to strings We had tables for these in the disassembler already, but I'd like to use them in brw_print.cpp as well. Just wrap the tables in convenience functions we can use there. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	d5f38be713	intel/brw: Introduce new MEMORY_*_LOGICAL opcodes This is a new unified set of opcodes for memory access loosely patterned after the new LSC-style data port messages introduced on Alchemist GPUs. Rather than creating an opcode for every type of memory access, it has only three opcodes: load, store, and atomic. It has various sources to indicate the rest: - Binding type (raw pointer, pointer to surface state, or BT index) - Address size (A64, A32, A16) - Data size (bit size, number of components) - Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK) - Mode (typed vs. untyped vs. shared-local vs. scratch) - Address (and its dimensionality) - Data (0 for loads, 1 for stores, 2 for atomics) - Whether we want block access Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	b8f264cfe4	intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	8a6903e50d	intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop" This is going to handle more than atomics shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	e8883bd40b	intel/brw: Use size_written for NoMask instructions in is_partial_write The intention of inst->is_partial_write() is that it should return true when any REG_SIZE (32B) chunk of inst's destination is written but not fully overwritten. This can be used to tell whether inst combines new data with existing data, or screens off any previous writes, so the old values are no longer required. The existing (exec_size * brw_type_size_bytes(this->dst.type) < 32) check doesn't work in a number of cases. For example, LSC block loads have exec_size == 1 and force_writemask_all set, but may write multiple full registers of data. (Currently, we only see them with exec_size 1 after logical-send-lowering, so our SHADER_OPCODE_SEND special case was covering those.) We had also special cased UNDEF. Instead, we can simply check: 1. Predication 2. !inst->dst.contiguous() 3. inst->dst.offset % REG_SIZE != 0 4. inst->size_written % REG_SIZE != 0 We had the first three already, but #4 is new. If either #3 or #4 are true, then that implies there is a REG_SIZE chunk of the destination which is written, but not entirely written, so it's a partial write. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
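The four conditions listed in the commit above can be sketched as a single predicate. This is a simplified illustration, assuming a hypothetical `Inst` struct with only the fields the commit message names; the real `is_partial_write()` may handle additional cases.

```cpp
#include <cassert>

// REG_SIZE is 32 bytes, as stated in the commit message.
constexpr unsigned REG_SIZE = 32;

// Hypothetical, stripped-down instruction description.
struct Inst {
   bool predicated;       // 1. predication
   bool dst_contiguous;   // 2. destination region is contiguous
   unsigned dst_offset;   // 3. destination offset in bytes
   unsigned size_written; // 4. bytes written by the instruction
};

// True when some REG_SIZE chunk of the destination is written
// but not fully overwritten.
bool is_partial_write(const Inst &inst)
{
   return inst.predicated ||
          !inst.dst_contiguous ||
          inst.dst_offset % REG_SIZE != 0 ||
          inst.size_written % REG_SIZE != 0;
}
```

Under this sketch, a block load with `exec_size == 1` that writes two full registers (64 bytes at offset 0) is not a partial write, while a 4-byte write is.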
Kenneth Graunke	ab0b9b6792	intel/brw: Use NUM_BRW_OPCODES in can_omit_write() check The intention here is to detect ALU hardware instructions, but not virtual instructions that haven't been explicitly whitelisted. For some reason we had arbitrarily hardcoded 128 here, but our virtual opcodes don't start at 128. They start at NUM_BRW_OPCODES. So, use that instead. This prevents regressions later when we delete some opcodes, shifting some virtual opcodes into the 72-128 range. Cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
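The difference between a hardcoded bound and the enum sentinel can be shown with a toy opcode enum. The names below are hypothetical stand-ins; only the idea that virtual opcodes start right after the last hardware opcode comes from the commit message.

```cpp
#include <cassert>

// Toy opcode list: hardware ALU opcodes first, then virtual opcodes.
enum Opcode {
   OP_MOV,
   OP_ADD,
   OP_MUL,
   NUM_HW_OPCODES,               // sentinel, plays the role of NUM_BRW_OPCODES
   VIRT_FIRST = NUM_HW_OPCODES,  // virtual opcodes begin at the sentinel
   VIRT_SECOND,
};

// Hardcoded bound: misclassifies virtual opcodes whose value is below 128.
bool is_hw_opcode_hardcoded(int op) { return op < 128; }

// Sentinel-based bound: stays correct when opcodes are added or deleted.
bool is_hw_opcode(int op) { return op < NUM_HW_OPCODES; }
```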
Sviatoslav Peleshko	fa51595c7f	brw: Fix mov cmod propagation when there's int signedness mismatch If there's a difference between scan_inst dest type and inst src type we should be more careful, because difference in signedness can cause incorrect results after the propagation. Updated ror-default.trace hash, as the change fixes misrendering there. Fixes: `b23432c5` ("intel/fs: Fix a cmod prop bug when the source type of a mov doesn't match the dest type of scan_inst") Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30998>	2024-09-09 22:13:08 +00:00
Lionel Landwerlin	aa494cbacf	brw: align spilling offsets to physical register sizes In commit `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") we aligned the width of scratch messages to physical register sizes (32B prior to Xe2, 64B for Xe2+). But our spilling offsets are computed using the register allocation sizes which are in units of 32B. That means on Xe2, you can end up spilling a virtual register allocated at 32B (which we use for surface state computations with exec_all) and then the spilling of that register will be emitted in SIMD16, having the upper 8 lanes overwriting the next spilled register. We could potentially limit spills to SIMD8 messages on Xe2 (only writing 32B of data), but we're also unlikely to have all 32B virtual registers spilled next to one another. And if not tightly packed, we would have 64B registers stored on 2 different cachelines which sounds inefficient. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") Backport-to: 24.2 Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>	2024-09-04 23:05:31 +00:00
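The overlap described in the commit above follows from simple offset arithmetic. The sketch below is an illustration under assumptions: `spill_offsets` and `align_up` are hypothetical helpers, with allocation sizes in 32B units and a physical register size of 32B (pre-Xe2) or 64B (Xe2+) as stated in the message.

```cpp
#include <cassert>
#include <vector>

// Register allocation sizes are expressed in units of 32 bytes.
constexpr unsigned ALLOC_UNIT = 32;

// Round v up to the next multiple of a (a must be non-zero).
unsigned align_up(unsigned v, unsigned a) { return (v + a - 1) / a * a; }

// Hypothetical helper: assign a scratch offset to each spilled register.
// When `align` is true, each offset is rounded up to the physical register
// size so that a full-width spill message cannot clobber its neighbour.
std::vector<unsigned> spill_offsets(const std::vector<unsigned> &sizes_in_units,
                                    unsigned phys_reg_size, bool align)
{
   std::vector<unsigned> offsets;
   unsigned next = 0;
   for (unsigned size : sizes_in_units) {
      offsets.push_back(next);
      next += size * ALLOC_UNIT;
      if (align)
         next = align_up(next, phys_reg_size);
   }
   return offsets;
}
```

For two 32B registers on a 64B-register platform, the unaligned offsets are 0 and 32, so a 64B write at offset 0 overlaps the second register; with alignment the second register moves to offset 64.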
Caio Oliveira	74be809237	compiler: Allow derivative_group to be used for all stages in shader_info These will now also be used by stages that have workgroups. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>	2024-09-03 20:03:18 +00:00
Caio Oliveira	3f6b5ea27a	intel/brw: Use linear walk when shader requires DERIVATIVE_GROUP_LINEAR Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30955>	2024-08-30 20:24:42 +00:00
Caio Oliveira	e4f090d3a6	intel/brw: Remove special treatment for 2-src in emit() helper For Gfx9+ no 2-src instructions need sources to be fixed up. Special treatment remains for 3-src instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30911>	2024-08-30 04:33:47 +00:00
Ian Romanick	73f365e208	intel/brw: load_offset cannot be constant on this path Literally inside an if-statement (about 26 lines before this hunk) that checks for !nir_src_is_const(instr->src[1]). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	fef175de09	intel/brw: Enable constant propagation for a couple more logical sends This prevents some regressions later in the MR. Once load_const operations are marked as is_scalar, they will cease to get the automatic constant propagation that occurs in try_rebuild_source. No shader-db or fossil-db changes on any Intel platform. v2: Slightly relax source restrictions on SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment explaining the restriction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	c6a8b382fd	intel/brw: Relax is_partial_write check in cmod propagation The is_partial_write check is too strict because it tests two separate things. It tests whether or not the instruction always writes a value (i.e., is it predicated), and it tests whether or not the instruction writes a complete register. This latter check is problematic as it prevents cmod propagation in SIMD1, and it prevents cmod propagation in SIMD8 when the destination size is 16 bits. This check is unnecessary. Cmod propagation already checks that the region written and region read overlap. It also already checks that the execution sizes of the instructions match. Further restriction based on the specific parts of the register written only generates false negatives. v2: Relax all of the calls to is_partial_write. Suggested by Caio. No shader-db changes on any Intel platform. fossil-db: Meteor Lake Totals: Instrs: 151505520 -> 151502923 (-0.00%); split: -0.00%, +0.00% Cycle count: 17201385104 -> 17194901423 (-0.04%); split: -0.06%, +0.02% Spill count: 80827 -> 80837 (+0.01%) Fill count: 152693 -> 152692 (-0.00%); split: -0.01%, +0.01% Totals from 346 (0.05% of 630198) affected shaders: Instrs: 1257205 -> 1254608 (-0.21%); split: -0.21%, +0.00% Cycle count: 5532845647 -> 5526361966 (-0.12%); split: -0.18%, +0.06% Spill count: 32903 -> 32913 (+0.03%) Fill count: 64338 -> 64337 (-0.00%); split: -0.03%, +0.03% DG2 Totals: Instrs: 151531440 -> 151528055 (-0.00%); split: -0.00%, +0.00% Cycle count: 17200238927 -> 17197996676 (-0.01%); split: -0.03%, +0.02% Spill count: 81003 -> 80971 (-0.04%); split: -0.04%, +0.00% Fill count: 152975 -> 152912 (-0.04%); split: -0.05%, +0.01% Totals from 346 (0.05% of 630198) affected shaders: Instrs: 1260363 -> 1256978 (-0.27%); split: -0.27%, +0.00% Cycle count: 5532019670 -> 5529777419 (-0.04%); split: -0.09%, +0.05% Spill count: 33046 -> 33014 (-0.10%); split: -0.11%, +0.01% Fill count: 64581 -> 64518 (-0.10%); split: -0.13%, +0.03% Tiger Lake 
and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 149972324 -> 149972289 (-0.00%) Cycle count: 15566495293 -> 15565151171 (-0.01%); split: -0.01%, +0.00% Totals from 16 (0.00% of 629912) affected shaders: Instrs: 351194 -> 351159 (-0.01%) Cycle count: 3922227030 -> 3920882908 (-0.03%); split: -0.04%, +0.00% Skylake Totals: Instrs: 140787999 -> 140787983 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665614947 -> 14665515855 (-0.00%); split: -0.00%, +0.00% Spill count: 58500 -> 58501 (+0.00%) Fill count: 102097 -> 102100 (+0.00%) Totals from 16 (0.00% of 625685) affected shaders: Instrs: 343560 -> 343544 (-0.00%); split: -0.01%, +0.01% Cycle count: 3354997898 -> 3354898806 (-0.00%); split: -0.01%, +0.01% Spill count: 16864 -> 16865 (+0.01%) Fill count: 27479 -> 27482 (+0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	13332c236b	intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup I observed some ray tracing shaders where a resource_intel inside a loop was non-uniform, and some code was lowered to account for that. Eventually the loop containing the resource_intel was unrolled, and the resource_intel became uniform. For example, nir_opt_uniform_subgroup can transform something like con loop { con block b5: // preds: b4 b8 con 32 %330 = @read_first_invocation (%329) con 1 %331 = ieq %330, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %330 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } into con loop { con block b5: // preds: b4 b8 con 1 %331 = ieq %329, %329 // succs: b6 b7 if %331 { con block b6: // preds: b5 con 32 %332 = iadd %120.b, %329 con 32 %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless\|non-uniform, resource_block_intel=-1) div 32x4 %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler) break // succs: b9 } else { con block b7: // preds: b5, succs: b8 } con block b8: // preds: b7, succs: b5 } Notice that %331 is now a tautology. Running brw_nir_optimize again eliminates the loop. v2: Add a comment in the code explaining the rationale. Suggested by Ken. Update the commit message. Suggested by Caio. shader-db: Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19733448 -> 19733330 (<.01%) instructions in affected programs: 14120 -> 14002 (-0.84%) helped: 32 / HURT: 3 total cycles in shared programs: 916254496 -> 916226288 (<.01%) cycles in affected programs: 2035116 -> 2006908 (-1.39%) helped: 19 / HURT: 13 total spills in shared programs: 5807 -> 5807 (0.00%) spills in affected programs: 26 -> 26 (0.00%) helped: 1 / HURT: 1 total fills in shared programs: 6794 -> 6792 (-0.03%) fills in affected programs: 84 -> 82 (-2.38%) helped: 1 / HURT: 1 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20393084 -> 20392971 (<.01%) instructions in affected programs: 21750 -> 21637 (-0.52%) helped: 31 / HURT: 4 total cycles in shared programs: 880273065 -> 880247818 (<.01%) cycles in affected programs: 2546748 -> 2521501 (-0.99%) helped: 18 / HURT: 9 total spills in shared programs: 4628 -> 4630 (0.04%) spills in affected programs: 287 -> 289 (0.70%) helped: 1 / HURT: 2 total fills in shared programs: 5381 -> 5376 (-0.09%) fills in affected programs: 711 -> 706 (-0.70%) helped: 2 / HURT: 2 LOST: 1 GAINED: 1 fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00% Send messages: 7459339 -> 7459396 (+0.00%) Loop count: 49111 -> 47588 (-3.10%) Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01% Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01% Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00% Scratch Memory Size: 4136960 -> 4130816 (-0.15%) Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00% Totals from 672 (0.11% of 630198) affected shaders: Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17% Send messages: 54302 -> 54359 (+0.10%) Loop count: 6124 -> 4601 (-24.87%) Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16% Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08% Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01% Scratch Memory Size: 740352 -> 734208 (-0.83%) Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00% Subgroup size: 7685264 -> 7685256 (-0.00%) Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00% Spill count: 61238 -> 61240 (+0.00%) Fill count: 107301 -> 107289 (-0.01%) Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00% Totals from 553 (0.09% of 629912) affected shaders: Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01% Subgroup size: 8648 -> 8640 (-0.09%) Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24% Spill count: 181 -> 183 (+1.10%) Fill count: 440 -> 428 (-2.73%) Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Ian Romanick	65eb7ed5fc	intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize Without this, the next commit triggers assertions. v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
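
Commit c6a8b382fd above argues that is_partial_write answers two unrelated questions at once: whether the write is predicated, and whether it covers a complete register. The C sketch below illustrates that split with a toy model. It is not Mesa's implementation: the struct, the `toy_` function names, and the 32-byte register size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction model; NOT Mesa's fs_inst. */
struct toy_inst {
   bool predicated;        /* write may be skipped for some channels */
   unsigned exec_size;     /* SIMD width: 1, 8, 16, ... */
   unsigned dst_type_size; /* bytes per destination channel */
};

/* Register size in bytes, assumed for this sketch only. */
enum { TOY_REG_SIZE = 32 };

/* Question 1: can the instruction skip writing its destination? */
static bool toy_is_predicated_write(const struct toy_inst *inst)
{
   return inst->predicated;
}

/* Question 2: does the write cover a whole number of registers? */
static bool toy_writes_whole_registers(const struct toy_inst *inst)
{
   unsigned bytes = inst->exec_size * inst->dst_type_size;
   return bytes % TOY_REG_SIZE == 0;
}

/* Combined check: "partial" if either question flags the write. */
static bool toy_is_partial_write(const struct toy_inst *inst)
{
   return toy_is_predicated_write(inst) || !toy_writes_whole_registers(inst);
}

/* The two cases named in the commit message, plus a full-register write. */
void toy_check_examples(void)
{
   struct toy_inst simd1       = { false, 1, 4 }; /* SIMD1, 32-bit dst */
   struct toy_inst simd8_16bit = { false, 8, 2 }; /* SIMD8, 16-bit dst */
   struct toy_inst simd8_32bit = { false, 8, 4 }; /* SIMD8, 32-bit dst */

   /* Unpredicated, yet the combined check still calls them partial. */
   assert(!toy_is_predicated_write(&simd1) && toy_is_partial_write(&simd1));
   assert(!toy_is_predicated_write(&simd8_16bit) &&
          toy_is_partial_write(&simd8_16bit));
   assert(!toy_is_partial_write(&simd8_32bit));
}
```

In this model, a pass that only needs to know whether the write always happens could call `toy_is_predicated_write` directly and would no longer reject the SIMD1 and 16-bit SIMD8 cases.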

3754 commits