fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 09:20:12 +01:00

Author	SHA1	Message	Date
Ian Romanick	341e5117ec	brw/nir: Treat load_const as convergent

Because opt_combine_constants goes to great effort to pack 8 constants into a single register, this can't have much effect. There is a lot of fossil-db variation among platforms, but the results are generally positive.

v2: Fix for Xe2.

shader-db:

Lunar Lake
total instructions in shared programs: 18095100 -> 18092845 (-0.01%)
instructions in affected programs: 158931 -> 156676 (-1.42%)
helped: 423 / HURT: 0

total cycles in shared programs: 921523326 -> 921522784 (<.01%)
cycles in affected programs: 7522774 -> 7522232 (<.01%)
helped: 225 / HURT: 228

LOST: 1
GAINED: 7

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820211 -> 19820303 (<.01%)
instructions in affected programs: 53087 -> 53179 (0.17%)
helped: 135 / HURT: 1

total cycles in shared programs: 906380523 -> 906383031 (<.01%)
cycles in affected programs: 1402315 -> 1404823 (0.18%)
helped: 156 / HURT: 100

LOST: 1
GAINED: 16

fossil-db:

Lunar Lake
Totals:
Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00%
Subgroup size: 10994624 -> 10994704 (+0.00%)
Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01%
Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06%
Fill count: 129285 -> 128877 (-0.32%)
Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00%

Totals from 29837 (5.41% of 551396) affected shaders:
Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03%
Subgroup size: 940320 -> 940400 (+0.01%)
Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08%
Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18%
Fill count: 52296 -> 51888 (-0.78%)
Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03%

Meteor Lake
Totals:
Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01%
Fill count: 147896 -> 147893 (-0.00%)
Max live registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20913 (3.30% of 633046) affected shaders:
Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00%
Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09%
Fill count: 41010 -> 41007 (-0.01%)
Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89%

DG2 and Tiger Lake had similar results. (DG2 shown)
Totals:
Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20912 (3.30% of 633046) affected shaders:
Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00%
Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03%
Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90%

Ice Lake
Totals:
Instrs: 151914872 -> 151868108 (-0.03%)
Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03%

Totals from 22192 (3.48% of 637223) affected shaders:
Instrs: 6419739 -> 6372975 (-0.73%)
Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20%
Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00%
Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13%

Skylake
Totals:
Instrs: 141027379 -> 140811741 (-0.15%)
Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01%
Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06%

Totals from 22218 (3.53% of 628840) affected shaders:
Instrs: 5944856 -> 5729218 (-3.63%)
Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44%
Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02%
Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	5ea9ed4798	brw/nir: Prepare try_rebuild_source for scalar values Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	d5d7ae22ae	brw/nir: Fix up handling of sources that might be convergent vectors Sources that are scalars (almost all sources) and convergent generally want a <0,1,0> source stride. Sources that are vectors (e.g., texture coordinates, SSBO write data, etc.) and convergent want no extra strides applied. In nearly all cases LOAD_PAYLOAD lowering will do the right thing. v2: Use VEC in emit_pixel_interpolater_send. Suggested by Ken. v3: With the elimination of offset_to_component(), offset() may not convert an is_scalar source to have a zero stride. Explicitly do this in get_nir_src and prepare_alu_destination_and_sources. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
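To make the <0,1,0> stride in the commit above concrete, the sketch below applies the general Intel register-region rule (vertical stride, width, horizontal stride) to compute which element each channel of an instruction reads. It is an illustration only: `region_element` is a hypothetical helper, not a Mesa function. With a <0;1,0> region every channel reads element 0, which is what a convergent scalar wants; with a packed <8;8,1> region channel N reads element N.

```c
#include <assert.h>

/* Element index (in units of the source type) read by channel `chan` for a
 * register region <vstride; width, hstride>. Channels are grouped into rows
 * of `width`; each row starts `vstride` elements after the previous row, and
 * neighbouring channels within a row are `hstride` elements apart.
 * Hypothetical helper for illustration, not part of Mesa. */
static unsigned region_element(unsigned chan, unsigned vstride,
                               unsigned width, unsigned hstride)
{
    assert(width > 0);
    unsigned row = chan / width;
    unsigned col = chan % width;
    return row * vstride + col * hstride;
}
```

For example, `region_element(15, 0, 1, 0)` is 0 (every channel sees the same scalar), while `region_element(9, 8, 8, 1)` is 9 (one element per channel).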
Caio Oliveira	93dfe504f2	intel/brw: Add SHADER_OPCODE_READ_FROM_CHANNEL and LIVE_CHANNEL Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32412>	2024-12-14 11:38:14 -08:00
Paulo Zanoni	0dc2a5808e	brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM The last argument seems to be used as brw_shader_reloc::delta (from brw_add_reloc), and we're unconditionally setting it to 0 here, while the other place where we handle nir_intrinsic_load_reloc_const_intel seems to be setting the base appropriately. I found this by inspection while debugging a bug related to this code, so I'm not aware of any workloads that get improved by this patch. Related patches: - `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") - `99047451c9` ("intel/fs: add plumbing for embedded samplers") Fixes: `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>	2024-12-09 15:45:49 +00:00
Sagar Ghuge	9afb0480c4	intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>	2024-12-04 19:20:51 +00:00
Kenneth Graunke	01680a66a9	brw: Simplify choose_oword_block_size_dwords() Just calculate the block size using util_logbase2() - it's simpler. Also drop the name "oword" as this refers to legacy HDC messages, rather than the newer LSC "vector size" field. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
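The idea of deriving a block size from a base-2 logarithm, rather than from a chain of comparisons, can be sketched as follows. This is not the Mesa code: the function names, the clamp to `max_dwords`, and the assumption that every power of two up to that limit is a legal block size are all illustrative. `floor_log2` stands in for a helper such as Mesa's `util_logbase2()`.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0; a stand-in for a helper like util_logbase2(). */
static unsigned floor_log2(unsigned n)
{
    assert(n > 0);
    unsigned log = 0;
    while (n >>= 1)
        log++;
    return log;
}

/* Hypothetical sketch: choose the largest power-of-two block size (in dwords)
 * that does not exceed the number of dwords left to load, clamped to
 * max_dwords (assumed to be a power of two). */
static unsigned choose_block_size_dwords(unsigned dwords, unsigned max_dwords)
{
    unsigned size = 1u << floor_log2(dwords);
    return size < max_dwords ? size : max_dwords;
}
```

With a limit of 64 dwords, 7 remaining dwords give a block of 4, 8 give 8, and 200 are clamped to 64.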
Kenneth Graunke	e703ff5e02	brw: Only consider components read for UBO loads This will matter more with overfetching, where we may suggest loading additional data that we don't actually need for vectorization purposes. We want to make sure that push ranges have the data we actually need; any extra padding is irrelevant. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	8c795af0b8	brw: Drop a few crocus references in comments crocus no longer uses brw. It uses elk. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Lionel Landwerlin	ba3ff8b3bb	brw: move barycentric_mode enum to intel_shader_enums.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Lionel Landwerlin	bfcb9bf276	brw: rename brw_sometimes to intel_sometimes Moving it to intel_shader_enums.h. The plan is to make it visible to OpenCL shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Caio Oliveira	8474dc853d	intel/brw: Add SHADER_OPCODE_QUAD_SWAP For the horizontal, vertical and diagonal variants. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31053>	2024-11-22 00:27:01 +00:00
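The three QUAD_SWAP variants can be understood from the lane mapping they imply. The sketch below assumes the 2x2 quad layout used by Vulkan's quad operations (lane 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right); it models the semantics only and says nothing about how brw lowers the opcode. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Quad lanes assumed to be laid out as
 *   0 1
 *   2 3
 * Each variant pairs a lane with its partner by XOR-ing the index within
 * the quad; higher bits (which quad the lane is in) are unchanged. */
enum quad_swap {
    QUAD_SWAP_HORIZONTAL = 1, /* 0<->1, 2<->3 */
    QUAD_SWAP_VERTICAL   = 2, /* 0<->2, 1<->3 */
    QUAD_SWAP_DIAGONAL   = 3, /* 0<->3, 1<->2 */
};

/* Lane whose value `lane` receives after the swap (hypothetical helper). */
static unsigned quad_swap_source_lane(unsigned lane, enum quad_swap dir)
{
    assert(dir >= QUAD_SWAP_HORIZONTAL && dir <= QUAD_SWAP_DIAGONAL);
    return lane ^ (unsigned)dir;
}
```

Because the mapping is its own inverse, each variant exchanges values between pairs of lanes rather than rotating them.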
Caio Oliveira	2bd7592b0b	intel/brw: Add SHADER_OPCODE_BALLOT Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31052>	2024-11-21 19:32:59 +00:00
Kenneth Graunke	5848035443	brw: Fix try_rebuild_source's ult32/ushr handling to use unsigned types We were accidentally doing a signed integer comparison here for ult32, or a sign-extending shift for ushr. One notable bit of fallout was that load_global_uniform_block_intel address calculations broke on platforms that don't have native 64-bit integer support, as the iadd64 lowering for "do I need to carry?" was using ult32...and performing the wrong comparison. We spotted this in Borderlands 3 on Alchemist once we turned on other optimizations. Thanks to Lionel Landwerlin for helping spot the problem! Fixes: `c7b312ad45` ("brw: factor out source extraction for rematerialization") Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>	2024-11-18 12:55:47 +00:00
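The bug above is easy to reproduce in plain C. A 64-bit add built from 32-bit halves typically detects the carry out of the low half by checking whether the wrapped sum is smaller than an addend, and that check is only correct as an unsigned comparison (ult32). Likewise, ushr must zero-fill rather than sign-extend. The function names below are hypothetical and this is not Mesa's iadd64 lowering; note also that converting out-of-range values to `int32_t` and right-shifting negative values are implementation-defined in C, and the comments describe the usual two's-complement, arithmetic-shift behaviour of GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Carry out of the low 32 bits of a 64-bit add: the unsigned sum wrapped
 * iff it is smaller than one of the addends. Correct (unsigned compare). */
static uint32_t carry_unsigned(uint32_t a_lo, uint32_t b_lo)
{
    uint32_t sum = a_lo + b_lo;
    return sum < a_lo;
}

/* Same check done with a signed compare: wrong whenever bit 31 differs. */
static uint32_t carry_signed(uint32_t a_lo, uint32_t b_lo)
{
    int32_t sum = (int32_t)(a_lo + b_lo);
    return sum < (int32_t)a_lo;
}

/* Logical shift right: shifts in zeros, as ushr requires. */
static uint32_t ushr(uint32_t x, unsigned s)
{
    assert(s < 32);
    return x >> s;
}

/* Arithmetic shift right: copies the sign bit (on common compilers). */
static uint32_t sign_extending_shr(uint32_t x, unsigned s)
{
    assert(s < 32);
    return (uint32_t)((int32_t)x >> s);
}
```

For instance, 0x80000000 + 0x80000000 carries, but the signed comparison reports no carry; 0x7fffffff + 1 does not carry, but the signed comparison reports one.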
Ian Romanick	2a57568ebd	brw/build: Add scalar_group() helper Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is deleted by commits in MR !29884. The use in brw_lower_logical_sends.cpp seems different, so I decided to keep it. The next commit wants to use this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Caio Oliveira	019770f026	intel/brw: Add SHADER_OPCODE_VOTE_* Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL. The first two are also used for the quad variants. Move their lowering from NIR conversion to brw_lower_subgroup_ops. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>	2024-10-19 02:44:20 +00:00
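For reference, the semantics of the three votes (as in GLSL/SPIR-V subgroup votes) can be modeled over bitmasks of at most 32 lanes. This is a semantic sketch, not brw's lowering in brw_lower_subgroup_ops; the function names are hypothetical. `active` marks enabled lanes and `pred` marks lanes whose predicate is true; disabled lanes do not participate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the predicate holds in at least one active lane. */
static bool vote_any(uint32_t pred, uint32_t active)
{
    return (pred & active) != 0;
}

/* True if the predicate holds in every active lane. */
static bool vote_all(uint32_t pred, uint32_t active)
{
    return (pred & active) == active;
}

/* True if every active lane holds the same 32-bit value. */
static bool vote_equal(const uint32_t values[32], uint32_t active)
{
    assert(values != 0);
    bool have_ref = false;
    uint32_t ref = 0;
    for (unsigned lane = 0; lane < 32; lane++) {
        if (!(active & (1u << lane)))
            continue;                 /* inactive lanes don't vote */
        if (!have_ref) {
            ref = values[lane];       /* first active lane is the reference */
            have_ref = true;
        } else if (values[lane] != ref) {
            return false;
        }
    }
    return true;
}
```

Under this model, a quad variant of vote_any or vote_all could be expressed by restricting `active` to the four bits of one quad, e.g. `active & (0xfu << (4 * q))`; that is an assumption about the semantics, not a description of the brw implementation.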
Caio Oliveira	d97381efd8	intel/brw: Add fs_builder::BROADCAST() helper The helper already takes care of using exec_all() and taking the first component of the result, both of which are expected by SHADER_OPCODE_BROADCAST. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>	2024-10-19 02:44:20 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	b2c5ca0ade	brw: remove rebuild single element special case No shader-db difference on DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	19eb601cfc	brw: avoid clashing nested loop indices Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Kenneth Graunke	dea61b7399	intel/brw: Fix register and builder size in emit_barrier() for Xe2 We were manually allocating 1 REG_SIZE for the barrier payload, which is only half a register on Xe2. This should eventually get allocated to a whole register anyway, but it's awkward in the meantime. Also, we were zero-initializing the header using group(8, 0) which only initialized half the register. The rest of the fields are Reserved MBZ, so they're likely unused and unread anyway - but it's better to zero-initialize them so we don't get random undefined, miserable-to-debug behavior. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
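The size arithmetic behind the emit_barrier() fix can be checked directly. The sketch assumes that REG_SIZE is 32 bytes (one GRF before Xe2), that an Xe2 GRF is 64 bytes (consistent with the commit's "half a register on Xe2"), and that the header is written with 32-bit channels; the constant and function names are illustrative, not taken from Mesa headers.

```c
#include <assert.h>

/* Illustrative constants (assumptions, see above). */
enum {
    LEGACY_REG_BYTES = 32, /* one GRF before Xe2, i.e. REG_SIZE */
    XE2_GRF_BYTES    = 64, /* one GRF on Xe2 */
};

/* Bytes covered by a MOV executed on `channels` channels of
 * `bytes_per_channel` bytes each, e.g. a group(8, 0) builder writing
 * 32-bit values. */
static unsigned bytes_written(unsigned channels, unsigned bytes_per_channel)
{
    assert(channels > 0 && bytes_per_channel > 0);
    return channels * bytes_per_channel;
}
```

Eight 4-byte channels cover 32 bytes, a whole register on older platforms but only half of an Xe2 register; sixteen channels are needed to zero the full 64 bytes.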
Kenneth Graunke	7c9eb8b289	intel/brw: Make a ubld temporary in emit_barrier() Saves typing .exec_all() in a lot of places. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
Kenneth Graunke	a9d9488788	intel/brw: Delete Gfx7-8 code from emit_barrier() Those are supported by elk, not brw. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
Kenneth Graunke	c747c1e1f4	intel/brw: Fix spill/fill count for load/store_scratch in SIMD32 Honestly, I don't know what I was thinking - we are emitting a single spill/fill message here, but were counting it as 2 spill/fills in SIMD32 shaders. So our eventual shader stat reporting would subtract the number of spills and fills from send_count, and get a negative number, wrapping around to just shy of UINT32_MAX. That's way too many sends. This is especially noticeable on Xe2, which often uses SIMD32 shaders. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
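The wraparound described above follows from unsigned 32-bit arithmetic. In this sketch (the function name is hypothetical, not Mesa's stats code), a shader has 10 sends, of which 2 are spills and 4 are fills; counting each SIMD32 spill/fill twice makes the subtraction go below zero, and the `uint32_t` result wraps to 2^32 - 2.

```c
#include <assert.h>
#include <stdint.h>

/* Sends that are neither spills nor fills, computed by subtraction.
 * On underflow the uint32_t result wraps modulo 2^32 instead of
 * going negative. */
static uint32_t other_sends(uint32_t send_count, uint32_t spills,
                            uint32_t fills)
{
    return send_count - spills - fills;
}
```

`other_sends(10, 2, 4)` gives the expected 4, while the double-counted `other_sends(10, 4, 8)` gives 4294967294, i.e. `UINT32_MAX - 1`.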
Caio Oliveira	0ba1159b0a	intel/brw: Add SHADER_OPCODE_*_SCAN Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	9537b62759	intel/brw: Add SHADER_OPCODE_REDUCE Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	affa7567c2	intel/brw: Add phases to backend The general idea is to be able to validate that certain instructions were lowered and certain restrictions were already handled. Passes can now assert their expectations, i.e. whether a pass is meant to run after certain lowerings or not. The actual phases are an initial stab, and as we reorganize the passes, we may remove/add phases. This commit just adds some phase steps; later commits will make use of them. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Sviatoslav Peleshko	57344052b6	intel/brw: Don't apply discard_if condition opt if it can change results We can't just always negate the alu instruction's cmod, because negating it can produce different results when the argument is a NaN float. We can still do that if the condition is == or !=. Fixes: `0ba9497e` ("intel/fs: Improve discard_if code generation") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11800 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31042>	2024-09-27 11:52:27 +00:00
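Why only == and != are safe to negate can be shown with ordinary IEEE 754 floats in C. Every ordered comparison (<, <=, >, >=) involving NaN is false, so logically negating `a < b` is not the same as flipping it to `a >= b`; `a == b` and `a != b`, by contrast, remain exact complements even for NaN. The helper names are illustrative, and the checks assume the code is not compiled with options such as -ffast-math that relax NaN semantics.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Logical negation vs. flipped condition for "less than". */
static bool negated_lt(float a, float b) { return !(a < b); }
static bool flipped_lt(float a, float b) { return a >= b; }

/* Logical negation vs. flipped condition for "equal". */
static bool negated_eq(float a, float b) { return !(a == b); }
static bool flipped_eq(float a, float b) { return a != b; }
```

For `a = NAN, b = 0.0f`, `negated_lt` is true but `flipped_lt` is false, whereas `negated_eq` and `flipped_eq` are both true.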
Lionel Landwerlin	eeb5f6e8c8	brw: make sampler message emission more generic We can generalize the simd8-16bits case by just rounding to a physical register. We also take the opportunity to limit the register allocation to a single physical GRF for the residency data. Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com> Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>	2024-09-25 10:22:40 +00:00
Lionel Landwerlin	45377dc5c4	brw: fix vecN rebuilds When loading a 64bit address from the push constants, we'll load a vec2, so we need to allocate 2 GRFs and MOV each component. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11831 Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
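As a reminder of why both components of the vec2 matter, the sketch below reassembles a 64-bit address from two 32-bit components. It assumes the usual NIR convention (as in pack_64_2x32) that component 0 is the low dword and component 1 the high dword; the function name is hypothetical, and the code models the data layout rather than the register allocation done by brw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild a 64-bit address from a vec2 of 32-bit components
 * (component 0 = low dword, component 1 = high dword, by assumption).
 * Keeping only one component would lose half of the address. */
static uint64_t address_from_vec2(const uint32_t comp[2])
{
    assert(comp != NULL);
    return (uint64_t)comp[0] | ((uint64_t)comp[1] << 32);
}
```

For example, the components {0xdeadbeef, 0x1} form the address 0x1deadbeef.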
Lionel Landwerlin	c16b27f66f	brw: use a builder of the size of the physical register for uniforms Should avoid any partial write nonsense on Xe2+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
Kenneth Graunke	7090578c35	intel/brw: Switch load_ubo_uniform_block_intel over to memory intrinsics While there are many cases that turn into the *_PULL_CONSTANT_LOAD ops or push constants, this one piece was emitting surface block loads. Switch it over to use the new intrinsics to delete a bunch of code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	b55f77161d	intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes We introduce a new fs_nir_emit_memory_access() helper that can handle image, bindless image, SSBO, shared, global, and scratch memory, and handles loads, stores, atomics, and block loads. It translates each of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics. As a result, we delete a lot of similar surface access emitter code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	3ba97176d6	intel/brw: Switch load_num_workgroups to the new memory intrinsic A simple case we handle directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	8a6903e50d	intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop" This is going to handle more than atomics shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Ian Romanick	73f365e208	intel/brw: load_offset cannot be constant on this path Literally inside an if-statement (about 26 lines before this hunk) that checks for !nir_src_is_const(instr->src[1]). No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>	2024-08-30 03:39:31 +00:00
Caio Oliveira	695f5314d6	intel/brw: Simplify fs_inst annotation When INTEL_DEBUG=ann is also set, the disassembler would annotate the output with either a string or the string version of a NIR instruction. This was done by keeping two pointers (but only using one at a time). Change the code to print the instruction into a string instead of keeping its pointer around (peg the string to the shader). That way, only one pointer is needed for annotations. Because that serialization is not free, only do that when the environment variable is set. While we are here, move the annotation string field to the end, moving it to the least commonly used cacheline. Further packing might allow the entire fs_inst to fit in two cachelines. For release builds, don't even add the debug annotation to the struct. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>	2024-08-28 03:59:50 +00:00
Caio Oliveira	31dfb04fd3	intel/brw: Remove long register file names The long names were originally meant to map to the HW encoding but nowadays the actual encoding values depend on gfx version, whether instruction is 3src, etc. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	d31c8bfb6f	intel/brw: Remove more uses of variable length arrays In these cases there's a clear bound we can use. In C++ this is a compiler extension and not compatible with zero initializing a regular struct -- which will happen in a later change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	86c20e2910	intel/brw: Use a helper for common VEC pattern In the helper, instead of using a Variable Length Array, use a fixed-size array of NIR_MAX_VEC_COMPONENTS elements. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:14 +00:00
Caio Oliveira	abc535a3b4	intel/brw: Remove unused variable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>	2024-08-25 22:08:13 +00:00
Francisco Jerez	71ca8529c5	intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations. We were currently emitting logical atomic instructions with a packed destination region for sub-dword LSC atomics, along the lines of: > untyped_atomic_logical(32) dst<1>:HF, ... However, these instructions use an LSC data size D16U32, which means that the 16b data on the return payload is expanded to 32b by the LSC shared function, so we were lying to the compiler about the location of the individual channels on the return payload, its execution masking, etc. This is why the hacks that manually set the 'inst->size_written' of the instruction were required. In some cases this worked, but any non-trivial manipulation of the instruction destination by lowering or optimization passes could have led to corruption, as has been reproduced in deqp-vk during lower_simd_width() for shaders that use 16-bit atomics in SIMD32 dispatch mode. Note that LSC sub-dword reads aren't affected by this because they use raw UD destinations and specify the actual bit size of the operation datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't work for atomic operations since that immediate specifies the atomic opcode. Instead, have the logical operation implement the behavior of 16-bit destinations correctly instead of silently replacing the 16-bit region with an inconsistent 32-bit region -- This is done by emitting the MOV instructions used to pack the data from the UD temporary into the packed destination from the lower_logical_sends() pass instead of from the NIR translation pass. Fixes: `43169dbbe5` ("intel/compiler: Support 16 bit float ops") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>	2024-08-21 02:33:12 +00:00
Sagar Ghuge	c4f2a8d984	intel/compiler: Fix indirect offset in GS input read for Xe2+ Make sure to take new GRF size into consideration and adjust the indirect offset according to new size so that when we do the indirect load with address register, we load right values. This helps pass the following tests: - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.geom - dEQP-VK.ray_query.geometry_shader. Backport-to: 24.2 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>	2024-08-16 18:40:13 +00:00
Ian Romanick	c8038643b8	intel/brw: Make ifind_msb SSA friendly No shader-db changes on any Intel platform. v2: Use negate(tmp) instead of creating a new temporary. Suggested by Ken. fossil-db: Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00% Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06% Totals from 40 (0.01% of 633223) affected shaders: Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00% Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24% Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00% Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11% Spill count: 59795 -> 59794 (-0.00%) Fill count: 103513 -> 103509 (-0.00%) Totals from 40 (0.01% of 632445) affected shaders: Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00% Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43% Spill count: 16896 -> 16895 (-0.01%) Fill count: 27819 -> 27815 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Ian Romanick	e9c151fde6	intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00% Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00% Spill count: 78571 -> 78525 (-0.06%) Fill count: 148178 -> 148132 (-0.03%) Totals from 210 (0.03% of 633223) affected shaders: Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08% Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00% Spill count: 15632 -> 15586 (-0.29%) Fill count: 26241 -> 26195 (-0.18%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Sagar Ghuge	c3c62e493f	intel/compiler: Ray query requires write-back register Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." If we don't pass the write-back register, the message somehow ends up writing over the R0 register, which can mess up scratch space accesses and potentially lead to a GPU hang. This can be seen when running under a simulator trace: send.rta (16|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig	eec02246f8	brw: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Caio Oliveira	2e2b83f72d	intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION

Instead of emitting a single one at the top and making reference to it, emit the virtual instruction as needed and let CSE do its job.

Since load_subgroup_invocation can now appear somewhere other than the start of the shader, use UNDEF in all cases to ensure that the liveness of the destination doesn't extend to the first partial write done here (it was being used only for SIMD > 8 before).

Note that this option was considered in the past (`6132992cdb`) but dismissed at the time. The difference now is that the lowering of the virtual instruction happens earlier than the scheduling.

The motivation for this change is to allow passes other than the NIR conversion to use this value. The alternative of storing a `brw_reg` in the shader (instead of NIR state) gets complicated by passes like compact_vgrfs, which move VGRFs around (and update the instructions). This and maybe other passes would have to care about the brw_reg.

Fossil-db numbers, TGL

```
* Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
* Shaders only in 'before' results are ignored:
steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00%
Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03%
Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01%
Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31725096 -> 31671792 (-0.17%)
Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00%

Totals from 52574 (8.34% of 630441) affected shaders:
Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02%
Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29%
Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03%
Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02%
Max live registers: 1992493 -> 1939189 (-2.68%)
Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05%
```

and DG2

```
* Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
* Shaders only in 'before' results are ignored:
steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7699776 -> 7699784 (+0.00%)
Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02%
Spill count: 61283 -> 61274 (-0.01%)
Fill count: 119886 -> 119849 (-0.03%)
Max live registers: 31810432 -> 31758920 (-0.16%)
Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06%

Totals from 49286 (7.81% of 631231) affected shaders:
Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04%
Subgroup size: 857752 -> 857760 (+0.00%)
Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72%
Spill count: 6339 -> 6330 (-0.14%)
Fill count: 12571 -> 12534 (-0.29%)
Max live registers: 1788346 -> 1736834 (-2.88%)
Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66%
```

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>	2024-08-08 18:20:49 +00:00


733 commits