fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 20:18:06 +02:00

Author	SHA1	Message	Date
Samuel Pitoiset	22e1d1a1f4	nir/opt_sink: add heap support Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40657>	2026-03-31 10:10:17 +00:00
Lorenzo Rossi	c0e0591999	pan/compiler: Replace frag_coord_zw_pan with var_special_pan Just a bit cleaner, and we can unify point size too. Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>	2026-03-27 19:23:02 +00:00
Alyssa Rosenzweig	373358da45	nir/opt_sink: sink pack_64_2x32_split This comes up in lowered load_ubo sequences (observed in OpenCL test test_api min_max_parameter_size). Hopefully the pack gets coalesced, it's like nir_op_vec2 on most backends, so it should usually be ok to sink even though the register pressure heuristic will reject it. Allowing it to sink allows the UBO load to sink. Intel's backend scheduler can optimize the relevant sequences locally but there should still be a win here for global load sinking. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>	2026-03-13 17:03:00 +00:00
Alyssa Rosenzweig	507e7a04bf	nir/opt_sink: sink Intel UBO loads Acts like load_ubo, handle it in the same path. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>	2026-03-13 17:03:00 +00:00
Georg Lehmann	077b654cc7	nir: don't sink alu that uses ballot(true) Don't sink alu that uses ballot(true), as that can be a local system value and moving the alu then requires a new mov in the old location. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38829>	2025-12-08 09:07:54 +00:00
Kenneth Graunke	f1ab64ad74	nir: add new intrinsics to load/store from URB on intel We add several new intrinsics for accessing URB handles: - load_urb_output_handle_intel - load_urb_input_handle_intel - load_urb_input_handle_intel_indexed The latter is used by stages like TCS and GS where each input control point has a unique handle. The index is which ICP to read from. The others are for most stages, where all inputs or outputs are accessed via a single handle. Then we have URB load and store operations, split for Xe2+ (URB via LSC) and earlier (HDC OWord messages): - load_urb_vec4_intel - load_urb_lsc_intel - store_urb_vec4_intel - store_urb_lsc_intel The legacy vec4 variants take a handle and a 128-bit OWord offset as sources. Additionally, stores take a set of channel enables to mask off and avoid writing vec4 components. We don't use the WRITE_MASK const-index as our channel enables are not required to be constant. The Xe2+ LSC variants are simpler. Handles are byte offsets into the URB memory region, and offsets are expressed in bytes. So we simply add them into a single "address" source. We don't support writemasks here, as they aren't really necessary with the better addressability. (Plus, the store_cmask operations work significantly differently than the previous HDC OWord messages). We will lower disjoint writemasks to multiple stores. Based on earlier code by Lionel Landwerlin. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>	2025-11-25 22:43:54 +00:00
Konstantin Seurer	de32f9275f	treewide: add & use parent instr helpers We add a bunch of new helpers to avoid the need to touch ->parent_instr, including the full set of: * nir_def_is_* * nir_def_as_*_or_null * nir_def_as_* [assumes the right instr type] * nir_src_is_* * nir_src_as_* * nir_scalar_is_* * nir_scalar_as_* Plus nir_def_instr() where there's no more suitable helper. Also an existing helper is renamed to unify all the names, while we're churning the tree: * nir_src_as_alu_instr -> nir_src_as_alu ..and then we port the tree to use the helpers as much as possible, using nir_def_instr() where that does not work. Acked-by: Marek Olšák <maraeo@gmail.com> --- To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm taking this opportunity to clean up a lot of NIR patterns. Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>	2025-11-12 21:22:13 +00:00
Marek Olšák	3fe651f607	nir: remove load_smem_amd replaced by load_global_amd + ACCESS_SMEM_AMD Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:54:11 +00:00
Mel Henning	17876a00af	nir: Add a faster lowest common ancestor algorithm On a fossil from the blender 4.5.0 vulkan backend, this improves compile times in nak by about 17%. Compile time of other shaders improves by a more modest 1.2%. No stat changes on shader-db. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>	2025-09-08 23:03:13 +00:00
Mel Henning	ee8d448241	nir: Don't require nir_metadata_control_flow We're about to add to nir_metadata_control_flow, and we don't want passes to require the new metadata. Via coccinelle: @@ expression e1; @@ - nir_metadata_require(e1, nir_metadata_control_flow) + nir_metadata_require(e1, nir_metadata_block_index | nir_metadata_dominance) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>	2025-09-08 23:03:13 +00:00
Marek Olšák	48050dbef6	nir/opt_sink: handle load_global_amd Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>	2025-08-30 14:55:13 -04:00
Marek Olšák	3aadae22ad	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd We can just place the set structures inside nir_block. This reduces the number of ralloc calls by 6.7% when compiling Heaven shaders with radeonsi+ACO using a release build (i.e. not including nir_validate set allocations, which are also removed). Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>	2025-08-21 06:13:48 +00:00
Karol Herbst	83cf765f8e	nak: run nir_opt_move nir_move_load_ubo Usually we can fold most ldc and ldcx into the instruction using it, however there are a couple of cases where we can't, e.g. when there is an indirect offset. Moving the ldc(x) down to the consumer leads to increased value ranges for uniform registers, but lowers them for normal registers. Totals: CodeSize: 914650304 -> 914469536 (-0.02%); split: -0.05%, +0.03% Number of GPRs: 3879754 -> 3863818 (-0.41%); split: -0.42%, +0.01% Static cycle count: 1073273107 -> 1073101189 (-0.02%); split: -0.09%, +0.08% Spills to reg: 67219 -> 67707 (+0.73%); split: -0.10%, +0.83% Fills from reg: 79733 -> 80456 (+0.91%); split: -0.10%, +1.01% Max warps/SM: 3666036 -> 3672668 (+0.18%); split: +0.18%, -0.00% Totals from 24235 (27.66% of 87622) affected shaders: CodeSize: 444747392 -> 444566624 (-0.04%); split: -0.11%, +0.07% Number of GPRs: 1360384 -> 1344448 (-1.17%); split: -1.20%, +0.03% Static cycle count: 806310857 -> 806138939 (-0.02%); split: -0.12%, +0.10% Spills to reg: 35826 -> 36314 (+1.36%); split: -0.19%, +1.55% Fills from reg: 31863 -> 32586 (+2.27%); split: -0.26%, +2.53% Max warps/SM: 911328 -> 917960 (+0.73%); split: +0.74%, -0.01% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36536>	2025-08-19 17:29:07 +00:00
Alyssa Rosenzweig	bcf1a1c20b	treewide: use nir_def_block Via Coccinelle patch: @@ expression definition; @@ -definition->parent_instr->block +nir_def_block(definition) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Marek Olšák <maraeo@gmail.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>	2025-08-01 15:34:24 +00:00
Marek Olšák	d61edf079b	nir: add nir_move_only_convergent/divergent This will be needed by nir_opt_move_reorder_loads, which will use the move flags. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:53 -04:00
Marek Olšák	35bbc8405b	nir: add more nir_move_options Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:51 -04:00
Marek Olšák	44d78c4451	nir: handle load_input_vertex in nir_can_move_instr Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:49 -04:00
Marek Olšák	8d3e76c250	nir: split nir_move_load_frag_coord from nir_move_load_input It's a pure system value on AMD, not an input. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:48 -04:00
Marek Olšák	8d584586f5	nir: handle can_reorder robustly in nir_can_move_instr Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:44 -04:00
Marek Olšák	c229c93540	nir: change how can_mov_out_of_loop is set for intrinsics in nir_can_move_instr Set to false first, then set to true when needed. More intrinsics will set false. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>	2025-07-29 16:20:42 -04:00
Alyssa Rosenzweig	63ce73a601	nir,hk: sink lowered UBOs this is better than doing it once we've lowered to hardware ops which makes it more challenging to sink since then we'd have to sink the whole tree instead of a single intrinsic. Totals from 17617 (32.81% of 53701) affected shaders: MaxWaves: 16863872 -> 16901504 (+0.22%); split: +0.24%, -0.02% Instrs: 12406405 -> 12430375 (+0.19%); split: -0.15%, +0.35% CodeSize: 87055248 -> 87180802 (+0.14%); split: -0.18%, +0.33% Spills: 10350 -> 9301 (-10.14%); split: -11.57%, +1.43% Fills: 5215 -> 3733 (-28.42%); split: -31.49%, +3.07% Scratch: 113164 -> 110472 (-2.38%); split: -2.63%, +0.25% ALU: 9552550 -> 9558513 (+0.06%); split: -0.22%, +0.28% FSCIB: 9552545 -> 9558508 (+0.06%); split: -0.22%, +0.28% IC: 2874032 -> 2876442 (+0.08%); split: -0.00%, +0.09% GPRs: 1470040 -> 1459283 (-0.73%); split: -1.00%, +0.27% Uniforms: 5113254 -> 5115158 (+0.04%); split: -0.82%, +0.85% Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Job Noorman <job@noorman.info> [NIR] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>	2025-06-26 16:41:55 +00:00
Alyssa Rosenzweig	caa0854da8	nir: plumb load_global_bounded this lets the backend implement bounded loads (i.e. robust SSBOs) in a way that's more clever than a full branch. similar idea to load_global_constant_bound which should eventually be merged into this. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Job Noorman <job@noorman.info> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>	2025-06-26 16:41:53 +00:00
Georg Lehmann	7de352e99e	nir,radv: add an option to not move 8/16bit vecs ACO will overestimate the register demand of the sources, so we don't want to create the vector later. Foz-DB Navi48: Totals from 240 (0.30% of 80265) affected shaders: MaxWaves: 6429 -> 6435 (+0.09%) Instrs: 3406069 -> 3406646 (+0.02%); split: -0.01%, +0.03% CodeSize: 18231596 -> 18233288 (+0.01%); split: -0.01%, +0.02% VGPRs: 14768 -> 14732 (-0.24%) Latency: 18981274 -> 18979170 (-0.01%); split: -0.02%, +0.01% InvThroughput: 4247331 -> 4246634 (-0.02%); split: -0.02%, +0.01% VClause: 85453 -> 85458 (+0.01%); split: -0.01%, +0.01% Copies: 262046 -> 261971 (-0.03%); split: -0.06%, +0.03% PreVGPRs: 10899 -> 10775 (-1.14%) VALU: 1923441 -> 1923485 (+0.00%); split: -0.01%, +0.01% SALU: 457983 -> 457982 (-0.00%) VOPD: 4980 -> 4861 (-2.39%); split: +0.48%, -2.87% Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35729>	2025-06-26 09:29:43 +00:00
Georg Lehmann	7ac9a87572	nir/opt_sink: don't assume moving conversion can't increase register pressure Foz-DB Navi48: Totals from 11311 (14.09% of 80265) affected shaders: MaxWaves: 337664 -> 337648 (-0.00%); split: +0.00%, -0.01% Instrs: 10102221 -> 10101625 (-0.01%); split: -0.05%, +0.04% CodeSize: 55000184 -> 54999292 (-0.00%); split: -0.04%, +0.03% VGPRs: 571052 -> 571064 (+0.00%); split: -0.03%, +0.03% Latency: 59247189 -> 59204726 (-0.07%); split: -0.13%, +0.06% InvThroughput: 10236407 -> 10215675 (-0.20%); split: -0.26%, +0.06% VClause: 211730 -> 211677 (-0.03%); split: -0.07%, +0.04% SClause: 284802 -> 284762 (-0.01%); split: -0.07%, +0.06% Copies: 702890 -> 702539 (-0.05%); split: -0.18%, +0.13% Branches: 205117 -> 205112 (-0.00%) PreSGPRs: 475898 -> 475825 (-0.02%); split: -0.02%, +0.00% PreVGPRs: 366318 -> 366449 (+0.04%); split: -0.14%, +0.17% VALU: 5764791 -> 5764349 (-0.01%); split: -0.02%, +0.01% SALU: 1259529 -> 1259517 (-0.00%); split: -0.04%, +0.04% VOPD: 5854 -> 5724 (-2.22%); split: +0.70%, -2.92% Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35729>	2025-06-26 09:29:43 +00:00
Lionel Landwerlin	16fca611d7	nir: add new intel ssbo intrinsics Similar to ir3 ones, to optimize offsets in the backend. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>	2025-06-22 10:55:23 +00:00
Alyssa Rosenzweig	b0f8c22682	nir/opt_sink: sink agx backfacing helps an elden ring shader: Totals from 1 (0.03% of 3206) affected shaders: Instrs: 4146 -> 4141 (-0.12%) CodeSize: 27374 -> 27334 (-0.15%) ALU: 2554 -> 2549 (-0.20%) FSCIB: 2554 -> 2549 (-0.20%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35559>	2025-06-20 16:09:28 +00:00
Emma Anholt	7db62e6dad	nir: Split nir_load_frag_coord_zw to separate z/w intrinsics. This will be a win for Intel for tracking which payload values need to be set up. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>	2025-06-18 23:11:36 +00:00
Mary Guillemard	e0be93d881	nir: Add Panfrost specific shader_output intrinsic On Avalon, this is a bitfield that holds information on what values a vertex shader should output. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33910>	2025-03-10 07:38:16 +01:00
Alyssa Rosenzweig	9a58a8257e	treewide: Switch to nir_progress Via the Coccinelle patch at the end of the commit message, followed by sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog') ninja -C ~/mesa/build clang-format cd ~/mesa/src/compiler/nir && clang-format -i *.c agxfmt @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -return prog; +return nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -return true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -return false; -} +bool progress = prog_expr; +return nir_progress(progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all); -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all); +nir_progress(prog, impl, metadata); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -return true; +return nir_progress(true, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); -return false; +return nir_no_progress(impl); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -other_prog |= prog; +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +nir_progress(prog, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -other_prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog | nir_progress(impl_progress, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -other_prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog | nir_progress(impl_progress, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +nir_progress(impl_progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -prog = true; +prog = nir_progress(true, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} +nir_progress(prog, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); +nir_no_progress(impl); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); +nir_progress(true, impl, metadata); squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog') Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>	2025-02-26 15:19:53 +00:00
Benjamin Lee	6f541e2016	panfrost: add intrinsic to load frag coord at a barycentric This is needed for noperspective lowering, where we need to multiply the varying value by gl_FragCoord.w at the same barycentric as the varying. Normal nir_load_frag_coord_zw instructions are lowered to the new intrinsic on bifrost with the pan_lower_frag_coord_zw pass. Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>	2025-01-03 07:04:05 +00:00
Georg Lehmann	b8fa9daf0c	nir: sink/move alu with two identical, non constant sources. Foz-DB Navi21: Totals from 32363 (40.76% of 79395) affected shaders: MaxWaves: 787499 -> 787675 (+0.02%); split: +0.02%, -0.00% Instrs: 28783404 -> 28783464 (+0.00%); split: -0.01%, +0.01% CodeSize: 156763536 -> 156765148 (+0.00%); split: -0.01%, +0.02% VGPRs: 1493304 -> 1492848 (-0.03%); split: -0.04%, +0.01% Latency: 243022511 -> 243051994 (+0.01%); split: -0.08%, +0.09% InvThroughput: 57827398 -> 57828129 (+0.00%); split: -0.05%, +0.05% VClause: 582208 -> 582298 (+0.02%); split: -0.07%, +0.08% SClause: 959634 -> 959312 (-0.03%); split: -0.07%, +0.04% Copies: 1965821 -> 1965826 (+0.00%); split: -0.17%, +0.17% Branches: 710593 -> 710596 (+0.00%); split: -0.00%, +0.01% PreSGPRs: 1313513 -> 1313632 (+0.01%); split: -0.00%, +0.01% PreVGPRs: 1210596 -> 1209103 (-0.12%); split: -0.12%, +0.00% VALU: 19463445 -> 19463497 (+0.00%); split: -0.02%, +0.02% SALU: 3319529 -> 3319500 (-0.00%); split: -0.01%, +0.01% Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32783>	2024-12-30 13:28:30 +00:00
Caterina Shablia	f4fcfa8016	pan,nir: introduce load_attribute_pan load_attribute_pan is a panfrost-specific intrinsic for loading vertex attributes. Takes explicit vertex and instance IDs which we need in order to implement vertex attribute divisor with non-zero base instance on v9+. Passes which are used by panvk are modified to be aware of load_attribute_pan. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32039>	2024-12-18 08:33:16 +00:00
Georg Lehmann	dbf63a0788	nir: remove nir_op_is_derivative Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Georg Lehmann	41e82b8b8e	nir: sink is_subgroup_invocation_lt_amd Having it closer to the branches means we can eliminate an exec copy. Foz-DB Navi31: Totals from 11615 (14.63% of 79395) affected shaders: Instrs: 6804372 -> 6804903 (+0.01%); split: -0.04%, +0.05% CodeSize: 33684672 -> 33680584 (-0.01%); split: -0.07%, +0.05% VGPRs: 578616 -> 578604 (-0.00%) SpillSGPRs: 1506 -> 1304 (-13.41%) Latency: 29817034 -> 29821320 (+0.01%); split: -0.03%, +0.05% InvThroughput: 3581587 -> 3581217 (-0.01%); split: -0.02%, +0.01% VClause: 124826 -> 124782 (-0.04%); split: -0.04%, +0.00% SClause: 187916 -> 187645 (-0.14%); split: -0.27%, +0.13% Copies: 520969 -> 510027 (-2.10%); split: -2.20%, +0.10% PreSGPRs: 442584 -> 421344 (-4.80%) VALU: 3810755 -> 3810267 (-0.01%); split: -0.01%, +0.00% SALU: 763402 -> 752650 (-1.41%); split: -1.48%, +0.07% Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>	2024-09-26 14:29:14 +00:00
Georg Lehmann	7fa7812219	nir: merge out of loop decision with nir_can_move_instr logic One place to modify instead of two when adding new intrinsics here. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>	2024-09-12 21:49:34 +00:00
Georg Lehmann	91f8e32a85	nir/opt_sink: do not sink inverse_ballot out of loops Inverse_ballot result is undefined if the input is not dynamically uniform. And sinking out of loops might make the input divergent. Fixes: `18a0ff137f` ("nir: sink/move inverse_ballot like moves") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>	2024-09-12 21:49:34 +00:00
Georg Lehmann	1ec3cc2aed	nir/opt_sink: do not sink load_ubo_vec4 out of loops Same reason as for load_ubo. Fixes: `d199d65c3a` ("nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>	2024-09-12 21:49:34 +00:00
Daniel Schürmann	50d416fe89	nir: add nir_block *nir_src_get_block(src) helper Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7710>	2024-08-29 09:42:55 +00:00
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Daniel Schürmann	ffef3d1709	nir/opt_sink: ignore loops without backedge Loops without backedge should not be considered loops. For RADV, 2069 (2.61% of 79395) affected shaders. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783>	2024-07-16 12:29:08 +00:00
Karol Herbst	d5da434851	nir/opt_sink: add load_kernel_input Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362>	2024-06-26 10:04:02 +00:00
Alyssa Rosenzweig	15257b65c6	treewide: use nir_metadata_control_flow Via Coccinelle patch: @@ @@ -nir_metadata_block_index | nir_metadata_dominance +nir_metadata_control_flow ...plus some manual fixups for call sites missed by coccinelle. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom] Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>	2024-06-17 16:28:14 -04:00
Georg Lehmann	18a0ff137f	nir: sink/move inverse_ballot like moves It's just a copy for the backends that don't lower it. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29502>	2024-06-04 15:40:57 +00:00
Alyssa Rosenzweig	c39896b17b	nir: Use getters for nir_src::parent_* First, we need to give the parent_instr field a unique name to be able to replace with a helper. We have parent_instr fields for both nir_src and nir_def, so let's rename nir_src::parent_instr in preparation for rework. This was done with a combination of sed and manual fix-ups. Then we use semantic patches plus manual fixups: @@ expression s; @@ -s->renamed_parent_instr +nir_src_parent_instr(s) @@ expression s; @@ -s.renamed_parent_instr +nir_src_parent_instr(&s) @@ expression s; @@ -s->parent_if +nir_src_parent_if(s) @@ expression s; @@ -s.renamed_parent_if +nir_src_parent_if(&s) @@ expression s; @@ -s->is_if +nir_src_is_if(s) @@ expression s; @@ -s.is_if +nir_src_is_if(&s) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>	2023-10-10 04:58:05 -04:00
Alyssa Rosenzweig	4bcb62d203	nir/opt_sink: Also consider load_preamble as const Acts like constants, schedule them like constants. This lets us move lowered frag coord code down. Results on dolphin ubers: total instructions in shared programs: 195144 -> 196633 (0.76%) instructions in affected programs: 175737 -> 177226 (0.85%) helped: 28 HURT: 27 Instructions are HURT. total bytes in shared programs: 1379980 -> 1388308 (0.60%) bytes in affected programs: 1244250 -> 1252578 (0.67%) helped: 28 HURT: 27 Bytes are HURT. total halfregs in shared programs: 13591 -> 13557 (-0.25%) halfregs in affected programs: 2176 -> 2142 (-1.56%) helped: 12 HURT: 2 Inconclusive result (%-change mean confidence interval includes 0). total threads in shared programs: 233728 -> 234112 (0.16%) threads in affected programs: 3264 -> 3648 (11.76%) helped: 6 HURT: 0 Threads are helped. Results on Android shader-db: total instructions in shared programs: 1775324 -> 1775912 (0.03%) instructions in affected programs: 155305 -> 155893 (0.38%) helped: 353 HURT: 548 Instructions are HURT. total bytes in shared programs: 11676650 -> 11678454 (0.02%) bytes in affected programs: 1058924 -> 1060728 (0.17%) helped: 370 HURT: 547 Inconclusive result (value mean confidence interval includes 0). total halfregs in shared programs: 484143 -> 471212 (-2.67%) halfregs in affected programs: 98833 -> 85902 (-13.08%) helped: 2478 HURT: 674 Halfregs are helped. Instr count changes due to losing the RA lottery. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig	aead5316d2	nir/opt_sink: Move ALU with constant sources In general, sinking ALU instructions can negatively impact register pressure, since it extends the live ranges of the sources, although it does shrink the live range of the destination. However, constants do not usually contribute to register pressure. This is not a totally true assumption, but it's pretty good in practice, since... * constants can be rematerialized (backend-dependent) * constants can often be inlined (ISA-dependent) * constants can sometimes be promoted to free uniform registers (ISA-dependent) * constants can live in scalar registers although the ALU destination might need a vector register (and vector registers are assumed to be much more expensive than scalar registers, again ISA-dependent) So, assume that constants have zero effect on register pressure. Now consider an ALU instruction where all but one source is a constant. Then there are two cases: 1. The ALU instruction is moved past when its source was otherwise killed. Then there is no effect on register pressure, since the source live range is extended exactly as much as the destination live range shrinks. 2. The ALU instruction is moved down but its source is still alive where it's moved to. Then register pressure is improved, since the source live range is unchanged while the destination live range shrinks. So, as a heuristic, we always move ALU instructions where n-1 sources are constant. As an inevitable special case, this also (necessarily) moves unary ALU ops, which should be beneficial by the same justification. This is not 100% perfect but it is well-motivated. Results on AGX are decent: total instructions in shared programs: 1796101 -> 1795652 (-0.02%) instructions in affected programs: 326822 -> 326373 (-0.14%) helped: 800 HURT: 371 Inconclusive result (%-change mean confidence interval includes 0). total bytes in shared programs: 11805004 -> 11801424 (-0.03%) bytes in affected programs: 2610630 -> 2607050 (-0.14%) helped: 912 HURT: 462 Inconclusive result (%-change mean confidence interval includes 0). total halfregs in shared programs: 525818 -> 515399 (-1.98%) halfregs in affected programs: 118197 -> 107778 (-8.81%) helped: 2095 HURT: 804 Halfregs are helped. total threads in shared programs: 18916608 -> 18917056 (<.01%) threads in affected programs: 4800 -> 5248 (9.33%) helped: 7 HURT: 0 Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig	561df40211	nir/opt_sink: Do not move derivatives At the moment, this does nothing. It will prevent problems from the next patch, however. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig	469fd36fba	nir/opt_sink: Sink frag coord instructions load_input-like. ubershaders: instructions in affected programs: 72392 -> 72522 (0.18%) helped: 8 HURT: 18 Inconclusive result (value mean confidence interval includes 0). total bytes in shared programs: 1468550 -> 1469170 (0.04%) bytes in affected programs: 560486 -> 561106 (0.11%) helped: 10 HURT: 17 Inconclusive result (value mean confidence interval includes 0). total halfregs in shared programs: 13946 -> 13898 (-0.34%) halfregs in affected programs: 3642 -> 3594 (-1.32%) helped: 21 HURT: 0 Halfregs are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig	c07a9dca65	nir/opt_sink: Sink load_local_pixel_agx This is the AGX version of load_output, which shaders can use for framebuffer fetch. It is beneficial to sink framebuffer fetch as late as possible, both to reduce register pressure but also to reduce serialization of overlapping fragments. Results on a collection of ubershaders: total bytes in shared programs: 1468928 -> 1468550 (-0.03%) bytes in affected programs: 495300 -> 494922 (-0.08%) helped: 24 HURT: 0 Bytes are helped. total halfregs in shared programs: 14162 -> 13946 (-1.53%) halfregs in affected programs: 5148 -> 4932 (-4.20%) helped: 27 HURT: 0 Halfregs are helped. total threads in shared programs: 216896 -> 217664 (0.35%) threads in affected programs: 6912 -> 7680 (11.11%) helped: 12 HURT: 0 Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig	596682ad4b	nir/opt_sink: Sink load_constant_agx By the time this runs, we will have already lowered load_ubo and load_vbo to load_constant_agx so we need to handle the backend version. This is very important for reducing register pressure in monolithic VS+GS shaders on AGX. Since no other backend has _agx intrinsics, there's no need for an option to gate this. The additional instruction count is from more frequent wait instructions due to fewer instructions grouped together. This should be mitigated in the future with an ACO-style latency-reducing scheduler in the backend, after register pressure is reduced by opt_sink. total instructions in shared programs: 1793385 -> 1796101 (0.15%) instructions in affected programs: 199816 -> 202532 (1.36%) helped: 3 HURT: 941 Instructions are HURT. total bytes in shared programs: 11799628 -> 11805004 (0.05%) bytes in affected programs: 1345656 -> 1351032 (0.40%) helped: 34 HURT: 919 Bytes are HURT. total halfregs in shared programs: 533151 -> 525818 (-1.38%) halfregs in affected programs: 40335 -> 33002 (-18.18%) helped: 613 HURT: 42 Halfregs are helped. total threads in shared programs: 18910464 -> 18916608 (0.03%) threads in affected programs: 6144 -> 12288 (100.00%) helped: 12 HURT: 0 Threads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>	2023-09-18 08:38:16 -04:00
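
The heuristic described in commit aead5316d2 ("nir/opt_sink: Move ALU with constant sources") can be illustrated with a small standalone sketch. It is not the code from that commit and does not use the real NIR data structures: `struct sketch_alu`, `SKETCH_MAX_SRCS` and `sketch_can_sink_alu` are hypothetical names invented here. The sketch only models the stated rule that an ALU instruction is treated as sinkable when n-1 of its n sources are constants, which also covers unary operations. Accepting the all-constant case as well is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_SRCS 4

/* Hypothetical stand-in for an ALU instruction; not the real nir_alu_instr. */
struct sketch_alu {
   size_t num_srcs;                    /* number of sources in use */
   bool src_is_const[SKETCH_MAX_SRCS]; /* true if that source is a constant */
};

/* Returns true when at most one source is non-constant.
 *
 * Under the commit's assumption that constants cost no registers, sinking
 * such an instruction extends at most one live range (the single
 * non-constant source) while shrinking the destination's live range, so
 * register pressure should not get worse.
 *
 * Assumption of this sketch: an instruction whose sources are all constant
 * is accepted too.
 */
static bool
sketch_can_sink_alu(const struct sketch_alu *alu)
{
   assert(alu != NULL && alu->num_srcs <= SKETCH_MAX_SRCS);

   size_t non_const = 0;
   for (size_t i = 0; i < alu->num_srcs; i++) {
      if (!alu->src_is_const[i])
         non_const++;
   }

   return non_const <= 1;
}
```

With this model, a unary op with a non-constant source and a three-source op with two constant sources are both accepted, while a binary op with two distinct non-constant sources is rejected. Later commits in this list refine the real pass further (for example b8fa9daf0c for two identical non-constant sources, and 7ac9a87572 for conversions); none of that is modelled here.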

72 commits