fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 18:08:15 +02:00

Author	SHA1	Message	Date
Ella Stanforth	bb07364c54	v3d/compiler: remove num_samplers_used from shader key This is only ever used by assertions. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>	2025-05-08 06:25:22 +00:00
Ella Stanforth	01d0ccd664	v3d/compiler: remove unused texture swizzle Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>	2025-05-08 06:25:22 +00:00
Ella Stanforth	76e27d2d0d	v3d/compiler: remove return_channels from the shader key This isn't used anywhere. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>	2025-05-08 06:25:22 +00:00
Ella Stanforth	b39fc710ee	v3d/compiler: remove int/uint tracking We don't need this anymore as we do not support anything older than 4.2. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>	2025-05-08 06:25:22 +00:00
Ella Stanforth	42154029fc	v3d/compiler: Implement software blend lowering Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>	2025-04-23 09:03:41 +00:00
Ella Stanforth	a6f67d5b69	v3d/compiler: Only lower logic ops for color buffers that exist Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>	2025-04-23 09:03:41 +00:00
Ella Stanforth	1ec0cdb733	v3d/compiler: Fixup output types for all 8 outputs Cc: mesa-stable Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>	2025-04-23 09:03:41 +00:00
Juan A. Suarez Romero	f5e36e382f	broadcom/compiler: initialize register This fixes issue detected by static analyzer: passed-by-value struct argument contains uninitialized data (e.g., field: 'file'). Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>	2025-04-04 15:55:13 +00:00
Juan A. Suarez Romero	0e50b09d4a	broadcom/compiler: don't use VLA on emit alu Using a constant-size array instead of a variable-length array is preferred due to several issues with the latter. In particular, in this case using a VLA generates several warnings from the static analyzer: passed-by-value struct argument contains uninitialized data (e.g., field: 'file'). Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>	2025-04-04 15:55:13 +00:00
Juan A. Suarez Romero	01151f045f	broadcom/compiler: use safe iterator to remove instructions The current approach has an issue detected by static analyzer: use of memory after it is freed. Using a proper iterator makes things safer. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>	2025-04-04 15:55:13 +00:00
Connor Abbott	7a55e13939	nir, compiler: Rename needs_quad_helper_invocations This currently treats coarse and fine derivatives the same, but Qualcomm needs to know whether just coarse derivatives are used or fine derivatives/quad ops are also used. Rename this to needs_coarse_quad_helper_invocations to make clear the difference from the new field, needs_full_quad_helper_invocations. Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> Fixes: `264d8a6766` ("ir3: Set need_full_quad depending on info.fs.require_full_quads") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>	2025-03-14 21:55:57 +00:00
Ella Stanforth	332b313547	v3d: enable framebuffer fetch Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33766>	2025-03-12 13:28:16 +00:00
Ella Stanforth	6023a46d02	v3d/compiler: Implement load_output Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33766>	2025-03-12 13:28:16 +00:00
Alyssa Rosenzweig	9a58a8257e	treewide: Switch to nir_progress Via the Coccinelle patch at the end of the commit message, followed by sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog') ninja -C ~/mesa/build clang-format cd ~/mesa/src/compiler/nir && clang-format -i *.c agxfmt @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -return prog; +return nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -return true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -return false; -} +bool progress = prog_expr; +return nir_progress(progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all); -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all); +nir_progress(prog, impl, metadata); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -return true; +return nir_progress(true, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); -return false; +return nir_no_progress(impl); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} -other_prog |= prog; +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +nir_progress(prog, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -other_prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -prog = true; -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog | nir_progress(impl_progress, impl, metadata); @@ identifier other_prog, prog; expression impl, metadata; @@ -if (prog) { -other_prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +other_prog = other_prog | nir_progress(prog, impl, metadata); @@ expression prog_expr, impl, metadata; identifier prog; @@ -if (prog_expr) { -prog = true; -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +prog = prog | nir_progress(impl_progress, impl, metadata); @@ expression prog_expr, impl, metadata; @@ -if (prog_expr) { -nir_metadata_preserve(impl, metadata); -} else { -nir_metadata_preserve(impl, nir_metadata_all); -} +bool impl_progress = prog_expr; +nir_progress(impl_progress, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); -prog = true; +prog = nir_progress(true, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} -return prog; +return nir_progress(prog, impl, metadata); @@ identifier prog; expression impl, metadata; @@ -if (prog) { -nir_metadata_preserve(impl, metadata); -} +nir_progress(prog, impl, metadata); @@ expression impl; @@ -nir_metadata_preserve(impl, nir_metadata_all); +nir_no_progress(impl); @@ expression impl, metadata; @@ -nir_metadata_preserve(impl, metadata); +nir_progress(true, impl, metadata); squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog') Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>	2025-02-26 15:19:53 +00:00
Georg Lehmann	f26069fdd9	nir: replace nir_opt_conditional_discard with nir_opt_peephole_select Foz-DB Navi21: Totals from 118 (0.15% of 79377) affected shaders: Instrs: 208001 -> 207355 (-0.31%); split: -0.33%, +0.01% CodeSize: 1080428 -> 1078432 (-0.18%); split: -0.20%, +0.02% SpillSGPRs: 202 -> 211 (+4.46%) Latency: 1923508 -> 1919093 (-0.23%); split: -0.62%, +0.39% InvThroughput: 407475 -> 407081 (-0.10%); split: -0.12%, +0.02% SClause: 7050 -> 7033 (-0.24%); split: -0.31%, +0.07% Copies: 12156 -> 11821 (-2.76%); split: -3.04%, +0.28% PreSGPRs: 8198 -> 8331 (+1.62%); split: -0.02%, +1.65% PreVGPRs: 7628 -> 7528 (-1.31%) VALU: 155747 -> 155657 (-0.06%); split: -0.06%, +0.00% SALU: 18295 -> 17782 (-2.80%); split: -2.98%, +0.18% SMEM: 10521 -> 10519 (-0.02%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>	2025-02-20 21:59:17 +00:00
Georg Lehmann	ca8147edbe	nir/peephole_select: add options struct Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>	2025-02-20 21:59:16 +00:00
Juan A. Suarez Romero	1e0e521a7d	broadcom/compiler: move stores to the end of shader It is possible that a shader comes with output stores executed before loading inputs. As the memory to read the inputs and store the outputs is the same, this means it could be overwriting the inputs before reading them. This move avoids this situation. This partially improves https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33053. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33310>	2025-02-03 17:10:47 +00:00
Konstantin Seurer	60a20bcf3d	nir: Stop using instructions for debug info Annotating ssa defs without affecting compilation is impossible with debug info instructions since referencing a nir_def from the debug info instr will add uses. The old approach also stops working if passes reorder instructions. This patch proposes a solution which should not regress performance just like the old approach. The difference is that this one allocates a bit more space for debug info instead of adding a new instruction for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>	2025-01-30 20:14:01 +00:00
Daniel Schürmann	f3be7ce01b	nir/from_ssa: only consider divergence if requested This pass used to unconditionally use divergence information which forced the caller to either call divergence_analysis or ensure that the divergence is properly reset. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>	2025-01-23 01:31:23 +00:00
Alyssa Rosenzweig	7a4469681e	nir: pass a callback to nir_lower_robust_access rather than try to enumerate everything a driver might want with an unmanageable collection of booleans, just do a filter callback + data. this ends up simpler overall, and will allow Intel to use this pass for just 64-bit images without needing to add even more booleans. while we're churning the pass signature, also do a quick port to nir_shader_intrinsics_pass Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [NIR and V3D] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32907>	2025-01-08 15:59:05 +00:00
Juan A. Suarez Romero	0d14e129bc	v3d: avoid 0-size variable length array Declaring a variable-length array (VLA) based on a variable that can be 0 is declared dangerous. In this case, the variable can't take value 0, so adding an assertion fixes the issue. This was detected by static analyzer. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32819>	2025-01-07 14:21:42 +00:00
Marek Olšák	c21bc65ba7	nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads A negative hole size means the loads overlap. This will be used by drivers to handle overlapping loads in the callback easily. Reviewed-by: Mel Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>	2025-01-01 00:03:55 +00:00
Juan A. Suarez Romero	fd19106773	broadcom/compiler: fix fp16 conversion operations The case for converting a 32-bit integer to 16-bit float is not correctly implemented. Fixes: `214121e9b0` ("broadcom/compiler: handle fp16 conversion ops") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32593>	2024-12-16 10:56:38 +00:00
Juan A. Suarez Romero	8ffdf5a2ab	broadcom/compiler: ensure offset source exists As the lowering is applied on a load uniform intrinsic, there must be an offset source number. This fixes CID#1604734 ("Negative array index read") detected by Coverity Scan. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32593>	2024-12-16 10:56:38 +00:00
Marek Olšák	7f4e36ff7d	gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options This is a prerequisite for enabling nir_opt_varyings for all gallium drivers. nir_lower_io_passes (called by the GLSL linker) only uses NIR options to lower indirect IO access before lowering IO and calling nir_opt_varyings. Most drivers report full support for indirect IO and lower it themselves, which prevents compaction of lowered indirectly accessed varyings because nir_opt_varyings doesn't touch indirect varyings. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (Rb for asahi) Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> (for r300) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32423>	2024-12-03 12:57:36 +00:00
Iago Toral Quiroga	f988a2f336	broadcom: move double-buffer heuristic helpers to the compiler This avoids pulling the dependency on NIR headers in libbroadcom_v3d. Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32240>	2024-11-21 07:21:47 +00:00
Marek Olšák	25d4943481	nir: make use_interpolated_input_intrinsics a nir_lower_io parameter This will need to be set to true when the GLSL linker lowers IO, which can later be unlowered by st/mesa, and then drivers can lower it again without load_interpolated_input. Therefore, it can't be a global immutable option. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>	2024-11-20 02:45:37 +00:00
Rhys Perry	45c1280d2c	nir_lower_mem_access_bit_sizes: pass access to callback Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Rhys Perry	61752152f7	nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Jose Maria Casanova Crespo	5b951bcdd7	v3d: Enable Early-Z with discards when depth updates are disabled The Early-Z optimization is disabled when there is a discard instruction in the shader used in the draw call. But if discard is the only reason to disable Early-Z, and at draw call time the depth updates are disabled, we can enable Early-Z using a shader variant. If there are occlusion queries active we also need to disable the Early-Z optimization. So this patch enables Early-Z in this scenario. The performance improvement is significant when running gfxbench benchmark showing an average improvement of 11.15% fps_avg helped: gl_gfxbench_aztec_high.trace: 3.13 -> 3.73 (19.13%) fps_avg helped: gl_gfxbench_aztec.trace: 4.82 -> 5.68 (17.88%) fps_avg helped: gl_gfxbench_manhattan31.trace: 5.10 -> 6.00 (17.59%) fps_avg helped: gl_gfxbench_manhattan.trace: 7.24 -> 8.36 (15.52%) fps_avg helped: gl_gfxbench_trex.trace: 19.25 -> 20.17 ( 4.81%) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32028>	2024-11-12 13:26:38 +00:00
Daniel Schürmann	c8348139fd	nir: change signature of nir_src_is_divergent() Now, it takes nir_src * instead of nir_src. Also move the implementation to nir_divergence_analysis.c. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c25c63ebc0	nir/divergence: separately indicate whether loops have divergent continues or breaks bool nir_loop_is_divergent(nir_loop *) replaces the previous loop->divergent indicator. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Marek Olšák	65ace5649b	nir: reject unsupported component counts from all vectorize callbacks If you allow an unsupported component count in the callback for loads, nir_opt_load_store_vectorize will align num_components to the next supported vector size, essentially overfetching. This changes all callbacks to reject it. AMD will enable it in a later commit. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Marek Olšák	02923e237d	nir: add hole_size parameter into the vectorize callback It will be used to allow merging loads with a hole between them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
Christian Gmeiner	cf939334e6	v3d: Add a few function traces Sprinkle around a few traces that were useful in locating submit and fence waits. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31575>	2024-10-14 12:21:51 +00:00
Iago Toral Quiroga	4d1971f17f	broadcom: fix pairing tmu lookup with previous ldtmu There are some restrictions when pairing a new TMU lookup with a previous LDTMU and we had code to handle this but we were not limiting the restriction only to TMU lookups. total instructions in shared programs: 10856992 -> 10823967 (-0.30%) instructions in affected programs: 1823670 -> 1790645 (-1.81%) helped: 10212 HURT: 110 Instructions are helped. total max-temps in shared programs: 2234069 -> 2233153 (-0.04%) max-temps in affected programs: 15100 -> 14184 (-6.07%) helped: 660 HURT: 3 Max-temps are helped. total sfu-stalls in shared programs: 15935 -> 15967 (0.20%) sfu-stalls in affected programs: 317 -> 349 (10.09%) helped: 31 HURT: 57 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 10872927 -> 10839934 (-0.30%) inst-and-stalls in affected programs: 1824656 -> 1791663 (-1.81%) helped: 10199 HURT: 111 Inst-and-stalls are helped. total nops in shared programs: 185612 -> 185767 (0.08%) nops in affected programs: 4865 -> 5020 (3.19%) helped: 164 HURT: 256 Nops are HURT. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31574>	2024-10-10 06:58:15 +00:00
Iago Toral Quiroga	c58bfb355a	broadcom/compiler: generate mali opcodes for clamping on Pi5 Models C0 and D0 support these opcodes too. total instructions in shared programs: 10869461 -> 10856992 (-0.11%) instructions in affected programs: 1467666 -> 1455197 (-0.85%) helped: 6012 HURT: 1413 Instructions are helped. total threads in shared programs: 431014 -> 431010 (<.01%) threads in affected programs: 8 -> 4 (-50.00%) helped: 0 HURT: 2 total uniforms in shared programs: 5432771 -> 5430909 (-0.03%) uniforms in affected programs: 183047 -> 181185 (-1.02%) helped: 976 HURT: 128 Uniforms are helped. total max-temps in shared programs: 2235272 -> 2234069 (-0.05%) max-temps in affected programs: 38163 -> 36960 (-3.15%) helped: 1262 HURT: 168 Max-temps are helped. total spills in shared programs: 4331 -> 4363 (0.74%) spills in affected programs: 964 -> 996 (3.32%) helped: 6 HURT: 47 total fills in shared programs: 6527 -> 6622 (1.46%) fills in affected programs: 2047 -> 2142 (4.64%) helped: 6 HURT: 47 total sfu-stalls in shared programs: 15807 -> 15935 (0.81%) sfu-stalls in affected programs: 787 -> 915 (16.26%) helped: 71 HURT: 172 Sfu-stalls are HURT. total inst-and-stalls in shared programs: 10885268 -> 10872927 (-0.11%) inst-and-stalls in affected programs: 1469423 -> 1457082 (-0.84%) helped: 5998 HURT: 1417 Inst-and-stalls are helped. total nops in shared programs: 184280 -> 185612 (0.72%) nops in affected programs: 10000 -> 11332 (13.32%) helped: 311 HURT: 1193 Nops are HURT. The results show a reduction in register pressure, but an increase in spills, which looks contradictory. This is because for some reason, this optimization makes the NIR scheduler produce code for some shaders in Godot that cause additional spilling, but the problem seems to be exclusive to Godot shaders and not really related to the optimization itself but to how the NIR scheduler works. Excluding Godot shaders we actually see a decrease in spills and a slightly larger improvement in instruction counts: total instructions in shared programs: 10720106 -> 10707621 (-0.12%) instructions in affected programs: 1375316 -> 1362831 (-0.91%) helped: 5948 HURT: 1364 Instructions are helped. total threads in shared programs: 428248 -> 428244 (<.01%) threads in affected programs: 8 -> 4 (-50.00%) helped: 0 HURT: 2 total spills in shared programs: 3729 -> 3712 (-0.46%) spills in affected programs: 451 -> 434 (-3.77%) helped: 6 HURT: 0 total fills in shared programs: 4738 -> 4714 (-0.51%) fills in affected programs: 564 -> 540 (-4.26%) helped: 6 HURT: 0 Comparing only shaders from Godot: total instructions in shared programs: 149355 -> 149371 (0.01%) instructions in affected programs: 92350 -> 92366 (0.02%) helped: 64 HURT: 49 Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 16477 -> 16472 (-0.03%) max-temps in affected programs: 180 -> 175 (-2.78%) helped: 5 HURT: 0 Max-temps are helped. total spills in shared programs: 602 -> 651 (8.14%) spills in affected programs: 513 -> 562 (9.55%) helped: 0 HURT: 47 total fills in shared programs: 1789 -> 1908 (6.65%) fills in affected programs: 1483 -> 1602 (8.02%) helped: 0 HURT: 47 Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>	2024-10-03 09:02:08 +00:00
Iago Toral Quiroga	c57be33d96	broadcom/compiler: implement NIR mali opcodes for clamping These translate directly to new unpack modifiers on V3D 7.x. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>	2024-10-03 09:02:08 +00:00
Iago Toral Quiroga	5a62d47762	broadcom/compiler: don't use small immediates in geometry stages Shader-db shows this is beneficial, even if it comes with a small increase in register pressure. total instructions in shared programs: 10889197 -> 10869857 (-0.18%) instructions in affected programs: 3625014 -> 3605674 (-0.53%) helped: 14911 HURT: 8324 Instructions are helped. total threads in shared programs: 431034 -> 431014 (<.01%) threads in affected programs: 40 -> 20 (-50.00%) helped: 0 HURT: 10 Threads are HURT. total uniforms in shared programs: 5308006 -> 5432767 (2.35%) uniforms in affected programs: 2204951 -> 2329712 (5.66%) helped: 9 HURT: 30766 Uniforms are HURT. total max-temps in shared programs: 2226471 -> 2235269 (0.40%) max-temps in affected programs: 272670 -> 281468 (3.23%) helped: 2372 HURT: 8479 Max-temps are HURT. total spills in shared programs: 4318 -> 4331 (0.30%) spills in affected programs: 39 -> 52 (33.33%) helped: 2 HURT: 7 total fills in shared programs: 6514 -> 6527 (0.20%) fills in affected programs: 42 -> 55 (30.95%) helped: 2 HURT: 7 total sfu-stalls in shared programs: 15166 -> 15808 (4.23%) sfu-stalls in affected programs: 2389 -> 3031 (26.87%) helped: 513 HURT: 944 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 10904363 -> 10885665 (-0.17%) inst-and-stalls in affected programs: 3660930 -> 3642232 (-0.51%) helped: 14878 HURT: 8450 Inst-and-stalls are helped. total nops in shared programs: 183672 -> 184256 (0.32%) nops in affected programs: 12532 -> 13116 (4.66%) helped: 1841 HURT: 2251 Nops are HURT. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>	2024-09-25 14:21:46 +00:00
Iago Toral Quiroga	390849f6a2	broadcom/compiler: don't add const offset to unifa if it is 0 Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>	2024-09-25 14:21:46 +00:00
Iago Toral Quiroga	09e0e53a3b	broadcom/compiler: avoid register conflict with ldunif(a) and ldvary ldvary instructions have implicit writes to rf0 (r5 in Pi4) that are read in follow-up instructions to complete the interpolation calculations, so we would rather not allocate ldunif(a)'s dst to rf0/r5 during this sequence either, to facilitate pairing. This gives us -0.25% of instructions for fragment shaders in shader-db for Pi5 and -0.64% on Pi4. Shader-db Pi5: total instructions in shared programs: 10890641 -> 10889197 (-0.01%) instructions in affected programs: 575506 -> 574062 (-0.25%) helped: 2506 HURT: 1378 Instructions are helped. total max-temps in shared programs: 2226555 -> 2226471 (<.01%) max-temps in affected programs: 5061 -> 4977 (-1.66%) helped: 139 HURT: 78 Max-temps are helped. total sfu-stalls in shared programs: 15143 -> 15166 (0.15%) sfu-stalls in affected programs: 310 -> 333 (7.42%) helped: 134 HURT: 195 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 10905784 -> 10904363 (-0.01%) inst-and-stalls in affected programs: 577053 -> 575632 (-0.25%) helped: 2497 HURT: 1415 Inst-and-stalls are helped. total nops in shared programs: 183945 -> 183672 (-0.15%) nops in affected programs: 3862 -> 3589 (-7.07%) helped: 478 HURT: 234 Nops are helped. Shader-db Pi4: total instructions in shared programs: 12842116 -> 12835720 (-0.05%) instructions in affected programs: 996970 -> 990574 (-0.64%) helped: 6027 HURT: 367 Instructions are helped. total max-temps in shared programs: 2251877 -> 2251707 (<.01%) max-temps in affected programs: 2670 -> 2500 (-6.37%) helped: 167 HURT: 9 Max-temps are helped. total sfu-stalls in shared programs: 21132 -> 21093 (-0.18%) sfu-stalls in affected programs: 114 -> 75 (-34.21%) helped: 92 HURT: 55 Sfu-stalls are helped. total inst-and-stalls in shared programs: 12863248 -> 12856813 (-0.05%) inst-and-stalls in affected programs: 1008237 -> 1001802 (-0.64%) helped: 6070 HURT: 359 Inst-and-stalls are helped. total nops in shared programs: 281645 -> 281200 (-0.16%) nops in affected programs: 2241 -> 1796 (-19.86%) helped: 501 HURT: 88 Nops are helped. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>	2024-09-25 14:21:46 +00:00
Iago Toral Quiroga	917e8e5439	broadcom/compiler: rename is_ldunif_dst to try_rf0 We flag nodes used to ldunif dst so we can try and favor allocating rf0 to them, so be more explicit about its purpose. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>	2024-09-25 14:21:46 +00:00
Iago Toral Quiroga	68014b0d9b	broadcom/compiler: skip small immediates optimization on vpm instructions total instructions in shared programs: 11164938 -> 10890641 (-2.46%) instructions in affected programs: 6557250 -> 6282953 (-4.18%) helped: 59134 HURT: 9752 Instructions are helped. total threads in shared programs: 431068 -> 431034 (<.01%) threads in affected programs: 68 -> 34 (-50.00%) helped: 0 Threads are HURT. total uniforms in shared programs: 3880437 -> 5308006 (36.79%) uniforms in affected programs: 2669367 -> 4096936 (53.48%) helped: 2 HURT: 74046 Uniforms are HURT. total max-temps in shared programs: 2244298 -> 2226555 (-0.79%) max-temps in affected programs: 463611 -> 445868 (-3.83%) helped: 17473 HURT: 8040 Max-temps are helped. total spills in shared programs: 4312 -> 4318 (0.14%) spills in affected programs: 0 -> 6 helped: 0 HURT: 2 total fills in shared programs: 6508 -> 6514 (0.09%) fills in affected programs: 0 -> 6 helped: 0 HURT: 2 total sfu-stalls in shared programs: 14794 -> 15143 (2.36%) sfu-stalls in affected programs: 1261 -> 1610 (27.68%) helped: 238 HURT: 586 Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). total inst-and-stalls in shared programs: 11179732 -> 10905784 (-2.45%) inst-and-stalls in affected programs: 6570407 -> 6296459 (-4.17%) helped: 59126 HURT: 9786 Inst-and-stalls are helped. total nops in shared programs: 273422 -> 183945 (-32.72%) nops in affected programs: 139446 -> 49969 (-64.17%) helped: 60679 HURT: 2277 Nops are helped. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31259>	2024-09-23 07:45:46 +00:00
Konstantin Seurer	ce24486ee4	nir: Introduce nir_debug_info_instr Adds a new instruction type that stores metadata that might be useful for debugging purposes. Passes must ignore these instructions when making decisions. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>	2024-08-25 10:26:33 +00:00
Iago Toral Quiroga	ad9ff707ce	broadcom: drop backend implementation of nir_op_ufind_msb We can have NIR do this for us now that we have uclz. Suggested by Georg Lehmann. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30614>	2024-08-13 13:16:18 +02:00
Iago Toral Quiroga	35a10f5d5a	broadcom: implement nir_op_uclz This enables some algebraic optimizations. No changes in shader-db, but it does cause some CTS tests to produce less instructions. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30614>	2024-08-13 13:16:11 +02:00
Alyssa Rosenzweig	c3d999dec9	broadcom: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30570>	2024-08-09 13:54:11 +00:00
Zan Dobersek	7fd5f76393	nir/lower_vars_to_scratch: calculate threshold-limited variable size separately ir3's lowering of variables to scratch memory has to treat 8-bit values as 16-bit ones when comparing such value's size against the given threshold since those values are handled through 16-bit half-registers. But those values can still use natural 8-bit size and alignment for storing inside scratch memory. nir_lower_vars_to_scratch now accepts two size-and-alignment functions, one used for calculating the variable size and the other for calculating the size and alignment needed for storing inside scratch memory. Non-ir3 uses of this pass can just duplicate the currently-used function. ir3 provides a separate variable-size function that special-cases 8-bit types. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Iago Toral Quiroga	086ed1e54b	broadcom/compiler: emit instructions producing flags earlier We usually emit flags right before consuming them but this is suboptimal from the point of view of register pressure: if an instruction is only used to generate flags then waiting to emit it right before reading the flags extends the liveness of the sources used to generate the flags for no gain. This pass will check for such instructions and try to move them as early as possible. Shader-db results below show this is effective to reduce register pressure, allowing a few shaders to increase thread counts and/or reduce spilling: total instructions in shared programs: 11057173 -> 11057076 (<.01%) instructions in affected programs: 1955543 -> 1955446 (<.01%) helped: 4214 HURT: 3905 Inconclusive result (value mean confidence interval includes 0). total threads in shared programs: 425096 -> 425170 (0.02%) threads in affected programs: 74 -> 148 (100.00%) helped: 37 HURT: 0 Threads are helped. total uniforms in shared programs: 3846275 -> 3845674 (-0.02%) uniforms in affected programs: 23574 -> 22973 (-2.55%) helped: 217 HURT: 30 Uniforms are helped. total max-temps in shared programs: 2222910 -> 2220488 (-0.11%) max-temps in affected programs: 61904 -> 59482 (-3.91%) helped: 2145 HURT: 113 Max-temps are helped. total spills in shared programs: 4294 -> 4280 (-0.33%) spills in affected programs: 148 -> 134 (-9.46%) helped: 8 HURT: 0 total fills in shared programs: 6497 -> 6468 (-0.45%) fills in affected programs: 291 -> 262 (-9.97%) helped: 8 HURT: 0 total sfu-stalls in shared programs: 14344 -> 14611 (1.86%) sfu-stalls in affected programs: 1308 -> 1575 (20.41%) helped: 217 HURT: 335 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11071517 -> 11071687 (<.01%) inst-and-stalls in affected programs: 1946767 -> 1946937 (<.01%) helped: 4191 HURT: 3909 Inconclusive result (value mean confidence interval includes 0). total nops in shared programs: 270628 -> 269829 (-0.30%) nops in affected programs: 22032 -> 21233 (-3.63%) helped: 1213 HURT: 571 Inconclusive result (%-change mean confidence interval includes 0). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30511>	2024-08-07 09:28:39 +02:00
Daniel Stone	e05415a82e	format: Generate endian-independent format aliases Instead of having a hardcoded list of endian-independent format aliases in the header, generate them from the format definitions. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>	2024-07-19 13:50:42 +00:00

939 commits