fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 16:28:14 +02:00

Author	SHA1	Message	Date
Faith Ekstrand	58cba7887a	nir: Add a new nir_texop_gradient_pan Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>	2026-05-05 01:27:16 +00:00
Faith Ekstrand	e0fffabda7	nir/builder: Allow backend1/2 in nir_build_tex() Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>	2026-05-05 01:27:16 +00:00
Faith Ekstrand	337aaa0ab9	pan,nir: Add cube face intrinsics Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>	2026-05-05 01:27:15 +00:00
Rhys Perry	081feabf9c	nir/search: fix nir_algebraic_automaton after constant folding op(bcsel) Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/jobs/98917704 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `f4812dc11d` ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41343>	2026-05-04 17:27:38 +00:00
Daniel Schürmann	f4812dc11d	nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2) for all ALU instructions except fneg instead of using nir_opt_algebraic for a small subset. Totals from 17711 (8.49% of 208640) affected shaders: (Navi48) MaxWaves: 364391 -> 364397 (+0.00%); split: +0.01%, -0.01% Instrs: 33873994 -> 33780398 (-0.28%); split: -0.31%, +0.03% CodeSize: 198627596 -> 198259724 (-0.19%); split: -0.23%, +0.05% VGPRs: 1435516 -> 1435144 (-0.03%); split: -0.04%, +0.02% SpillSGPRs: 652827 -> 654577 (+0.27%); split: -0.00%, +0.27% SpillVGPRs: 594840 -> 593598 (-0.21%); split: -0.28%, +0.07% Scratch: 31791360 -> 31543552 (-0.78%) Latency: 417824569 -> 415881858 (-0.46%); split: -0.48%, +0.02% InvThroughput: 80376232 -> 80307996 (-0.08%); split: -0.10%, +0.01% VClause: 557238 -> 554770 (-0.44%); split: -0.50%, +0.06% SClause: 688297 -> 688125 (-0.02%); split: -0.04%, +0.02% Copies: 3571756 -> 3566704 (-0.14%); split: -0.44%, +0.29% Branches: 628710 -> 628576 (-0.02%); split: -0.07%, +0.05% PreSGPRs: 1100316 -> 1103478 (+0.29%); split: -0.02%, +0.30% PreVGPRs: 1132139 -> 1128765 (-0.30%); split: -0.30%, +0.00% VALU: 18944830 -> 18912030 (-0.17%); split: -0.20%, +0.03% SALU: 4363054 -> 4342748 (-0.47%); split: -0.57%, +0.10% VMEM: 1894420 -> 1891754 (-0.14%); split: -0.19%, +0.05% SMEM: 1073860 -> 1073741 (-0.01%); split: -0.01%, +0.00% VOPD: 1734659 -> 1735718 (+0.06%); split: +0.20%, -0.14% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>	2026-05-04 09:42:59 +00:00
Daniel Schürmann	8b1c60add4	nir/opt_constant_folding: create const_value_for_alu() helper Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>	2026-05-04 09:42:59 +00:00
Georg Lehmann	52b195b4e8	nir/opt_algebraic: add more fmulz pattern Totals from 3 (0.00% of 202440) affected shaders: (Navi48) Instrs: 5684 -> 5641 (-0.76%); split: -0.77%, +0.02% CodeSize: 30952 -> 30708 (-0.79%); split: -0.80%, +0.01% Latency: 9236 -> 9199 (-0.40%); split: -0.42%, +0.02% InvThroughput: 2287 -> 2273 (-0.61%) VALU: 3900 -> 3884 (-0.41%) SALU: 305 -> 289 (-5.25%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>	2026-05-04 09:42:59 +00:00
Georg Lehmann	38e691fc0a	nir/opt_varyings: do no_signed_zero linking even for non removable stores E.g. position in VS. Foz-DB Navi48: Totals from 948 (0.79% of 120695) affected shaders: MaxWaves: 26816 -> 26828 (+0.04%) Instrs: 799692 -> 796993 (-0.34%); split: -0.34%, +0.01% CodeSize: 3855744 -> 3846816 (-0.23%); split: -0.24%, +0.01% VGPRs: 50256 -> 50220 (-0.07%) Latency: 2209359 -> 2207667 (-0.08%); split: -0.09%, +0.01% InvThroughput: 305260 -> 303519 (-0.57%); split: -0.57%, +0.00% VClause: 11640 -> 11643 (+0.03%); split: -0.01%, +0.03% SClause: 21152 -> 21149 (-0.01%) Copies: 51658 -> 51675 (+0.03%); split: -0.11%, +0.14% Branches: 18656 -> 18655 (-0.01%) PreVGPRs: 37999 -> 37984 (-0.04%) VALU: 469752 -> 467406 (-0.50%); split: -0.50%, +0.00% SALU: 105433 -> 105323 (-0.10%); split: -0.11%, +0.00% Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>	2026-05-03 19:55:10 +00:00
Georg Lehmann	fac4edbcba	nir/opt_varyings: back propagate signed zero information to outputs Foz-DB Navi48: Totals from 809 (0.67% of 120695) affected shaders: MaxWaves: 21804 -> 21808 (+0.02%) Instrs: 863131 -> 861310 (-0.21%); split: -0.22%, +0.01% CodeSize: 4535500 -> 4523232 (-0.27%); split: -0.30%, +0.03% VGPRs: 47304 -> 47280 (-0.05%) SpillSGPRs: 170 -> 82 (-51.76%) Latency: 6791484 -> 6786880 (-0.07%); split: -0.07%, +0.00% InvThroughput: 906281 -> 905301 (-0.11%); split: -0.11%, +0.00% VClause: 16910 -> 16917 (+0.04%); split: -0.01%, +0.05% SClause: 21856 -> 21827 (-0.13%); split: -0.14%, +0.01% Copies: 61890 -> 61436 (-0.73%); split: -0.80%, +0.06% Branches: 19725 -> 19640 (-0.43%) PreSGPRs: 38011 -> 37851 (-0.42%) PreVGPRs: 36482 -> 36454 (-0.08%) VALU: 465316 -> 464323 (-0.21%); split: -0.22%, +0.00% SALU: 143757 -> 143395 (-0.25%); split: -0.33%, +0.08% VMEM: 36827 -> 36806 (-0.06%) SMEM: 37769 -> 37768 (-0.00%) Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>	2026-05-03 19:55:10 +00:00
Georg Lehmann	b2bc57551a	nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics Just like for ALU. No Foz-DB changes. Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>	2026-05-03 19:55:10 +00:00
Marek Olšák	f583f6e717	nir: use nir_build_frag_coord everywhere nir_build_frag_coord generates the correct sysval loads based on NIR options. nir_load_frag_coord shouldn't be used directly because drivers don't have to support it. v2: RADV can't use it because nir->options isn't set, so use load_pixel_coord. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>	2026-05-03 13:03:01 +00:00
Marek Olšák	b63a9a8b39	nir: add direct lowered frag_coord building to replace lowering passes Instead of lowering frag_coord 4 times during compilation, just use this. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>	2026-05-03 13:03:00 +00:00
Marek Olšák	9c5ad16819	nir/opt_frag_coord_to_pixel_coord: handle frag_coord_xy Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>	2026-05-03 13:03:00 +00:00
Marek Olšák	076b0aaf1d	nir/lower_wpos_ytransform: handle frag_coord_xy Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>	2026-05-03 13:03:00 +00:00
Marek Olšák	e49f29f25e	nir: add frag_coord_xy to strengthen and simplify pixel_coord lowering Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>	2026-05-03 13:03:00 +00:00
Daniel Schürmann	012d72f2b0	nir/opt_algebraic: add some imul24_relaxed pattern Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178>	2026-05-01 10:07:26 +00:00
Daniel Schürmann	708093d830	nir/opt_algebraic: use imul24_relaxed for lowered dot4x8_add Totals from 28 (0.04% of 72819) affected shaders: (Navi10) MaxWaves: 181 -> 186 (+2.76%) Instrs: 406735 -> 338360 (-16.81%) CodeSize: 2913588 -> 2469712 (-15.23%) VGPRs: 5520 -> 5468 (-0.94%) SpillVGPRs: 32 -> 0 (-inf%) LDS: 64512 -> 62464 (-3.17%) Scratch: 10240 -> 0 (-inf%) Latency: 11028252 -> 4357120 (-60.49%) InvThroughput: 11004126 -> 4079018 (-62.93%) VClause: 1686 -> 2055 (+21.89%); split: -0.89%, +22.78% SClause: 890 -> 852 (-4.27%) Copies: 4516 -> 2644 (-41.45%); split: -41.59%, +0.13% PreSGPRs: 982 -> 974 (-0.81%) PreVGPRs: 5356 -> 4284 (-20.01%) VALU: 370529 -> 330201 (-10.88%) SALU: 28850 -> 1170 (-95.94%) VMEM: 2616 -> 2560 (-2.14%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178>	2026-05-01 10:07:25 +00:00
Lorenzo Rossi	63aceb07ff	nir/opt_sink: Add pan-specific load_input Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>	2026-04-30 18:26:10 +00:00
Lorenzo Rossi	30d8f9c554	nir/lower_point_size: Handle 16-bit point sizes Panfrost has a float16 point size; handling that precision too allows the compiler to call lower_point_size later in the compilation pipeline. Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>	2026-04-30 18:26:10 +00:00
Lorenzo Rossi	2a7d817591	nir/opt_algebraic: optimize fadd/fmul with 16-bit source and constant Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>	2026-04-30 17:33:09 +00:00
Lorenzo Rossi	89436db611	nir: Extract float_is_half tests in common code Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>	2026-04-30 17:33:09 +00:00
Karol Herbst	4e67582ddf	nir: add fmul_rtz optimizations NVK is only going to use it for `fmul_rtz(frcp(ipa), ipa)` patterns, so don't try too hard to optimize this. Totals from 10 (0.00% of 1212873) affected shaders: CodeSize: 34480 -> 34288 (-0.56%); split: -0.60%, +0.05% Static cycle count: 6225 -> 6132 (-1.49%); split: -1.57%, +0.08% Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>	2026-04-30 15:42:40 +00:00
Karol Herbst	2e09b4ac68	nir: handle fmul_rtz in a couple of places Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>	2026-04-30 15:42:40 +00:00
Karol Herbst	4e520f671c	nir: add fmul_rtz It's needed in NVK for correctness with interpolation. Backport-to: 26.1 Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>	2026-04-30 15:42:40 +00:00
Marek Olšák	a3e3bf0ac2	nir/opt_dce: add shader_info::assert_inputs_not_dead Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41166>	2026-04-30 07:07:32 +00:00
Marek Olšák	7bd5856cc6	nir/opt_dce: factor out dead instruction removal into a helper Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41166>	2026-04-30 07:07:32 +00:00
Alyssa Rosenzweig	0c49738211	nir/opt_reassociate: fix exactness bug For an inexact-associative operation (fadd or fmul), can_reassociate ensures the root of the chain is inexact to allow reassociating. However, build_chain just checks for opcodes to match up after, although we do sum up exactness across the chain. Although an Effort Was Made, it still seems incorrect to reassociate %3 = fadd! %0, %1 %4 = fadd %3, %2 to instead be (ex.) %3 = fadd! %0, %2 %4 = fadd! %3, %1 Closes: #14418 Fixes: `e0b0f7e73c` ("nir: add ALU reassocation pass") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41162>	2026-04-28 21:14:56 +00:00
Georg Lehmann	599a52174b	nir: disable fp class analysis for 64bit transcendentals Some backends have terrible precision for these fp64 opcodes, so don't try to do anything clever. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15334 Fixes: `5a298f3560` ("nir: rewrite fp range analysis as a fp class analysis") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Eric R. Smith <eric.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41206>	2026-04-28 13:26:42 +00:00
Simon Perretta	57791c4a99	pco: track how many tg4/raw sample comps are needed Rather than always emitting and swizzling 16 components for raw samples, scale it by the number actually needed as defined by the selected tg4 channel/components. Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>	2026-04-28 12:04:03 +01:00
Marek Olšák	3dcba87ca3	nir/opt_licm: hoist instructions across multiple levels of nested loops radv gfx12: Totals: Instrs: 42861311 -> 42861476 (+0.00%); split: -0.00%, +0.00% CodeSize: 227917476 -> 227918160 (+0.00%); split: -0.00%, +0.00% Latency: 265381068 -> 265373506 (-0.00%); split: -0.00%, +0.00% InvThroughput: 42954018 -> 42952350 (-0.00%) VClause: 819026 -> 819024 (-0.00%) SClause: 1210348 -> 1210293 (-0.00%) Copies: 2919525 -> 2919597 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 2889432 -> 2889406 (-0.00%) VALU: 23757371 -> 23757377 (+0.00%); split: -0.00%, +0.00% SALU: 5981417 -> 5981485 (+0.00%); split: -0.00%, +0.00% VOPD: 8966 -> 8964 (-0.02%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:21 +00:00
Marek Olšák	8e036fcaec	nir/opt_licm: use nir_metadata_control_flow Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:21 +00:00
Marek Olšák	e0112be522	nir/opt_licm: add a private state structure for the pass The structure will grow in later commits. The major change is that the preheader and exit blocks are replaced by tracking just the innermost optimized nir_loop * and getting the predecessor and successor blocks out of it. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>	2026-04-27 23:58:20 +00:00
Timothy Arceri	a42c55da46	amd/radeonsi: dont clamp packed user varyings ac_nir_optimize_outputs() might pack user varyings into the color built-ins. If this happens we skip adding clamping to the components that contain the user varying. This change also fixes a second bug where a color built-in can be packed into a non-color slot and was no longer being clamped. Fixes: `3777a5d7` ("radeonsi: assign param export indices before compilation") Closes: #14443 Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40594>	2026-04-27 22:59:58 +00:00
Simon Perretta	af1669d9e2	pco: reserve additional outputs for trilinear sampled coeffs Sampling coeffs with trilinear filtering will output 2x sets of data. Whether bilinear or trilinear filtering is in use can't be determined without checking state words, so unconditionally reserve 2x to avoid clobbering output regs. Fixes: `7df32ba09d` ("pco: initial texture/sampler compiler support") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41051>	2026-04-27 11:32:29 +00:00
squidbus	a41f0e62bb	asahi,nir: Move asahi dynamic clipz pass to common. Acked-by: Alyssa Rosenzweig <alyssa@rosenz.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41088>	2026-04-27 11:00:59 +00:00
Rhys Perry	91d555c2cb	radv: lower indirect derefs after linking Scratch access isn't very optimizable, so more stores are optimized away if we lower indirect derefs after both linking and radv_optimize_nir. fossil-db (navi21): Totals from 1264 (0.62% of 202427) affected shaders: Instrs: 1504703 -> 1504708 (+0.00%); split: -0.02%, +0.02% CodeSize: 8031388 -> 8031020 (-0.00%); split: -0.02%, +0.02% SpillSGPRs: 1865 -> 1869 (+0.21%) Latency: 12106362 -> 12106464 (+0.00%); split: -0.01%, +0.01% InvThroughput: 4056269 -> 4056044 (-0.01%); split: -0.01%, +0.00% VClause: 13927 -> 13940 (+0.09%) SClause: 32382 -> 32396 (+0.04%); split: -0.03%, +0.08% Copies: 188004 -> 187897 (-0.06%); split: -0.17%, +0.11% Branches: 39045 -> 39052 (+0.02%); split: -0.01%, +0.03% PreSGPRs: 79885 -> 79814 (-0.09%); split: -0.11%, +0.02% VALU: 1072639 -> 1072532 (-0.01%); split: -0.01%, +0.00% SALU: 187317 -> 187375 (+0.03%); split: -0.11%, +0.14% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31265>	2026-04-24 11:01:03 +00:00
Alyssa Rosenzweig	6a43e6c9e0	nir/opt_algebraic: add redundant u2u32/unpack_64_2x32_split_x patterns reduces hello world kernel 57 -> 44 inst on jay. why do we have two opcodes that do literally the same thing? :/ Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41085>	2026-04-23 19:54:21 +00:00
Daniel Schürmann	806fcc6193	nir/opt_loop: always try to peel initial break from loops with unrolling hint This allows these loops to be unrolled even if loop analysis is unable to calculate the iteration count. As always with loops, the throughput stats are meaningless. Totals from 6 (0.00% of 202440) affected shaders: (Navi48) Instrs: 7825 -> 6201 (-20.75%) CodeSize: 37056 -> 30412 (-17.93%) Latency: 21563 -> 16934 (-21.47%) InvThroughput: 144649 -> 77962 (-46.10%) SClause: 139 -> 133 (-4.32%) Copies: 536 -> 388 (-27.61%) Branches: 156 -> 84 (-46.15%) PreVGPRs: 298 -> 296 (-0.67%); split: -1.01%, +0.34% VALU: 2493 -> 2378 (-4.61%); split: -4.65%, +0.04% SALU: 3263 -> 2199 (-32.61%) SMEM: 188 -> 183 (-2.66%) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>	2026-04-22 10:34:58 +00:00
Daniel Schürmann	738cc6a7db	nir/opt_loop: stop recursion at loop header phi in can_constant_fold() Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>	2026-04-22 10:34:58 +00:00
Daniel Schürmann	1f9a0490c6	nir/opt_loop: Don't peel initial break from do-while loops As the main purpose of this optimization is to transform while- into do-while loops, don't apply on loops which are already in do-while form. Also set nir_loop::do_while after this transformation, so that it is only applied once. Totals from 576 (0.28% of 202440) affected shaders: (Navi48) Instrs: 1337529 -> 1253438 (-6.29%); split: -6.36%, +0.07% CodeSize: 8390852 -> 7837328 (-6.60%); split: -6.61%, +0.01% VGPRs: 50856 -> 50844 (-0.02%) SpillSGPRs: 42198 -> 35395 (-16.12%); split: -16.13%, +0.01% SpillVGPRs: 47608 -> 44620 (-6.28%) Latency: 31043828 -> 44143753 (+42.20%); split: -0.06%, +42.26% InvThroughput: 6973433 -> 10079000 (+44.53%); split: -0.08%, +44.61% VClause: 26839 -> 24718 (-7.90%); split: -7.91%, +0.00% SClause: 21831 -> 21583 (-1.14%); split: -1.52%, +0.38% Copies: 183503 -> 150040 (-18.24%); split: -18.84%, +0.61% Branches: 27738 -> 26848 (-3.21%); split: -5.12%, +1.91% PreSGPRs: 40233 -> 39083 (-2.86%); split: -2.88%, +0.02% PreVGPRs: 38745 -> 38903 (+0.41%); split: -0.02%, +0.43% VALU: 688396 -> 645948 (-6.17%); split: -6.17%, +0.01% SALU: 189792 -> 177642 (-6.40%); split: -6.97%, +0.57% VMEM: 121500 -> 112748 (-7.20%) SMEM: 38765 -> 37767 (-2.57%); split: -2.58%, +0.00% VOPD: 102488 -> 89071 (-13.09%); split: +0.24%, -13.33% Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>	2026-04-22 10:34:58 +00:00
Daniel Schürmann	32436731a3	nir: add nir_loop::do_while to indicate do-while loops Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>	2026-04-22 10:34:58 +00:00
Eric R. Smith	4ae192a3d9	glsl, spirv: Improve accuracy of asin() and acos() The polynomial used for asin_expr() was suboptimal (and its source was not documented). A better approximation is found in the Handbook of Mathematical Functions by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However, while this approximation gives a good absolute error bound, its relative error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page from the spirv implementation of asin(), we implement a piecewise approximation where a Taylor series is used for small values of |x|. This patch also harmonizes the GLSL and Vulkan implementations by moving the implementation to common code (nir_builder). Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0, the original asin() at 32 bits has: RMSE 1.756451e-04 (glsl) / 1.609091e-04 (spirv); worst abs error 3.904104e-04 at 0.937001 (both); worst ulp error 11800 at 6.2499e-05 (glsl) / 3826 at 0.841331 (spirv). The new implementation has, for both: RMSE 2.528056e-05; worst abs error 4.962087e-05 at 0.451149; worst ulp error 2379 at 0.215106. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>	2026-04-21 21:10:22 +00:00
Brandon Jones	d1dd65d425	nir/opt_algebraic: fix fabs optimization This fixes a regression found in Blender's unit testing, which called fabs(-0.0) and invoked a NIR optimization that was not valid for the parameter -0.0. IEEE 754 requires that abs clear the sign bit for the value -0.0. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060>	2026-04-21 04:10:29 +00:00
Lionel Landwerlin	bbeb6be6eb	nir: expose nir_opt_dce_impl Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>	2026-04-20 21:53:35 +03:00
Patrick Lerda	9815901f86	r600: implement tes and tcs instanced gl_PrimitiveID support This change extends r600_lds_constant_buffer to implement a fully conformant gl_PrimitiveID at the tes and tcs stages. This change was tested on cayman and barts. Here are the tests fixed: spec/arb_tessellation_shader/execution/tcs-primitiveid-instanced: fail pass spec/arb_tessellation_shader/execution/tes-no-tcs-primitiveid-instanced: fail pass spec/arb_tessellation_shader/execution/tes-primitiveid-instanced: fail pass khr-gl4[4-6]/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass khr-gles31/core/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass khr-glesext/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass Signed-off-by: Patrick Lerda <patrick9876@free.fr> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40297>	2026-04-20 13:21:55 +00:00
Janne Grunau	98a97cb413	nir/gather_info: clear interpolation qualifiers only in fragment stage Asahi wants the interpolation qualifiers from the shader info in the vertex shader. Clear them only in the fragment stage so they can propagate back. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15288 Backport-to: 26.0 Fixes: `a72704d0fb` ("nir/gather_info: clear interpolation qualifiers before gathering") Signed-off-by: Janne Grunau <j@jannau.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41040>	2026-04-19 10:10:15 +00:00
Alyssa Rosenzweig	4b81cb6206	nir/opt_generate_bfi: avoid trivial instructions With the pass order shuffling, code like `(x & 0xf) + (x & 0xfffffff0)` gets optimized to bitfield_select(0xF, x, x). But it would be much better to optimize simply to x. nir_opt_algebraic would do that for us but we run this pass too late for algebraic to save us from ourselves, so be smarter. Observed on dEQP-GLES31.functional.compute.basic.image_atomic_op_local_size_8 with Jay, this saves an instruction there. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40956>	2026-04-16 13:54:41 +00:00
Georg Lehmann	f949e3b819	nir: remove nir_link_xfb_varyings RADV was the last user. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40977>	2026-04-16 08:49:23 +00:00
Marek Olšák	835f5faf14	nir: add back color0/1 system values and VARYING_SLOT_PARAM_GEN_AMD It turns out we need the color sysvals recorded in system_values_read, and PARAM_GEN is for point smoothing. Acked-by: Pierre-Eric Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40556>	2026-04-15 18:12:07 +00:00
Natalie Vock	57f796752d	nir/deref: Elide loads/stores from deref cast of undef These can never be meaningful. DOOM: The Dark Ages also relies on this. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40799>	2026-04-15 08:42:12 +00:00


7434 commits
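Commit f4812dc11d constant-folds op(bcsel(c, #a, #b), #k) into bcsel(c, op(#a, #k), op(#b, #k)) for all ALU instructions except fneg. Below is a minimal plain-C sketch of why this is sound. It assumes a toy model in which the select's condition is only known at run time and both arms are integer constants. `toy_bcsel`, `fold_op_into_bcsel` and the `toy_*` ops are hypothetical names for illustration, not NIR API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of bcsel(cond, #then, #else) with constant arms.
 * The condition is a run-time value, so the select itself must stay. */
typedef struct {
    int32_t then_val;
    int32_t else_val;
} toy_bcsel;

typedef int32_t (*toy_binop)(int32_t, int32_t);

/* Evaluate the select as the shader would at run time. */
static int32_t toy_bcsel_eval(toy_bcsel s, bool cond)
{
    return cond ? s.then_val : s.else_val;
}

/* op(bcsel(cond, #a, #b), #k) -> bcsel(cond, op(#a, #k), op(#b, #k)):
 * op is evaluated at compile time on each constant arm, so only the
 * select remains in the shader. */
static toy_bcsel fold_op_into_bcsel(toy_binop op, toy_bcsel s, int32_t k)
{
    toy_bcsel folded = { op(s.then_val, k), op(s.else_val, k) };
    return folded;
}

static int32_t toy_iadd(int32_t x, int32_t y) { return x + y; }
static int32_t toy_imul(int32_t x, int32_t y) { return x * y; }
```

Whichever way the condition goes, applying the op after the select gives the same value as selecting between the pre-folded arms.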
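Commit 708093d830 switches the lowered dot4x8_add to imul24_relaxed. The reasoning can be sketched in C under two assumptions. The first is a signed 4x8 dot product with accumulate, modeled on NIR's sdot_4x8_iadd. The second is that "relaxed" lets the backend use a 24-bit multiply. All function names are hypothetical. The point is that a sign-extended byte lies in [-128, 127], well inside the 24-bit range, so a 24-bit multiply reproduces every product exactly.

```c
#include <stdint.h>

/* Sign-extend byte i (0..3) of a packed 32-bit value. */
static int32_t sext_byte(uint32_t packed, unsigned i)
{
    return (int8_t)(uint8_t)(packed >> (8 * i));
}

/* Hypothetical model of a 24-bit multiply: each operand is reduced to its
 * low 24 bits, reinterpreted as a signed 24-bit value. */
static int32_t sext24(int32_t v)
{
    uint32_t low = (uint32_t)v & 0xFFFFFFu;
    int32_t s = (int32_t)low;
    return (low & 0x800000u) ? s - 0x1000000 : s;
}

static int64_t mul24(int32_t a, int32_t b)
{
    return (int64_t)sext24(a) * sext24(b);
}

/* Reference: signed 4x8 dot product plus accumulator, 32-bit multiplies. */
static int32_t sdot_4x8_ref(uint32_t a, uint32_t b, int32_t acc)
{
    int32_t sum = acc;
    for (unsigned i = 0; i < 4; i++)
        sum += sext_byte(a, i) * sext_byte(b, i);
    return sum;
}

/* Same lowering with 24-bit multiplies; each product fits in 16 bits, so
 * nothing is lost by the narrower multiplier. */
static int32_t sdot_4x8_mul24(uint32_t a, uint32_t b, int32_t acc)
{
    int32_t sum = acc;
    for (unsigned i = 0; i < 4; i++)
        sum += (int32_t)mul24(sext_byte(a, i), sext_byte(b, i));
    return sum;
}
```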
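Commit 0c49738211 stops nir_opt_reassociate from reordering a chain whose members are exact (`fadd!`). IEEE 754 addition is not associative, so evaluating (%0 + %1) + %2 as (%0 + %2) + %1 can change the result. A small C illustration follows; the input values are chosen only to make the rounding visible. `volatile` forces each partial sum to be rounded to double, even on targets with wider intermediates.

```c
/* (a + b) + c: the order the exact chain
 *   %3 = fadd! %0, %1
 *   %4 = fadd  %3, %2
 * starts with. */
static double sum_original(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* (a + c) + b: the reassociated order from the commit message. */
static double sum_reassociated(double a, double b, double c)
{
    volatile double t = a + c;
    return t + b;
}
```

With a = 1e16, b = -1e16 and c = 1.0, the original order gives 1.0. In the reassociated order, 1e16 + 1.0 rounds back to 1e16 (the spacing between doubles there is 2), so the result is 0.0.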
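Commit 3dcba87ca3 lets nir_opt_licm hoist instructions across several levels of nested loops in one go. A source-level C sketch of the effect follows; the NIR pass works on SSA, not on C, and the function names are hypothetical. Hoisting a * b above the outer loop means it is also computed when the loops run zero times. The sketch assumes, as for a shader ALU op, that the multiply has no side effects (in C this also assumes no signed overflow).

```c
/* Before: a * b is loop-invariant but sits in the inner loop body. */
static long sum_before(const int *v, int rows, int cols, int a, int b)
{
    long sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += (long)v[i * cols + j] * (a * b);
    return sum;
}

/* After: the invariant product is hoisted out of both loop levels at once,
 * rather than only into the body of the outer loop. */
static long sum_after(const int *v, int rows, int cols, int a, int b)
{
    const int ab = a * b;
    long sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += (long)v[i * cols + j] * ab;
    return sum;
}
```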
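Commit 6a43e6c9e0 adds patterns for u2u32 and unpack_64_2x32_split_x because, for a 64-bit source, both yield the low 32 bits. A minimal C model of that equivalence follows; the `model_*` names are hypothetical and only mirror the opcode names.

```c
#include <stdint.h>

/* u2u32 on a 64-bit source: unsigned conversion, i.e. truncation. */
static uint32_t model_u2u32(uint64_t x)
{
    return (uint32_t)x;
}

/* unpack_64_2x32_split_x: the low ("x") 32-bit half of a 64-bit value. */
static uint32_t model_unpack_64_2x32_split_x(uint64_t x)
{
    return (uint32_t)(x & 0xFFFFFFFFu);
}
```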
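Commit 1f9a0490c6 describes peeling the initial break as a way to turn while loops into do-while loops. In NIR a loop is unconditional and exits through explicit breaks. A while loop therefore starts with an `if (!cond) break;`, and peeling it leaves the test at the bottom. The C sketch below shows the shape of the transformation at source level; the function names are hypothetical.

```c
/* while form: the condition is tested before every iteration. */
static int count_while(int n)
{
    int i = 0;
    while (i < n)
        i++;
    return i;
}

/* Peeled form: the first test becomes an if around the loop, and the
 * loop itself tests its condition at the bottom (do-while). */
static int count_do_while(int n)
{
    int i = 0;
    if (i < n) {
        do {
            i++;
        } while (i < n);
    }
    return i;
}
```

Commit 1f9a0490c6 also marks the result with nir_loop::do_while, added in commit 32436731a3, so that the peel is only applied once.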
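Commit d1dd65d425 does not spell out which fabs pattern was wrong. The hypothetical C sketch below shows the general trap: a rewrite of the form "x is known to be >= 0, so fabs(x) is x" is invalid because -0.0 >= 0.0 is true in IEEE 754. Such a rewrite keeps the sign bit that fabs must clear.

```c
#include <math.h>

/* Hypothetical rewrite that breaks on -0.0: return x unchanged whenever
 * it compares >= 0, instead of clearing the sign bit like fabs does. */
static double fabs_assuming_nonneg(double x)
{
    return (x >= 0.0) ? x : -x;
}
```

Comparisons cannot tell the two zeros apart, so signbit() is used to observe the difference.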
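Commit 4b81cb6206 avoids emitting bitfield_select(0xF, x, x) for `(x & 0xf) + (x & 0xfffffff0)`. The C sketch below assumes NIR defines bitfield_select(mask, insert, base) as (mask & insert) | (~mask & base). Under that definition, identical insert and base values make the mask irrelevant. Because the two masks in the original expression are complementary, the + behaves like | and the whole expression is just x. The C function name simply mirrors the opcode for illustration.

```c
#include <stdint.h>

/* Assumed semantics: bits of insert where mask is 1, bits of base
 * where mask is 0. */
static uint32_t bitfield_select(uint32_t mask, uint32_t insert, uint32_t base)
{
    return (mask & insert) | (~mask & base);
}
```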