Rhys Perry
5d92942241
nir/search: remove creation of swizzle
...
match_expression() only accesses the first instr->def.num_components
elements, so we don't need to ensure the rest are zero.
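A rough illustration of that reasoning. The names below are hypothetical and are not the nir_search API: a comparison that only reads the first num_components entries of a fixed-size swizzle array is unaffected by whatever sits in the remaining entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed capacity for this sketch; stands in for NIR's maximum vector width. */
#define SKETCH_MAX_COMPONENTS 16

/* Compare two swizzles, reading only the components that are actually used. */
static bool
swizzles_equal(const uint8_t a[SKETCH_MAX_COMPONENTS],
               const uint8_t b[SKETCH_MAX_COMPONENTS],
               unsigned num_components)
{
   assert(num_components <= SKETCH_MAX_COMPONENTS);
   for (unsigned i = 0; i < num_components; i++) {
      if (a[i] != b[i])
         return false;
   }
   /* Entries past num_components are never read, so they need not be zeroed. */
   return true;
}
```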
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39808 >
2026-02-12 14:47:06 +00:00
jiajia Qian
f16d17a454
nir/opt_phi_precision: Fix bit size mismatch when moving widening conversions
...
Add a check that, when a load_const source can be narrowed, the other
widening conversion sources also convert from 16-bit, so that all phi
sources end up with a consistent bit size.
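A minimal sketch of this kind of consistency check. It uses a hypothetical `phi_src_info` summary in place of the real NIR instructions and assumes the narrowed constant becomes 16-bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one phi source; not the real NIR data structures. */
struct phi_src_info {
   bool is_load_const;     /* source is a constant */
   bool const_fits_16bit;  /* constant is exactly representable in 16 bits */
   unsigned conv_src_bits; /* widening conversion: bit size before widening */
};

/* Returns true only if every source can feed a 16-bit phi. */
static bool
phi_srcs_narrow_to_16bit(const struct phi_src_info *srcs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (srcs[i].is_load_const) {
         if (!srcs[i].const_fits_16bit)
            return false;
      } else if (srcs[i].conv_src_bits != 16) {
         /* e.g. a conversion from 8 bits would leave mixed bit sizes. */
         return false;
      }
   }
   return true;
}
```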
Signed-off-by: jiajia Qian <jiajia.qian@nxp.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39773 >
2026-02-12 12:27:55 +00:00
Karol Herbst
a274b9c6a8
nak: Fold constant ishl into shared ld/st/atoms
...
Totals:
CodeSize: 9459006048 -> 9458124656 (-0.01%); split: -0.01%, +0.00%
Number of GPRs: 47358402 -> 47358138 (-0.00%)
SLM Size: 5409064 -> 5409024 (-0.00%)
Static cycle count: 6129914910 -> 6129436959 (-0.01%); split: -0.01%, +0.00%
Spills to memory: 44471 -> 44453 (-0.04%)
Fills from memory: 44471 -> 44453 (-0.04%)
Spills to reg: 186364 -> 186365 (+0.00%); split: -0.00%, +0.00%
Fills from reg: 226975 -> 226976 (+0.00%); split: -0.00%, +0.00%
Max warps/SM: 50638680 -> 50638804 (+0.00%)
Totals from 9700 (0.83% of 1163204) affected shaders:
CodeSize: 234188480 -> 233307088 (-0.38%); split: -0.43%, +0.05%
Number of GPRs: 567950 -> 567686 (-0.05%)
SLM Size: 39952 -> 39912 (-0.10%)
Static cycle count: 225267269 -> 224789318 (-0.21%); split: -0.26%, +0.05%
Spills to memory: 4792 -> 4774 (-0.38%)
Fills from memory: 4792 -> 4774 (-0.38%)
Spills to reg: 33250 -> 33251 (+0.00%); split: -0.00%, +0.01%
Fills from reg: 27531 -> 27532 (+0.00%); split: -0.00%, +0.01%
Max warps/SM: 349200 -> 349324 (+0.04%)
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39709 >
2026-02-11 03:42:05 +01:00
Karol Herbst
18bf6fb96d
nir: add NVIDIA's shared memory non-uniform address shift
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39709 >
2026-02-11 03:41:23 +01:00
Georg Lehmann
fbc0562203
nir/algebraic: allow inexact optimizations with sz/inf/nan preserve
...
Vulkan says these options only apply after possible contract/reassoc/transform
optimizations using real number rules.
Foz-DB Navi48:
Totals from 3923 (4.76% of 82405) affected shaders:
MaxWaves: 113159 -> 113121 (-0.03%); split: +0.01%, -0.05%
Instrs: 6946272 -> 6933510 (-0.18%); split: -0.22%, +0.03%
CodeSize: 38894140 -> 38844432 (-0.13%); split: -0.16%, +0.03%
VGPRs: 206280 -> 206412 (+0.06%); split: -0.06%, +0.12%
Latency: 45991075 -> 45964455 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8555282 -> 8546561 (-0.10%); split: -0.15%, +0.05%
VClause: 159765 -> 159745 (-0.01%); split: -0.05%, +0.04%
SClause: 160199 -> 160263 (+0.04%); split: -0.07%, +0.11%
Copies: 550751 -> 550432 (-0.06%); split: -0.17%, +0.11%
Branches: 192949 -> 192960 (+0.01%)
PreSGPRs: 189198 -> 189314 (+0.06%); split: -0.07%, +0.13%
PreVGPRs: 142732 -> 142544 (-0.13%); split: -0.33%, +0.20%
VALU: 3579904 -> 3569665 (-0.29%); split: -0.34%, +0.05%
SALU: 1072897 -> 1072440 (-0.04%); split: -0.18%, +0.14%
VMEM: 262759 -> 262791 (+0.01%)
SMEM: 246224 -> 246230 (+0.00%)
VOPD: 369734 -> 369207 (-0.14%); split: +0.08%, -0.23%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
4e2f1345d8
nir/opt_algebraic: make fcmp(a+b, 0.0) -> fcmp(a, -b) exact using ninf
...
And remove some cases that never happen because we remove fneg on compare with constants.
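Why ninf is needed: with infinities, the sum can become NaN while the rewritten comparison stays well defined. A small C check, using plain float equality as a stand-in for the feq form of the pattern (an illustration, not the NIR pattern itself):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Original form: compare the sum against zero. */
static bool eq_sum_zero(float a, float b) { return a + b == 0.0f; }

/* Rewritten form: compare a against -b, with no addition involved. */
static bool eq_neg(float a, float b) { return a == -b; }
```

For ordinary finite inputs such as 1.5 and -1.5 both forms agree, but inf + -inf is NaN, so the original comparison is false while inf == inf is true.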
Foz-DB Navi48:
Totals from 1305 (1.58% of 82405) affected shaders:
MaxWaves: 32872 -> 32854 (-0.05%)
Instrs: 4554013 -> 4551638 (-0.05%); split: -0.06%, +0.01%
CodeSize: 25269108 -> 25255428 (-0.05%); split: -0.06%, +0.00%
VGPRs: 87660 -> 87732 (+0.08%)
Latency: 33291152 -> 33285023 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 8965288 -> 8963071 (-0.02%); split: -0.03%, +0.00%
VClause: 104008 -> 103947 (-0.06%); split: -0.09%, +0.03%
SClause: 97577 -> 97574 (-0.00%); split: -0.01%, +0.00%
Copies: 372741 -> 372628 (-0.03%); split: -0.05%, +0.02%
Branches: 134076 -> 134072 (-0.00%)
PreSGPRs: 65109 -> 65110 (+0.00%); split: -0.00%, +0.00%
PreVGPRs: 68911 -> 68968 (+0.08%); split: -0.01%, +0.10%
VALU: 2247091 -> 2245815 (-0.06%); split: -0.07%, +0.01%
SALU: 810190 -> 810001 (-0.02%); split: -0.02%, +0.00%
VOPD: 205075 -> 205016 (-0.03%); split: +0.04%, -0.07%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
ef7dd040d9
nir/opt_algebraic: make a < 0.0 ? -a : a exact using search helpers
...
Foz-DB Navi21:
Totals from 104 (0.13% of 82405) affected shaders:
Instrs: 175964 -> 175514 (-0.26%); split: -0.26%, +0.00%
CodeSize: 909008 -> 908744 (-0.03%); split: -0.05%, +0.02%
Latency: 1515203 -> 1514560 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 308751 -> 308573 (-0.06%); split: -0.06%, +0.00%
Copies: 10318 -> 10315 (-0.03%); split: -0.06%, +0.03%
PreVGPRs: 5767 -> 5755 (-0.21%)
VALU: 108151 -> 107745 (-0.38%)
VOPD: 738 -> 737 (-0.14%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
0474ad1504
nir/opt_algebraic: make ffract(is_integral) exact using nnan
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
b8d1763e0a
nir/opt_algebraic: make some more fcmp patterns exact using nnan
...
No Foz-DB changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
8d52c59505
nir/opt_algebraic: make some fmin/fmax/fsat patterns exact using nsz/nnan
...
Foz-DB Navi48:
Totals from 90 (0.11% of 82405) affected shaders:
Instrs: 52109 -> 52032 (-0.15%); split: -0.16%, +0.01%
CodeSize: 263916 -> 263900 (-0.01%); split: -0.05%, +0.05%
Latency: 504693 -> 504775 (+0.02%); split: -0.01%, +0.03%
InvThroughput: 81444 -> 81157 (-0.35%)
Copies: 2894 -> 2895 (+0.03%)
VALU: 30097 -> 29991 (-0.35%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
486ea54184
nir/opt_algebraic: make bcsel(fcmp(b, a), b, a) -> fmin/fmax patterns exact
...
These patterns need is_only_used_as_float because fmin/fmax might change NaN
patterns, while bcsel is bit exact. For the same reason, the replacement
must not add undefined results, so make the replacement NaN/inf preserving.
It's impossible to make them signed zero correct (-0.0 == +0.0),
so it's also important that the user ALU doesn't care about the sign of zero.
Otherwise, the only thing that matters is whether a is NaN.
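A C sketch of the select behaviour described above, with `select_min` standing in for bcsel(flt(b, a), b, a). For comparison, C99 `fminf(1.0f, NAN)` returns 1.0f (the non-NaN operand), whereas the select returns the NaN whenever a is NaN, and it always picks +0.0 for (-0.0, +0.0) because -0.0 < +0.0 is false:

```c
#include <assert.h>
#include <math.h>

/* bcsel(flt(b, a), b, a): a plain select, bit exact for every input. */
static float select_min(float b, float a) { return b < a ? b : a; }
```

Only the case where a is NaN diverges from fmin; a NaN in b falls through to a, which is what fmin returns as well.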
Foz-DB Navi48:
Totals from 453 (0.55% of 82405) affected shaders:
MaxWaves: 8242 -> 8270 (+0.34%)
Instrs: 2382059 -> 2380094 (-0.08%); split: -0.09%, +0.00%
CodeSize: 13197208 -> 13179488 (-0.13%); split: -0.14%, +0.00%
VGPRs: 44688 -> 44604 (-0.19%)
Latency: 22839894 -> 22838985 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 4873352 -> 4872924 (-0.01%)
VClause: 50862 -> 50883 (+0.04%); split: -0.02%, +0.06%
SClause: 54000 -> 53993 (-0.01%)
Copies: 250215 -> 250233 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 39694 -> 39620 (-0.19%)
VALU: 1116881 -> 1116073 (-0.07%); split: -0.07%, +0.00%
SALU: 492799 -> 492139 (-0.13%); split: -0.14%, +0.00%
VOPD: 85457 -> 85461 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
aa78083477
nir: make alu fp_math_ctrl helpers const
...
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
f55668bb50
nir/opt_algebraic: update flt -> fneu patterns
...
And remove the ones that are redundant because we already move the fneg to
the constant source.
No Foz-DB changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
15b13d5fd4
nir/opt_algebraic: optimize flt/fge(#c, fadd(a, #b))
...
I guess these were missing because the author forgot flt/fge aren't commutative.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
2355b63cb5
nir/opt_algebraic: use better float control for some fcmp patterns
...
Foz-DB Navi48:
Totals from 1084 (1.32% of 82405) affected shaders:
Instrs: 1969973 -> 1968947 (-0.05%); split: -0.08%, +0.02%
CodeSize: 11349704 -> 11344884 (-0.04%); split: -0.06%, +0.02%
VGPRs: 59076 -> 59064 (-0.02%); split: -0.06%, +0.04%
Latency: 20766031 -> 20755032 (-0.05%); split: -0.07%, +0.01%
InvThroughput: 2849402 -> 2846733 (-0.09%); split: -0.10%, +0.01%
VClause: 40736 -> 40740 (+0.01%)
SClause: 91835 -> 91832 (-0.00%)
Copies: 217961 -> 217868 (-0.04%); split: -0.07%, +0.02%
Branches: 60045 -> 60031 (-0.02%)
PreSGPRs: 50639 -> 50618 (-0.04%); split: -0.06%, +0.02%
PreVGPRs: 39593 -> 39590 (-0.01%); split: -0.01%, +0.01%
VALU: 960270 -> 959524 (-0.08%); split: -0.10%, +0.02%
SALU: 326638 -> 326680 (+0.01%); split: -0.04%, +0.06%
VOPD: 23963 -> 23929 (-0.14%); split: +0.04%, -0.18%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
7238888d93
nir/opt_algebraic: remove redundant patterns with fcmp(fneg(...), #c)
...
We already have patterns to move the negation to the constant.
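Those existing patterns rely on negation being exact, so a comparison such as flt(fneg(a), #c) can be rewritten as flt(-#c, a) without loss, including for NaN, infinities and signed zeros. A quick C check of that identity, with plain comparisons standing in for the NIR opcodes:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* flt(fneg(a), c): the shape matched by the removed patterns. */
static bool lt_neg_a(float a, float c) { return -a < c; }

/* flt(-c, a): negation folded into the constant, operands swapped. */
static bool lt_neg_c(float a, float c) { return -c < a; }
```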
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:03 +00:00
Georg Lehmann
03c497f236
nir/opt_algebraic: make 1.0 - fsat(a) -> fsat(1.0 - a) pattern exact using nnan
...
Foz-DB Navi48:
Totals from 50 (0.06% of 82405) affected shaders:
CodeSize: 137072 -> 137456 (+0.28%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
79e4530a9b
nir/opt_algebraic: make pattern pushing fmul into bcsel exact
...
The only special case here is d == -0.0.
Foz-DB Navi48:
Totals from 3 (0.00% of 82405) affected shaders:
CodeSize: 29140 -> 29188 (+0.16%)
InvThroughput: 2945 -> 2951 (+0.20%)
VALU: 3217 -> 3223 (+0.19%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
a3bc94a3d0
nir/opt_algebraic: remove inexact from floor->trunc pattern
...
This was marked inexact by me in !21475, but I no longer see why,
even after checking all the special values.
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
da7abb1337
nir/opt_algebraic: mark fmulz(finite, finite) -> fmul pattern as nsz
...
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
ea87f1f9bc
nir/opt_algebraic: add a - a with nnan
...
Foz-DB Navi48:
Totals from 576 (0.70% of 82405) affected shaders:
MaxWaves: 16706 -> 16726 (+0.12%)
Instrs: 618677 -> 580965 (-6.10%); split: -6.10%, +0.00%
CodeSize: 3022552 -> 2861612 (-5.32%); split: -5.33%, +0.00%
VGPRs: 28008 -> 28860 (+3.04%); split: -0.51%, +3.56%
Latency: 2689318 -> 2655887 (-1.24%); split: -1.25%, +0.01%
InvThroughput: 403512 -> 393404 (-2.51%); split: -2.51%, +0.00%
VClause: 7584 -> 7577 (-0.09%); split: -0.17%, +0.08%
SClause: 19974 -> 19086 (-4.45%); split: -4.48%, +0.03%
Copies: 43862 -> 40888 (-6.78%); split: -6.87%, +0.09%
Branches: 12457 -> 11407 (-8.43%)
PreSGPRs: 28315 -> 27046 (-4.48%); split: -4.53%, +0.05%
PreVGPRs: 20751 -> 19397 (-6.52%)
VALU: 317224 -> 290151 (-8.53%); split: -8.53%, +0.00%
SALU: 124297 -> 121347 (-2.37%); split: -2.39%, +0.02%
VMEM: 11918 -> 11907 (-0.09%)
SMEM: 27582 -> 26241 (-4.86%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
16db9f79d1
nir/opt_algebraic: remove inexact a * 0.0 patterns
...
We already have some with nnan,nsz.
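Folding a * 0.0 to 0.0 is only valid when both NaNs and signed zeros may be ignored, which is why the remaining patterns carry nnan,nsz. The two special cases in C:

```c
#include <assert.h>
#include <math.h>

static float mul_zero(float a) { return a * 0.0f; }
```

A negative finite a yields -0.0 (needs nsz), and an infinite a yields NaN (needs nnan).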
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
63d199a01e
nir: remove special fp_math_ctrl rules
...
All opcodes should now respect the nan/inf/sz preserving flags.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
e443229644
nir/opt_algebraic: mark newly created fmulz nan/inf preserving
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
b678899ef8
nir/opt_algebraic: use nan/inf/sz preserve flags instead of exact for cmp/min/max replacement
...
And remove some, because they should be covered by the search pattern anyway.
Foz-DB Navi48:
Totals from 560 (0.68% of 82405) affected shaders:
MaxWaves: 11279 -> 11291 (+0.11%)
Instrs: 5214229 -> 5214386 (+0.00%); split: -0.02%, +0.02%
CodeSize: 29613884 -> 29616740 (+0.01%); split: -0.01%, +0.02%
VGPRs: 50400 -> 50328 (-0.14%)
Latency: 36481700 -> 36481157 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7309905 -> 7307905 (-0.03%); split: -0.05%, +0.02%
VClause: 131423 -> 131424 (+0.00%); split: -0.00%, +0.00%
SClause: 111485 -> 111499 (+0.01%); split: -0.00%, +0.01%
Copies: 441899 -> 442029 (+0.03%); split: -0.02%, +0.05%
Branches: 165599 -> 165597 (-0.00%)
PreVGPRs: 43558 -> 43525 (-0.08%)
VALU: 2573609 -> 2573324 (-0.01%); split: -0.03%, +0.02%
SALU: 851172 -> 851271 (+0.01%); split: -0.01%, +0.02%
VOPD: 366409 -> 366934 (+0.14%); split: +0.23%, -0.08%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
a8ad72b912
nir/search: add option to set nan/inf/sz preserve on replacement patterns
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
52eab085e6
nir/lower_uniform_subgroup: use nan/inf preserve instead of exact for feq
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
30da75e8b1
nir/lower_double_ops: don't create more exact ops than the input requires
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Georg Lehmann
e2301164c7
nir/format_convert: use nan/inf preserve flag for fmax instead of exact
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641 >
2026-02-10 18:42:02 +00:00
Daniel Schürmann
e362011cca
nir/loop_analyze: also set force_unroll if the array_size is larger than max_trip_count
...
Loop peeling can reduce the trip_count. It is also not
necessary that the array_size exactly matches the trip_count.
Totals from 54 (0.06% of 84383) affected shaders: (Navi48)
MaxWaves: 758 -> 884 (+16.62%)
Instrs: 284511 -> 343292 (+20.66%)
CodeSize: 1524940 -> 1837996 (+20.53%)
VGPRs: 5904 -> 5544 (-6.10%)
Scratch: 18432 -> 0 (-inf%)
Latency: 7317179 -> 7186789 (-1.78%); split: -1.80%, +0.02%
InvThroughput: 1646024 -> 1545357 (-6.12%); split: -6.19%, +0.08%
VClause: 5840 -> 6867 (+17.59%); split: -1.92%, +19.50%
SClause: 6959 -> 7935 (+14.03%)
Copies: 25516 -> 31310 (+22.71%); split: -4.87%, +27.58%
Branches: 9205 -> 10571 (+14.84%); split: -3.25%, +18.09%
PreSGPRs: 5586 -> 5394 (-3.44%); split: -3.67%, +0.23%
PreVGPRs: 5087 -> 4674 (-8.12%); split: -8.18%, +0.06%
VALU: 145243 -> 174719 (+20.29%)
SALU: 53128 -> 67594 (+27.23%); split: -0.00%, +27.23%
VMEM: 8911 -> 10221 (+14.70%); split: -1.41%, +16.11%
SMEM: 8519 -> 9509 (+11.62%)
VOPD: 419 -> 796 (+89.98%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39778 >
2026-02-10 09:24:23 +00:00
Daniel Schürmann
b5439c4fbf
nir/opt_loop_unroll: Always unroll loops with a known trip-count of 0
...
Loop peeling decrements the calculated trip count, which might
result in a known trip-count of 0 for single-iteration loops.
Thus, also unroll loops if max_trip_count == 0 and exact_trip_count_known.
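A minimal model of that interaction. The struct and helpers are hypothetical; only the field names max_trip_count and exact_trip_count_known come from the text above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical loop summary, not the real nir_loop_info layout. */
struct loop_summary {
   unsigned max_trip_count;
   bool exact_trip_count_known;
};

/* Model of peeling: the peeled iteration no longer counts toward the trip count. */
static void
peel_first_iteration(struct loop_summary *l)
{
   if (l->max_trip_count > 0)
      l->max_trip_count--;
}

/* The added condition: an exactly known trip count of 0 is always unrolled. */
static bool
always_unroll(const struct loop_summary *l)
{
   return l->exact_trip_count_known && l->max_trip_count == 0;
}
```

A single-iteration loop reaches a trip count of 0 after peeling and now qualifies; a loop whose trip count is not exactly known does not.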
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39778 >
2026-02-10 09:24:23 +00:00
Faith Ekstrand
02bade5cfa
nir/lower_bool_to_bit_size: Make smarter canonicalization choices
...
Instead of blindly taking the first source, take the first source that
isn't a constant. That way we won't accidentally expand things to
32-bit just because a constant came first.
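A sketch of the selection rule, using a hypothetical `src_info` view of the ALU sources. Falling back to the first source when every source is constant is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of an ALU source for this sketch. */
struct src_info {
   bool is_const;
   unsigned bit_size;
};

/* Canonicalize to the first non-constant source's bit size,
 * falling back to the first source if all sources are constants. */
static unsigned
pick_canonical_bit_size(const struct src_info *srcs, size_t num_srcs)
{
   assert(num_srcs > 0);
   for (size_t i = 0; i < num_srcs; i++) {
      if (!srcs[i].is_const)
         return srcs[i].bit_size;
   }
   return srcs[0].bit_size;
}
```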
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725 >
2026-02-09 18:16:40 +00:00
Faith Ekstrand
711b3358a8
nir/lower_bool_to_bit_size: Use the correct num_components for conversions
...
There's a nice little comment here saying we use the same write mask (an
out-of-date term in NIR) and swizzle, but we're no longer actually doing
that. Depending on nir_builder magic, we may actually generate a scalar
when we really want a vector. The fix is to use more builder helpers
and just eat the potential copy.
Fixes: 3180656bbc ("nir: don't use nir_build_alu() with incomplete sources")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725 >
2026-02-09 18:16:40 +00:00
Alyssa Rosenzweig
91550d0709
nir: disable fast-math for lowering conversions
...
The lowerings for e.g. f2f16_rtp have carefully written sequences using
Infinity. nir_opt_algebraic will stomp right through this. `feq x, inf`
without an exact flag is basically always a bug. Disable fast math here.
Fixes OpenCL CTS test_half on Iris.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39740 >
2026-02-09 17:22:02 +00:00
Iago Toral Quiroga
f6a2d14008
nir/opt_vectorize_load_store: allow sizes unaligned with high offset for loads
...
This check was added specifically for vectorized stores, so allow such sizes for loads.
Without this, the pass will fail to vectorize 2 consecutive 16-bit loads
into a single 32-bit load.
Fixes: 2ed79f80ba ("nir/load_store_vectorize: Skip new bit-sizes that are unaligned with high_offset")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39713 >
2026-02-09 07:59:21 +00:00
Kenneth Graunke
beb4b78fe7
intel: Rename intel_msaa_flags to intel_fs_config
...
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748 >
2026-02-06 20:51:43 -08:00
Daniel Schürmann
f71a38e9de
nir/opt_load_store_vectorize: don't use shared2 vectorization across blocks
...
Besides the undesirable combinations this can produce,
it would also require updating the last_entry in every
previous block.
Totals from 99 (0.12% of 84383) affected shaders: (Navi48)
Instrs: 288989 -> 289727 (+0.26%); split: -0.02%, +0.28%
CodeSize: 1542572 -> 1546616 (+0.26%); split: -0.02%, +0.28%
SpillSGPRs: 17 -> 16 (-5.88%)
Latency: 2104020 -> 2103286 (-0.03%); split: -0.17%, +0.13%
InvThroughput: 472380 -> 472265 (-0.02%); split: -0.08%, +0.05%
VClause: 9778 -> 9779 (+0.01%)
Copies: 24937 -> 25173 (+0.95%); split: -0.05%, +0.99%
Branches: 10124 -> 10156 (+0.32%); split: -0.01%, +0.33%
PreSGPRs: 6112 -> 6091 (-0.34%)
PreVGPRs: 4079 -> 4069 (-0.25%); split: -0.39%, +0.15%
VALU: 120208 -> 120421 (+0.18%); split: -0.03%, +0.21%
SALU: 56338 -> 56312 (-0.05%); split: -0.09%, +0.04%
VOPD: 34 -> 37 (+8.82%)
Fixes: 4ca7ee7bd7 ('nir/opt_load_store_vectorize: Allow to vectorize at most one entry of each type across blocks')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39733 >
2026-02-06 16:34:15 +00:00
Daniel Schürmann
5e86cfac8e
nir/opt_load_store_vectorize: Vectorize speculatable instructions across blocks
...
This should always be safe.
Totals from 446 (0.53% of 84383) affected shaders: (Navi48)
Instrs: 995942 -> 994416 (-0.15%); split: -0.17%, +0.02%
CodeSize: 5500372 -> 5489900 (-0.19%); split: -0.20%, +0.01%
SpillSGPRs: 197 -> 195 (-1.02%)
Latency: 14872922 -> 14851646 (-0.14%); split: -0.15%, +0.00%
InvThroughput: 2395050 -> 2391537 (-0.15%); split: -0.15%, +0.00%
VClause: 20207 -> 20195 (-0.06%); split: -0.07%, +0.01%
SClause: 27090 -> 26427 (-2.45%); split: -2.51%, +0.07%
Copies: 84182 -> 84228 (+0.05%); split: -0.08%, +0.13%
Branches: 22927 -> 22928 (+0.00%)
PreSGPRs: 27275 -> 27524 (+0.91%); split: -0.02%, +0.93%
PreVGPRs: 29116 -> 29131 (+0.05%)
VALU: 545565 -> 545549 (-0.00%); split: -0.01%, +0.00%
SALU: 124275 -> 124329 (+0.04%); split: -0.05%, +0.09%
VMEM: 39044 -> 39030 (-0.04%)
SMEM: 44052 -> 43205 (-1.92%)
VOPD: 32354 -> 32337 (-0.05%); split: +0.02%, -0.07%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39373 >
2026-02-06 10:16:50 +00:00
Daniel Schürmann
4ca7ee7bd7
nir/opt_load_store_vectorize: Allow to vectorize at most one entry of each type across blocks
...
The idea is to initialize the vectorization table with one
entry from the previous blocks if it's the same for all predecessors.
In order to not speculatively load out-of-bounds, backends need to
set a new bounds_checked_modes option indicating variable modes
for which per-component bounds checks are supported.
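The "same for all predecessors" rule can be sketched as below. The `entry` type is hypothetical, and the bounds_checked_modes restriction on which modes may be speculated is not modeled here:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a load/store entry from a predecessor block. */
struct entry {
   int key;
};

/* Return the entry shared by every predecessor, or NULL if they differ. */
static const struct entry *
common_pred_entry(const struct entry *const *pred_entries, size_t num_preds)
{
   if (num_preds == 0)
      return NULL;
   const struct entry *first = pred_entries[0];
   for (size_t i = 1; i < num_preds; i++) {
      if (pred_entries[i] != first)
         return NULL; /* predecessors disagree: start with an empty table */
   }
   return first;
}
```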
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39373 >
2026-02-06 10:16:50 +00:00
Daniel Schürmann
0a07ea20e6
nir/opt_load_store_vectorize: create add_entry_to_hash_table() helper
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39373 >
2026-02-06 10:16:50 +00:00
Daniel Schürmann
e5bd9cbf90
nir/opt_load_store_vectorize: use linear allocator instead of ralloc
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39373 >
2026-02-06 10:16:49 +00:00
Georg Lehmann
5e2f28e723
nir: remove split unpack_half opcodes
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
81e3162cf8
microsoft/compiler: switch to a backend specific unpack half opcode
...
Sadly, just f2f32 isn't enough for dxil.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
45cb1d3b6f
nir/opt_algebraic: remove unpack_half_2x16_split
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
5a2ef27f7d
nir/format_convert: use f2f32 instead of unpack_half
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
a3bd2ae465
nir/opt_16bit_tex_image: remove unpack_half support
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
6f7d4cd75b
nir/lower_tex: use f2f32 instead of unpack_half
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
609c46cf23
nir/lower_alu_width: emit f2f32 for unpack_half_2x16
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Georg Lehmann
b18d9c1b33
nir/opt_algebraic: optimize unpack_32_2x16 of extract
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511 >
2026-02-06 06:12:36 +00:00
Timothy Arceri
da6c3ad237
nir: speedup nir_find_inlinable_uniforms()
...
Here we speed up nir_find_inlinable_uniforms() by making sure we only
check a src is inlinable once.
If we have a bunch of nested if-statements where the conditions keep
building on the alu chains of previous conditions we can end up
with exponential processing times due to repeatedly processing the
same srcs over and over.
A big cause of the exponential growth seems to be instructions like
`ffma %594, %594, %599` or `fmul %600, %600` where each essentially
causes us to process the entire previous part of the chain
twice.
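A toy model of that blow-up, not the real collect_src_uniforms(): each node in a hypothetical chain uses the previous node for both of its sources, like `fmul %600, %600`. A naive walk visits 2^(n+1) - 1 nodes for a chain of depth n, while remembering visited nodes visits each one once:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64

/* Node i has two sources, both node i - 1; node 0 has no sources. */
struct walk {
   bool visited[MAX_NODES];
   unsigned long long visits;
};

static void
walk_naive(unsigned node, struct walk *w)
{
   w->visits++;
   if (node == 0)
      return;
   walk_naive(node - 1, w); /* first source */
   walk_naive(node - 1, w); /* second source: the same value again */
}

static void
walk_memo(unsigned node, struct walk *w)
{
   assert(node < MAX_NODES);
   if (w->visited[node])
      return; /* already processed this source once */
   w->visited[node] = true;
   w->visits++;
   if (node == 0)
      return;
   walk_memo(node - 1, w);
   walk_memo(node - 1, w);
}
```

For a chain of depth 20, the naive walk makes 2097151 visits and the memoized walk makes 21.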
Shaders such as the one in issue #14663 previously took multiple minutes to
compile, calling collect_src_uniforms billions of times; with this change
they now compile within a second.
Closes: mesa/mesa#14663
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39664 >
2026-02-05 23:19:29 +00:00