This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed zero correct.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
This was used because the exact bit meant something different for
comparisons than it did for the replacement expression, but that isn't the
case anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39809>
These patterns need is_only_used_as_float because fmin/fmax might change NaN
patterns, while bcsel is bit exact. For the same reason, the replacement
must not add undefined results, so make the replacement NaN/inf preserving.
It's impossible to make them signed zero correct (-0.0 == +0.0),
so it's also important that the user alu doesn't care.
Otherwise, the only thing that matters is is whether a is NaN.
Foz-DB Navi48:
Totals from 453 (0.55% of 82405) affected shaders:
MaxWaves: 8242 -> 8270 (+0.34%)
Instrs: 2382059 -> 2380094 (-0.08%); split: -0.09%, +0.00%
CodeSize: 13197208 -> 13179488 (-0.13%); split: -0.14%, +0.00%
VGPRs: 44688 -> 44604 (-0.19%)
Latency: 22839894 -> 22838985 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 4873352 -> 4872924 (-0.01%)
VClause: 50862 -> 50883 (+0.04%); split: -0.02%, +0.06%
SClause: 54000 -> 53993 (-0.01%)
Copies: 250215 -> 250233 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 39694 -> 39620 (-0.19%)
VALU: 1116881 -> 1116073 (-0.07%); split: -0.07%, +0.00%
SALU: 492799 -> 492139 (-0.13%); split: -0.14%, +0.00%
VOPD: 85457 -> 85461 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>
Removes the runtime code for this, and means we propergate the
signed zero/inf/nan checks to subexpessions too, not just exact.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39616>
The properly terminated regex automatically detects this case now.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39586>
As I introduced another layer of iteration for signed zero testing, the
former logic got unwieldy. In fact, it was already unwieldy enough that I
forgot to clear all_skipped when the assert failed, allowing a failing
test to be marked UNSUPPORTED instead of XFAIL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
Using a string literal enclosed with the same type of quotation marks
with the outer f-string isn't supported on Python 3.10, which is
currently still with security maintainance.
This leads to syntax error when building Mesa with Python 3.10.
Fix this by alternating these string literals' quotation mark to '' (as
the outer f-string uses "").
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14673
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39372>
Missing f in other cases seems to be caught either elsewhere in the
script or by the C compiler.
Fixes: c49d6e0480 ("nir/algebraic: Elide range clamping of f2u sources")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39031>
The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise,
because they are a contraction. However, they respect signed zero/Inf/NaN rules.
As such, it is legal to do this fusion with shader float controls as long as the
exact bit is not set (mapping to SPIR-V NoContract).
Unfortunately, NIR's imprecise rules do not distinguish between contraction
issues versus float special case issues, forcing nir_search to skip all
imprecise rules when any shader float control modes are used. This notably
affects DXVK, which sets shader float controls to get D3D11 float behaviour and
hence loses FMA fusing.
Therefore, we plumb in the exact bit to express NoContract independent of the
float controls, and weaken the requirement for fma fusion to allowable
contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in
SPIR-V is just a multiply add that we're allowed to contract rather than the
real deal.
Drivers that use their own FMA fusing passes (notably, Intel and AMD) are
unaffected, but DXVK-capable drivers using fuse_ffma should like this. Results
on hk shown:
Totals from 2194 (4.06% of 54019) affected shaders:
MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01%
Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01%
CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01%
Spills: 1094 -> 747 (-31.72%)
Fills: 988 -> 681 (-31.07%)
Scratch: 4444 -> 3820 (-14.04%)
ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
IC: 215398 -> 215274 (-0.06%)
GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96%
Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04%
Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
This is required to optimize FP64 and Int64 shaders generated by
virglrenderer. It generates pack/unpack around every 64-bit op,
which NIR currently can't eliminate. This fixes that.
There is a new constraint ".y", which means that the use of an instruction
should have swizzle.y. This allows us to add patterns that have Y swizzle
on results of instructions.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
812b3415 added rules for upcasts with comparisons with a variety of
types. The float & unsigned rules should be ok, but the signed integer rules are
unsound as currently implemented. This can cause end-to-end miscompiles.
I originally hit this issue while debugging a large real world OpenCL kernel. I
found the bug symptoms changed when disabling loop unrolling, which tipped me
off to a compiler bug. I've reduced it to a minimal test case. Imagine my
surprise when I find out the NIR my backend ingested was already constant folded
to be wrong.
In the minimal test case, during optimization we have NIR:
32 %6 = ....
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
1 %45 = ilt %9, %44 (0x1)
This is a simple check (int64_t)%6 < 1.
nir_opt_algebraic turns this into:
32 %6 = ...
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
64 %55 = load_const (0x0000000080000000 = 2147483648)
1 %56 = ilt %55 (0x80000000), %44 (0x1)
64 %57 = load_const (0x000000007fffffff = 2147483647)
1 %58 = ilt %57 (0x7fffffff), %44 (0x1)
32 %59 = i2i32 %44 (0x1)
1 %60 = ilt %6, %59
1 %61 = ior %58, %60
1 %62 = iand %56, %61
This pile of math constant-folds to an unconditional "false"! The problem is
%56. At first glance, INT32_MIN < 1 is true so %56 should be true. Indeed, it
should. But here's the kicker: both constants are 64-bit here, so the ilt
operation is a 64-bit comparison -- that left-hand side is INT32_MIN
zero-extended to 64-bit for the signed comparison at 64-bit. So in fact, it
evaluates to false, causing the whole expression to go false. If we're going to
do a 64-bit comparison for %56, then we need to sign-extend the bound. So we'll
just adjust the Python and be on our way, right?
Unfortunately the issue is deeper. According to the comment in the generated
nir_opt_algebraic.c file, the guilty algebraic rule is:
('ilt', ('i2i64', 'a@32'), '#b') =>
('iand', ('ilt', -2147483648, 'b'), ('ior', ('ilt', 2147483647, 'b'), ('ilt', 'a', ('i2i32', 'b'))))
From a Python perspective? That rule is correct. -2147483648 < 1 is a true
statement. Adjusting the Python rule is not the appropriate solution here, since
the issue is more fundamental and might affect other rules. The real problem is
the translation of that Python replacement tree into C, incorrectly
zero-extending -2147483648 into 0x0000000080000000 instead of sign-extending to
0xffffffff80000000.
Crawling down the rabbit hole of the generated algebraic file, we see the
constant encoded as:
{ .constant = {
{ nir_search_value_constant, 64 },
nir_type_int, { -0x80000000 /* -2147483648 */ },
} },
NIR correctly translates the negative constant to a C level negate operation of
its absolute value. This maps to the correct sign-extension...
...for all constants except for INT_MIN. Because that constant lacks a ULL
suffix, it is a 32-bit integer. And for this integer (only), negating it hits
signed integer overflow (UB!) and then we end up with an effective
zero-extension when going to 64-bit.
This patch fixes the end-to-end miscompile.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Closes: #11402
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29952>
Those are passed as an optional argument and are declared as a list of
(type, name) tuples.
At the moment this can only be used for conditions.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26214>
Done by hand at each call site but going very quickly with funny Vim motions and
common regexes. This is a very common idiom in NIR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23807>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
A later commit adds a pattern
(('umin', ('iand', a, '#b(is_pos_power_of_two)'),
('iand', c, '#b(is_pos_power_of_two)')),
('iand', ('iand', a, b), ('iand', c, b))),
When I originally made that pattern, I copied and pasted the search to
the replacement as
(('umin', ('iand', a, '#b(is_pos_power_of_two)'),
('iand', c, '#b(is_pos_power_of_two)')),
('iand', ('iand', a, '#b(is_pos_power_of_two)'),
('iand', c, '#b(is_pos_power_of_two)'))),
The caused the variables in the replacement to be marked is_constant,
and that resulted in an assertion failure deep inside nir_search.
src/compiler/nir/nir_search.c:530: construct_value: Assertion `!var->is_constant' failed.
These extra validation rules catch this kind of error at compile time
rather than at run time.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
This gets our union's size down to 22 bytes (now smaller than any of the
union's types were before we made the union!). Cuts another 48kb off of
the drivers.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each variable assuming
variables mostly don't have conditions or the conditions are mostly
reused). Reduces libvulkan_intel.so size by 49kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each expression
(assuming expressions mostly don't have conditions or the conditions are
mostly reused). Reduces libvulkan_intel.so size by 8.7kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Even with packing all 3 types into a 40-byte union (nir_search_constant
being 24 bytes and nir_search_expression having formerly been 32), and
having a single array of them, this cuts 1.7MB from each of
libvulkan_intel.so and libgallium_dri.so.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
I'm going to be adding some more tables to reduce relocations in the
generated code, so move the current tables to a struct for arg-passing
sanity.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>