This is now 100% equivalent to the new rt_resume intrinsic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Really copying Jason's pass.
Changes:
- Instead of all the intel lowering introduce rt_{execute_callable,trace_ray,resume}
- Add the ability to use scratch intrinsics directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>
About half or more of the text here is actually from Connor Abbot. I've
edited it a bit to bring it up-to-date and make a few things more clear.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
Long ago, the semantics of bcsel were such that it took a single boolean
value and selected between whole vectors. These days, it takes a vector
boolean with the assumption that if you want the old behavior you can
just use a .xxxx swizzle. There currently are no opcodes which use a
output_size of 0 but have a scalar or fixed-vector input. Let's
disallow it for now to force us to think through the semantics again if
this ever comes up as something someone actually wants.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
For debug on Android, it's useful to be able to print shaders to the
android log interface, since you don't usually have stdout/stderr.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
limit==0 is the signal for "don't peephole anything but a move that will
be optimized aways." limit > 0 is "up to N alu instructions may be moved
out." nir-to-tgsi uses ~0 as the indicator of "No, we really need to
eliminate all if instructions" on hardware like i915 that doesn't have
control flow.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
We need to make get it updated after we may have nir_instr_remove()d an
instruction, and when we cross blocks. This didn't really matter before
because the only builder usage was idiv, which other users of
lower_int_to_float were probably never hitting.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
Use a single set and ensure dominance by checking after a equivalent
instruction is found.
Besides removing the need to copy a set, this also lets us resize the set
at the start of the pass in the next commit.
ministat (CSE only):
Difference at 95.0% confidence
-984.956 +/- 28.8559
-6.90075% +/- 0.190231%
(Student's t, pooled s = 26.9052)
ministat (entire run):
Difference at 95.0% confidence
-1246.1 +/- 257.253
-0.998972% +/- 0.205094%
(Student's t, pooled s = 239.863)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
Now that there's a common NIR pass, there's no point in us doing this in
the back-end anymore. In order to use this pass in i965, we do have to
make one tiny change. Gallium runs the pass after assigning input and
output locations and so needs the pass to respect those locations and
num_inputs. i965, however, runs it before any location assignment or
I/O lowering so we don't care. We do, however, need the pass to succeed
with num_inputs == 0 because we set that later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
In theory you can rerun the info gather pass, but in practice that
doesn't always end well. Be consistent inside this pass and update the
info.
While we're here, change the inputs read to use VERT_BIT_EDGEFLAG.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
These map directly to v_perm_b32 and v_permlane_b32.
Unfortunately there is no corresponding NIR opcode or
intrinsics, and it's too tedious to puzzle these things
together from the existing NIR instructions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
NIR currently doesn't have any intrinsics for a horizontal packed add,
so this one is modeled after AMD's v_sad_u8.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
The helpers will be reused for per-primitive variables that are also
arrayed, so use a more general name.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11252>
At best, this is an extra instruction for NIR to optimize out. At worst,
depending on pass ordering nir_load_output could sneak into the final
NIR, even on drivers that don't support fbfetch.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11255>
if we have
if ... {
return;
} else {
// block X
}
// block Y
phi(X: ...)
then nir_lower_returns tries to move block Y into the else body,
except nir_cf_extract doesn't move the phi. As the return is removed
in the then-body the phi suddenly has the wrong number of arguments
(and the phi doesn't dominate its uses anymore).
In this case we know that the phi has to be single arg, so we can just
rewrite the users of the phis and drop them.
Hit this in my RT adventures, not sure if this is actually reachable
right now, as single arg phis tend to be kind of exceptional outside
of CSSA and we typically call nir_lower_returns pretty early.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11207>
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
On AGX, the special register for front facing is inverted from its meaning in
APIs. We need to lower load_front_face to inot(load_back_face). Doing this in
the backend is trivial, but then we would miss out on algebraic optimizations
for the inot.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11199>
These are similar to AYUV, but the channel ordering is different... in
such a way that there's no RGBA format that will make the channels line
up right.
v2: Rebase on bc438c91d9 ("nir/lower_tex: ignore texture_index if
tex_instr has deref src")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
The flt version could have been added in 56e21647e2, but our
collective understanding of NaN and comparisons was poor in 2015. The
new "is_a_number" predicate makes the others possible.
All of the helped shaders in shader-db are either from Mad Max or Skia.
Some of the Skia shaders just get decimated by this change:
instructions helped: shaders/skia/580-4.shader_test FS SIMD8: 81 -> 29 (-64.20%) (scheduled: top-down)
I looked at a couple of those shaders, and they had sequences like:
vec1 32 ssa_44 = flt32 ssa_32, ssa_32
vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0
vec1 32 ssa_46 = fge32 ssa_32, ssa_32
vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45
vec1 32 ssa_48 = iand ssa_46, ssa_44
vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0
ssa_44 is replaced with False. Then ssa_47 selects between ssa_0 and
ssa_0, so ssa_47 and ssa_46 are eliminated. ssa_48 is (False && don't
care), so ssa_48 and ssa_49 are eliminated. After that, many
calculations now involve constants of zero, so they are optimized down
too. So it continues until there's not much left!
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21072238 -> 21071386 (<.01%)
instructions in affected programs: 33722 -> 32870 (-2.53%)
helped: 146
HURT: 1
helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2
helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20%
95% mean confidence interval for instructions value: -7.94 -3.65
95% mean confidence interval for instructions %-change: -5.87% -2.25%
Instructions are helped.
total cycles in shared programs: 856203326 -> 856192238 (<.01%)
cycles in affected programs: 749966 -> 738878 (-1.48%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18
helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46%
95% mean confidence interval for cycles value: -104.82 -45.02
95% mean confidence interval for cycles %-change: -4.01% -1.37%
Cycles are helped.
LOST: 4
GAINED: 0
Fossil-db results:
Tiger Lake
Instructions in all programs: 160915223 -> 160898354 (-0.0%)
SENDs in all programs: 6812780 -> 6812780 (+0.0%)
Loops in all programs: 38340 -> 38340 (+0.0%)
Cycles in all programs: 7434144207 -> 7433978462 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304537 -> 304537 (+0.0%)
Ice Lake
Instructions in all programs: 145296298 -> 145279531 (-0.0%)
SENDs in all programs: 6863692 -> 6863692 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800257014 -> 8800088384 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)
Skylake
Instructions in all programs: 135891664 -> 135874910 (-0.0%)
SENDs in all programs: 6802946 -> 6802946 (+0.0%)
Loops in all programs: 38331 -> 38331 (+0.0%)
Cycles in all programs: 8444273433 -> 8444130932 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301114 -> 301114 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
If the values are known to be numbers, the the replacements are exact.
This is only applied to the patterns with constants. Constants should
always be numbers, and shaders with NaN constants should be handled in a
different way.
No shader-db or fossil-db changes on any Intel platform. The intention
is to make these patterns more future proof.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Haswell and later Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049056 -> 21048939 (<.01%)
instructions in affected programs: 4716 -> 4599 (-2.48%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51%
95% mean confidence interval for instructions value: -3.46 -2.54
95% mean confidence interval for instructions %-change: -3.22% -2.38%
Instructions are helped.
total cycles in shared programs: 855141411 -> 855141159 (<.01%)
cycles in affected programs: 54491 -> 54239 (-0.46%)
helped: 28
HURT: 5
helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12
helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75%
HURT stats (abs) min: 2 max: 52 x̄: 21.40 x̃: 6
HURT stats (rel) min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56%
95% mean confidence interval for cycles value: -13.72 -1.55
95% mean confidence interval for cycles %-change: -1.01% -0.31%
Cycles are helped.
Tiger Lake
Instructions in all programs: 160902191 -> 160899554 (-0.0%)
SENDs in all programs: 6812435 -> 6812435 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7428581420 -> 7428555881 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
A lot of fragment shaders in Shadow of the Tomb Raider were helped, and
a bunch of vertex shaders in Octopath Traveler were hurt.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs) min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel) min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.
total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs) min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel) min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.
No fossil-db changes on any Intel platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
When most of these patterns were created, we believed, incorrectly, that
fsat(NaN) was NaN. We have since realized that fsat(NaN) is zero.
Originally, this changed the patterns to use is_a_number. This didn't
help any shaders, so it's easier to just drop the optimizations.
This commit crossed paths with 4c3ad4d065 ("nir/algebraic: mark more
optimization with fsat(NaN) as inexact") and bc123c396a
("nir/algebraic: mark some optimizations with fsat(NaN) as inexact").
Given that these don't impact very many shaders, it seems safer to just
remove them.
As discussed in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried
modifying these patterns to use !(b cmp a). Unfortunately, on Intel
GPUs, the results were much worse than just removing the patterns
altogether.
Some other related patterns will be addressed in later commits.
There are still a number of patterns that use the identity fsat(1-X) ==
1 - fsat(X). If X is NaN, the former is zero while the latter is 1.0.
I haven't evaluted these patterns yet. If changes are needed in these
patterns, it should be a separate commit anyway.
v2: Replace arrow `=>` with `->` in comments because the `=>` looks a
lot like `<=` comparison. Suggested by Rhys.
Fixes: 92b75c126b ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]")
Fixes: a7f0c57673 ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel hardware had similar results. (Ice Lake shown)
total instructions in shared programs: 20029060 -> 20029670 (<.01%)
instructions in affected programs: 69236 -> 69846 (0.88%)
helped: 0
HURT: 263
HURT stats (abs) min: 1 max: 20 x̄: 2.32 x̃: 1
HURT stats (rel) min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98%
95% mean confidence interval for instructions value: 1.86 2.78
95% mean confidence interval for instructions %-change: 1.18% 1.52%
Instructions are HURT.
total cycles in shared programs: 979821278 -> 979834425 (<.01%)
cycles in affected programs: 1476848 -> 1489995 (0.89%)
helped: 49
HURT: 204
helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20
helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52%
HURT stats (abs) min: 2 max: 2600 x̄: 89.02 x̃: 16
HURT stats (rel) min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72%
95% mean confidence interval for cycles value: 13.18 90.75
95% mean confidence interval for cycles %-change: 0.29% 1.25%
Cycles are HURT.
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>