This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
We already throw out any variables which may have a complex use so we
just need to make sure that our mode checks don't assert if we have a
deref which may_be but not must_be nir_var_function_temp.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
NIR derefs currently have exactly one variable mode. This is about to
change so we can handle OpenCL generic pointers. In order to transition
safely, we need to audit every deref->mode check. This commit adds a
set of helpers that provide more nuanced mode checks and converts most
of NIR to use them.
For simple cases, we add nir_deref_mode_is and nir_deref_mode_is_one_of
helpers. These can be used in passes which don't have to bother with
generic pointers and just want to know what mode a thing is. If the
pass ever encounters generic pointers in a way that this check would be
unsafe, it will assert-fail to alert developers that they need to think
harder about things and fix the pass.
For more complex passes which require a more nuanced understanding of
modes, we add nir_deref_mode_may_be and nir_deref_mode_must_be helpers
which accurately describe the compiler's best knowledge about the given
deref. Unfortunately, we may not be able to exactly identify the mode
in a generic pointers scenario so we have to be very careful when we use
these. Conversion of these passes is left to later commits.
For the case of mass lowering of a particular mode (nir_lower_explicit_io
is one good example), we add nir_deref_mode_is_in_set. This is also
pretty assert-happy like nir_deref_mode_is but is for a set containment
comparison on deref modes where you expect the deref to either be all-in
or all-out.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We already have the variable so we know the mode exactly. Just use that
instead of the deref mode. If these paths ever have to handle variable
pointers (not likely since they're OpenGL-specific), we can fix them to
handle crazy deref modes then.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes: 1235850522
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
This pass works entirely with variables, all we have to do is ignore any
derefs we see which we can't chase back to the variable. The one
interesting case we have to handle is if we have a complex use of a
deref_var. In that case, we have to flag it non-constant.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6210>
We also very slightly change the semantics. It no longer is one index
per list for global variables and is a single index over-all.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
We make some assumptions in opt_large_constants such as the size_align
function returning the obvious sizes for vectors. Now that we've got
the deref_size lying around, we may as well assert it's consistent with
our assumptions. In particular, we now assert that it really claims
booleans are 32-bit. If anyone's driver ever decides to be clever and
change this, we'll now catch the breakage earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris. However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy
enough to add them. We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there. It's possible we could do better by walking the whole deref
chain but this should be good enough.
Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.
While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.
Fixes: 1235850522 ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: by J.Ekstrand suggestion moved lowering of large
constants after lowering of copy_deref is done.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
A filed of nir_variable.location may be equel to -1.
That may cause copying to invalid address of list-node,
making some internal fields corrupted.
Patch fixes segfault during freeing context due to
corrupted address of ralloc_header.destructor.
v2: copy data if var is constant (Connor Abbott)
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b6d4753568 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
If a function has a constant and is called more than once, after
inlining we may end up with different variables representing the same
constant. This commit look into the data and de-duplicate them.
The first pass now will collect the constant data in a per variable
buffer, then de-duplication happens (by sorting then linear walk), and
the second pass will use the data in var->data.location.
One side-effect of the current implementation is that constants will
be reordered. If this turns out to be a problem is something that can
be fixed.
An alternative strategy considered was to perform this in a
per-function basis and then merge the results, the problem is that we
would have to fix up the offsets during the merge. Given the data we
have, the current patch is good enough.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used later on to allocate constant data for each
variable (and then deduplicate). Also drop initializing found_read,
as it is already implicitly false in the literal.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Relax the restriction that all the writes need to be in the first
block: now accept variables that have all the writes in the same
block, and all the reads are dominated by that block.
This let the pass identify large constants that are local to a helper
function. The writes will be at the place that the function is
inlined, possibly not in the first block (but still all in the same
block).
Results for vkpipeline-db in SKL:
total instructions in shared programs: 3624891 -> 3623145 (-0.05%)
instructions in affected programs: 79416 -> 77670 (-2.20%)
helped: 16
HURT: 0
total cycles in shared programs: 1458149667 -> 1458147273 (<.01%)
cycles in affected programs: 30154164 -> 30151770 (<.01%)
helped: 14
HURT: 2
total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 8813 -> 8745 (-0.77%)
spills in affected programs: 2894 -> 2826 (-2.35%)
helped: 8
HURT: 0
total fills in shared programs: 23470 -> 23392 (-0.33%)
fills in affected programs: 12248 -> 12170 (-0.64%)
helped: 6
HURT: 2
LOST: 0
GAINED: 0
Results for shader-db in SKL with Iris:
total instructions in shared programs: 15379442 -> 15379392 (<.01%)
instructions in affected programs: 837 -> 787 (-5.97%)
helped: 2
HURT: 2
helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27
helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23%
95% mean confidence interval for instructions value: -39.14 14.14
95% mean confidence interval for instructions %-change: -15.51% 6.17%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4880 -> 4880 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 370677237 -> 370676567 (<.01%)
cycles in affected programs: 17852 -> 17182 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347
helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31%
HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
total spills in shared programs: 11772 -> 11772 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 24948 -> 24948 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data. This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.
v2 (Jason Ekstrand):
- Use a size/align function to ensure we get the right alignments
- Use the newly added deref offset helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>