We have a name for that, it's called a uvec. This just makes the
function name a bit shorter. While we're here, we also add an assert
for one of the assumptions this function makes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is nothing inherent about these opcodes that requires them to only
take scalars. It's very convenient if we let them take vectors as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by inspection. This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166916 -> 15166910 (<.01%)
instructions in affected programs: 761 -> 755 (-0.79%)
helped: 6
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds the "(a << N) >> M" family of mask or sign-extensions. Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166918 -> 15166916 (<.01%)
instructions in affected programs: 36 -> 34 (-5.56%)
helped: 2
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If it's not the right bit-size, it may not actually be the correct
extraction. For now, we'll only worry about 32-bit versions.
Fixes: 905ff86198 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa8 "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds support for unrolling the classic
do {
// ...
} while (false)
that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.
shader-db results IVB:
total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.
This will be used in the following patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for a OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays. If a back-end does implement arrays using scratch or indirects
on registers, having more smaller arrays is likely to have better memory
efficiency.
v2 (Jason Ekstrand):
- Better comments and naming (some from Caio)
- Rework to use one hash map instead of two
v2.1 (Jason Ekstrand):
- Fix a couple of bugs that were added in the rework including one
which basically prevented it from running
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Since there's no particular reason for the index to be 0, choose an
index that is not used by other block. This is convenient when we
store "per-block" data in an array AND look for the successors
data (e.g. any kind of backwards data-flow analysis).
v2: Add a note about end_block's index. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Deref paths may share the same deref instructions in their chains,
e.g.
ssa_100 = deref_var A
ssa_101 = deref_struct "array_field" of ssa_100
ssa_102 = deref_array "[1]" of ssa_101
ssa_103 = deref_struct "field_a" of ssa_102
ssa_104 = deref_struct "field_a" of ssa_103
when comparing the two last deref instructions, their paths will share
a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next
iteration if the deref instructions are the same. Path[0] (the var)
is still handled specially, so in the case above, only ssa_101 and
ssa_102 will be skipped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only used, when asserts are enabled.
Fixes an unused-variable warning with gcc-8:
../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if':
../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable]
nir_block *prev_block =
^~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The innermost check was added to stop us from unrolling multiple
loops in a single pass, and to stop outer loops from unrolling.
When we successfully unroll a loop we need to run the analysis
pass again before deciding if we want to go ahead an unroll a
second loop.
However the logic was flawed because it never tried to unroll any
nested loops other than the first innermost loop it found.
If this innermost loop is not unrolled we end up skipping all
other nested loops.
This unrolls a loop in a Deus Ex: MD shader on ultra settings and
also unrolls a loop in a shader from the game Prey when running
on DXVK.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Equivalent to the already existing how_declared at GLSL IR. The only
difference is that we are not adding all the declaration_type
available on GLSL, only the one that we will use on the short term. We
would add more mode if needed on the future.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Python 3 lost the long type: now everything is an int, with the right
size.
This commit makes the script compatible with Python 2 (where we check
for both int and long) and Python 3 (where we only check for int).
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Mixing the two is a long-standing recipe for errors in Python 2, so much
so that Python 3 now completely separates them.
This commit stops treating both as if they were the same, and in the
process makes the script compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.
This does change the order of the results slightly, but it should be ok.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
We're trying to write a unicode string (i.e decoded) to a file opened
in binary (i.e encoded) mode.
In Python 2 this works, because of the automatic conversion between
byte and unicode strings.
In Python 3 this fails though, as no automatic conversion is attempted.
This change makes the scripts compatible with both versions of Python.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276892 -> 14276886 (<.01%)
instructions in affected programs: 484 -> 478 (-1.24%)
helped: 2
HURT: 0
total cycles in shared programs: 532578397 -> 532578395 (<.01%)
cycles in affected programs: 3522 -> 3520 (-0.06%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
No changes on any Gen platform.
v2: s/fmax/fmin/. Noticed by Thomas Helland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277220 -> 14277216 (<.01%)
instructions in affected programs: 422 -> 418 (-0.95%)
helped: 2
HURT: 0
total cycles in shared programs: 532577908 -> 532577848 (<.01%)
cycles in affected programs: 2800 -> 2740 (-2.14%)
helped: 2
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise. The behavior changes if the source is NaN.
No shader-db changes on any platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>