When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that. However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work. The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.
The easy solution here is to just rematerialize deref sources of deref
instructions as well. This may potentially lead to a few more deref
instructions being created by the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
v2: Remove the original ALU instruciton after all of its readers are
modified to read the new ALU instruction.
v3: Fix an issue where a bcsel that may not be executed on a loop
iteration due to a break statement is converted to a phi (and therefore
incorrectly "executed"). Noticed by Tim.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A single shader in Unigine Superposition is affected by this change.
A single iadd is moved to the end of a loop. This iadd is involved in
a complex set of logic to terminate the loop, and an extra mov
instruction is inserted. This shader really needs the optimization
suggested by bugzilla #94747, and I expect that to make this tiny
regression go away.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15047543 -> 15047545 (<.01%)
instructions in affected programs: 565 -> 567 (0.35%)
helped: 0
HURT: 2
total cycles in shared programs: 369977253 -> 369978253 (<.01%)
cycles in affected programs: 127910 -> 128910 (0.78%)
helped: 0
HURT: 2
v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid
infinite optimization loops. Remove the original ALU instruciton after
all of its readers are modified to read the new ALU instruction.
v3: Extend to the more general case. The if the prev-block value from
the phi is not undef, this means the ALU instruction has to be
duplicated in both the prev-block and the continue-block.
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will be used in a couple more places soon.
The function name is... horribly long. Neither Matt nor I could think
of any thing that was shorter and still more descriptive than
"is_phi_foo". I'm willing to entertain suggestions.
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are a number of reasons for the rewrite.
1. Adding support for packing tess patch varyings in a sane way.
2. Making use of qsort allowing the code to be much easier to
follow.
3. Fixes a bug where different interp types caused component
packing to be skipped for all varyings in some scenarios.
4. Allows us to add a crude live range analysis for deciding
which components should be packed together. This support can
optionally be added in a future patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used in the following patches to determine if we
support packing the components of a varying.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support needed for marking the varyings as used but we
don't actually support packing patches in this patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compact arrays are used for special variables like clip and cull
distances, or tessellation levels. Drivers using compact arrays
assume that these values will always be actual arrays. We don't
want to turn a float[1] gl_CullDistance into a single float; that
would confuse drivers.
Today, i965 uses compact arrays, and Gallium drivers use
nir_lower_io_arrays_to_elements, so we haven't had any overlap
that would demonstrate the issue. Iris will use both.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A couple places in st/nir assume that cull distances have been lowered
away, so it will need to call this lowering pass for drivers which opt
out of the GLSL IR lowering. The Intel backend also calls this pass,
for i965 and anv. We need to only do it once.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We have a GLSL IR pass to convert clip/cull distance float[] arrays
into vec4[2] arrays. In ff281e6204, we attempted to skip this pass
if the GLSL IR lowering had already run. But, that code was not quite
right, as we forgot to strip away the per-vertex IO array layer for
geometry and tessellation shader varyings.
If the GLSL IR pass has run, the variables will not be marked as
"compact". So we can simply check that and bail.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
radeonsi uses a system value for gl_FragCoord rather than an input var.
These get translated into load_frag_coord NIR intrinsics, which lose the
pixel_center_integer and origin_upper_left decorations. To cope with
this, Tim added a shader_info field for pixel_center_integer, and made
glsl_to_nir set it accordingly.
prog_to_nir also needs to handle these fragcoord conventions. Instead
of duplicating the logic to set the info field, just move it to
nir_lower_system_values so it'll happen regardless of who makes the NIR.
(For what it's worth, we don't need an info flag for origin_upper_left,
because radeonsi lowers origin conventions in nir_lower_wpos_ytransform
before nir_lower_system_values destroys the variable and qualifiers.)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All things being equal is better to keep the original order. Since
the new block is empty, push the phis in order to tail.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Use the trick of adding and then subtracting 2**52 (52 is the number of
explicit mantissa bits a double-precision floating-point value has) to
implement round-to-even.
Cuts the number of instructions on SKL of the piglit test
fs-roundEven-double.shader_test from 109 to 21.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
These correspond roughly to reading/writing OpenCL global pointers. The
idea is that they just take a bare address and load/store from it. Of
course, exactly what this address means is driver-dependent.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
In order to allow nir_gather_xfb_info to be used on OpenGL,
specifically ARB_gl_spirv.
So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables":
"outputs specifying both an *XfbBuffer* and an *Offset* are
captured, while outputs not specifying both of these are not
captured. Values are captured each time the shader writes to such
a decorated object."
This implies that are captured if both are present, and not if one of
those are lacking. Technically, it doesn't explicitly point that
having just one or the other is a mistake. In some cases, glslang is
adding some extra XfbBuffer without XfbOffset around, and mentioning
that technically that is not a bug (see issue#1526)
And for the case of Vulkan, as the same glslang issue mentions, it is
not clear if that should be a mistake or not. But even if it is a
mistake, it is not really needed to be checked on the driver, and we
can let the validation layers to check that.
v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before, we were double-counting the component slots when we had a dvec3
or dvec4. Instead, just add them in once and manually offset the
recorded output offset.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we have a transform feedback output like:
float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0)
which is lowered by nir_lower_io_arrays_to_elements to,
float x2_out (VARYING_SLOT_VAR1.x, 0, 0)
float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0)
We have to update the destination offset to avoid overwriting
the same value.
v2 (Jason Ekstrand):
- Compute the correct offsets for arrays of vectors and/or doubles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When a xfb buffer is explicitely declared on a varying
variable, we shouldn't remove it at link time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
used for CL kernels
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: add assert to verify we have at least one valid bit_size
v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With OpenCL some system values match the address bits, but in GLSL we also
have some system values being 64 bit like subgroup masks.
With this it is possible to adjust the builder functions so that depending
on the bit_sizes the correct bit_size is used or an additional argument is
added in case of multiple possible values.
v2: validate dest bit_size
v3: generate hex values in python code
remove useless imports
rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: rename nir_var_global to nir_var_mem_global
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Passes' function names, separated by comma, listed in NIR_SKIP
environment variable will be skipped in debug mode. The mechanism is
hooked into the _PASS macro, like NIR_PRINT.
The extra macro NIR_SKIP is available as a developer convenience, to
skip at pointer other than the passes entry points.
v2: Fix typo in NIR_SKIP macro. (Bas)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise writes get propagated across atomics if no barrier is
used. Without barrier writes should still be visible in the same
invocation, so an atomic has to be considered a write.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 393b59e077
('nir: Rework nir_lower_constant_initializers() to handle functions')
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NIR metadata validation verifies that the debug bit was unset (by a call
to nir_metadata_preserve) if a NIR optimization pass made progress on
the shader. With the expectation that the NIR shader consists of only a
single main function, it has been safe to call nir_metadata_preserve()
iff progress was made.
However, most optimization passes calculate progress per-function and
then return the union of those calculations. In the case that an
optimization pass makes progress only on a subset of the functions in
the shader metadata validation will detect the debug bit is still set on
any unchanged functions resulting in a failed assertion.
This patch offers a quick solution (short of a larger scale refactoring
which I do not wish to undertake as part of this series) that simply
unsets the debug bit on unchanged functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>