so that drivers don't have to call nir_strip manually.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Previously, this could have made the resource divergent in code like
that which is genereated by nir_lower_non_uniform_access.
Fixes: da8ed68a ('nir: replace nir_move_load_const() with nir_opt_sink()')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously, for code like:
loop {
loop {
a = load_ubo()
}
use(a)
}
adjust_block_for_loops() would return the block before the first loop.
Now we compute the range of allowed blocks and then walk the dominance
tree directly, guaranteeing directly that we always choose a block that
dominates all the uses and is dominated by the definition.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
These can appear after loop unrolling.
v2: stylistic changes
v2: replace state->mem_ctx with state->shader
v2: add bounds checking
v3: use nir_intrinsic_range() for bounds checking
v3: fix issue where partially out-of-bounds reads are replaced with undefs
v4: fix merge conflicts during rebase
v5: split into two commits
v6: set constant_data to NULL after freeing (fixes nir_sweep()/Iris)
v7: don't remove the constant data if there are no constant loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v6)
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We only have the subgroup variant in NIR (equivalent to clockARB), so
only support that for now.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Working on the algebraic implementation, I was being driven nuts by my
editor not highlighting and handling indentation for the C code. It turns
out that it's basically not pass-specific code, and we can move it over to
the relevant .c file. Replaces 30KB of code with 34KB of data on my i965
build. No perf diff on shader-db (n=3)
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Having passes generate these is just making more work for copy
propagation (and thus probably calling more optimization passes)
later. Noticed while trying to debug nir_opt_algebraic()
top-to-bottom having O(n^2) behavior due to not finding new matches in
replacement code.
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This matches what we do for uses_sample_qualifier, and what we
do in ir_set_program_inouts.cpp as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.
Such builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the EXT_demote_to_helper_invocation extension is enabled,
`demote` is treated as a keyword, and produces an ir_demote.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension. Most of the changes are to
include it in the visitors.
Demote is not considered a control flow, so also include an empty
visit member function in ir_control_flow_visitor.
Only NIR actually supports `demote`, so assert the translations for
TGSI and Mesa's gl_program -- since the demote is not expected to
appear for those.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
glsl 4.4 spec section '5.9 expressions':
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
row vector. In all these cases, it is required that the number of columns of the left operand is equal
to the number of rows of the right operand. Then, the multiply (*) operation does a linear
algebraic multiply, yielding an object that has the same number of rows as the left operand and the
same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
explains in more detail how vectors and matrices are operated on."
This fix disallows a multiplication of incompatible matrices like:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
We include shader_enums.h from freedreno's compiler for both GL and
Vulkan, and the main/config.h include resulted in polluting the
namespace with things like MAX_VIEWPORTS that other Vulkan drivers use
as their driver-specific maximums.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This allows the reslut of mov and bcsel to be separately interpreted as
float or int depending on the use.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Some shaders are hurt by this change because now a
load_const(0x00000000) is not recognized as eq_zero when loaded as a
float. This behavior is restored in a later patch (nir/range-analysis:
Use types to provide better ranges from bcsel and mov).
v2: Add a comment about reinterpretation of int/uint/bool. Suggested by
Caio. Rewrite condition the check for types being float versus checking
for types not being all the things that aren't float.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16327543 -> 16328255 (<.01%)
instructions in affected programs: 55928 -> 56640 (1.27%)
helped: 0
HURT: 208
HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3
HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
95% mean confidence interval for instructions value: 3.06 3.79
95% mean confidence interval for instructions %-change: 1.17% 1.46%
Instructions are HURT.
total cycles in shared programs: 363682759 -> 363683977 (<.01%)
cycles in affected programs: 325758 -> 326976 (0.37%)
helped: 44
HURT: 133
helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14
HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
95% mean confidence interval for cycles value: 0.38 13.39
95% mean confidence interval for cycles %-change: -0.06% 0.96%
Inconclusive result (%-change mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10787433 -> 10787443 (<.01%)
instructions in affected programs: 1842 -> 1852 (0.54%)
helped: 0
HURT: 10
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.36% 1.10%
Instructions are HURT.
total cycles in shared programs: 153724543 -> 153724563 (<.01%)
cycles in affected programs: 8407 -> 8427 (0.24%)
helped: 1
HURT: 3
helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16
HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
95% mean confidence interval for cycles value: -21.31 31.31
95% mean confidence interval for cycles %-change: -1.11% 1.46%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on Iron Lake or GM45.
When handling two variables with overlapping locations, we process the
one with lower location first, and then extend the location ->
driver_location map to guarantee that it's contiguous for the second
variable too. But the loop had the wrong bound, so we weren't extending
the map 100%, which could lead to problems later such as an incorrect
num_inputs. The loop index i is an index into the slots of the variable,
so we need to stop at the final slot of the variable (var_size) instead
of the number of unassigned slots.
This fixes
spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-interleave-range
on radeonsi NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Without this, we'll incorrectly round off huge values to the nearest
representable double instead of keeping it at the exact value as
we're supposed to.
Found by inspecting compiler-warnings.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This can happen with loops with unreachable exits which are later
optimized away.
Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.
While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.
Fixes: 1235850522 ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In a3268599f3, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.
Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Refactor the code to avoid calling a lot of time to auxiliary functions
when it is not really needed.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
New added cases "stole" the previous break.
Fixes: 420ad0a1a3 ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.
The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.
There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:
total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1
total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0
Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Replace hard coded value with DBL_MIN (Connor).
v3:
- Have into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64
flag (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
According to VK_KHR_shader_float_controls:
"Denormalized values obtained via unpacking an integer into a vector
of values with smaller bit width and interpreting those values as
floating-point numbers must: be flushed to zero, unless the entry
point is declared with the code:DenormPreserve execution mode."
v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
v3:
- Adapt to use the new NIR lowering framework (Andres).
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]