In a3268599f3, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.
Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Refactor the code to avoid calling a lot of time to auxiliary functions
when it is not really needed.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
New added cases "stole" the previous break.
Fixes: 420ad0a1a3 ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.
The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.
There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:
total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1
total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0
Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Replace hard coded value with DBL_MIN (Connor).
v3:
- Have into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64
flag (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
According to VK_KHR_shader_float_controls:
"Denormalized values obtained via unpacking an integer into a vector
of values with smaller bit width and interpreting those values as
floating-point numbers must: be flushed to zero, unless the entry
point is declared with the code:DenormPreserve execution mode."
v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
v3:
- Adapt to use the new NIR lowering framework (Andres).
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or
FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the
inexact optimizations so the VK_KHR_shader_float_controls execution
mode is respected.
v2:
- Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is
enabled (Andres).
v3:
- Updated to renamed shader info member (Andres).
v4:
- Directly access execution mode instead of dragging it by parameter (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Move the op-code specific knowledge to nir_opcodes.py even if it
means a rount trip conversion (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
According to Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.
v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
auxiliary functions to calculate the proper value because Mesa uses
round-to-nearest-even rounding mode by default (Connor).
v3:
- Do an actual fused multiply-add at ffma (Connor).
v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
f2f16's rounding modes are already handled and f2f64 don't need it
as there is not a floating point type with higher bit size than 64 for
now.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
From Vulkan spec, VK_KHR_shader_float_controls section:
"3) Do denorm and rounding mode controls apply to OpSpecConstantOp?
RESOLVED: Yes, except when the opcode is OpQuantizeToF16."
v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analize (Connor).
v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).
v5:
- Simplify constant_denorm_flush_to_zero (Caio).
v6:
- Adapt after API changes and to use the new constant
constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
away (Andres).
v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Added more functions.
v3:
- Simplify most of the functions (Caio).
v4:
- Updated to renamed enum values (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
v2:
- Add support for rounding modes for each floating point bit size.
v3:
- Commit e68871f6a4 ("spirv: Handle constants and types before
execution modes") changed when the execution modes are handled,
which affects the result of the floating point constants when the
rounding mode is set in the execution mode. Moved the handling of
the rounding modes before we handle the constants.
v4:
- Rename vtn_decoration "literals" to "operands" (Andres).
- Simplify execution mode parsing util function (Caio).
- Extend the comment about the timing of the handling of the rounding
modes (Caio).
v5:
- Correct extension name (Caio).
- Rename shader info member (Andres).
- Rename float controls enum (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
SPIR-V 1.5 incorported the SPV_EXT_shader_viewport_index_layer but
splitting into the two capabilities above. Just handle them as we
support the extension already.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: by J.Ekstrand suggestion moved lowering of large
constants after lowering of copy_deref is done.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
A filed of nir_variable.location may be equel to -1.
That may cause copying to invalid address of list-node,
making some internal fields corrupted.
Patch fixes segfault during freeing context due to
corrupted address of ralloc_header.destructor.
v2: copy data if var is constant (Connor Abbott)
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b6d4753568 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Perform all the NIR linking steps in order. Change iris and i965 to
use it. Suggested by Alejandro.
v2: Add gl_nir_linker_options struct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
The parameter lists were not being created nor filled since i965
doesn't use them. In Gallium they are used for uniform handling, so
add a way to fill them.
The gl_uniform_storage struct got two new fields that let us go
- from a Parameter to the matching UniformStorage and,
- from the variable to the *first* UniformStorage
without relying on names -- since they are optional for ARB_gl_spirv.
Later patches will make use of them.
v2: Do not fill parameters for i965. (Timothy)
Use uint32_t for the new attributes. (Marek)
v3: Serialize the new fields. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Don't use the UNMAPPED_UNIFORM_LOC (-1) to set the unsigned
max_uniform_location. Those unmapped uniforms don't have to be
accounted at this point.
Fixes: 7a9e5cdfbb ("nir/linker: Add gl_nir_link_uniforms()")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Currently the praser for s expressions assumes that newlines will be \n,
resulting in incorrect parsing on windows, where the newline is \r\n.
This patch just adds \r? to the regular expression used to parse the s
expressions, which fixes at 1 test on windows.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The dead_cf pass calls into the CF manipulation helpers which attempt to
keep NIR's SSA form sane. However, when the only break is removed from
a loop, dominance gets messed up anyway because the CF SSA clean-up code
only looks at phis and doesn't consider the case of code becoming
unreachable. One solution to this would be to put the loop into LCSSA
form before we modify any of its contents. Another (and the approach
taken by this pass) is to just run the repair_ssa pass afterwards
because the CF manipulation helpers are smart enough to keep all the
use/def stuff sane; they just don't always preserve dominance
properties.
While we're here, we clean up some bogus indentation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
NIR currently assumes that unreachable blocks are trivially dominated by
everything. However, when considering well-formed SSA, there is no path
from any block to an unreachable block. Therefore, we can break any
use-def chains where the use is in an unreachable block. This removes
any dependencies on code created by uses in unreachable blocks and lets
DCE do a better job of cleaning it up.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already bail and don't split the vars but we were passing a NULL to
_mesa_hash_table_search which is not allowed.
Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 02bc4aabb48 ('nir/lower_io_to_vector: allow FS outputs to be vectorized')
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This has lower_io_to_vector try to turn variables into arrays of 4-sized
vectors when possible and fall back to the old approach when that isn't
possible.
This is so that lower_io_to_vector can guarantee that only one variable is
used for each fragment shader output.
v2: handle dual-source blending
v3: don't try to merge structs and non-32-bit types in get_flat_type()
v3: fix per-vertex inputs
v3: fix and cleanup location advancement in get_flat_type() and it's
calling code
v4: prioritize the original mode over the flat mode
v4: don't create flat variables to merge only one variable
v5: don't skip an entire slot when encountering structs in the old mode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2: handle dual-source blending
v3: use a higher MAX_SLOTS
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
For loops which condition is false on the first iteration
iteration count was falsely calculated under the assumption
that loop's condition is true until it becomes false, meaning
it's true at least one time.
Now such loops are reported as having 0 iteration.
Similar to the fix e71fc7f2 done in NIR.
Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.
This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit
value, then Store it.
These cases started to appear when we changed Anvil to use derefs for
shared memory.
v2: Use `bit_size` in a couple of places we were missing. (Jason)
Reassign `value` instead of `src[0]`. (Jason)
Fixes: 024a46a407 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The precision for a function return type is now stored in
ir_function_signature. This will later be useful to implement mediump
to float16 lowering. In the meantime it is also useful to catch errors
where a function is redeclared with a different precision.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's only correct when 'a' is an integral greater or equal to 0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>