If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or
FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the
inexact optimizations so the VK_KHR_shader_float_controls execution
mode is respected.
v2:
- Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is
enabled (Andres).
v3:
- Updated to renamed shader info member (Andres).
v4:
- Directly access execution mode instead of dragging it by parameter (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Move the op-code specific knowledge to nir_opcodes.py even if it
means a rount trip conversion (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
According to Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.
v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
auxiliary functions to calculate the proper value because Mesa uses
round-to-nearest-even rounding mode by default (Connor).
v3:
- Do an actual fused multiply-add at ffma (Connor).
v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
f2f16's rounding modes are already handled and f2f64 don't need it
as there is not a floating point type with higher bit size than 64 for
now.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
From Vulkan spec, VK_KHR_shader_float_controls section:
"3) Do denorm and rounding mode controls apply to OpSpecConstantOp?
RESOLVED: Yes, except when the opcode is OpQuantizeToF16."
v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analize (Connor).
v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).
v5:
- Simplify constant_denorm_flush_to_zero (Caio).
v6:
- Adapt after API changes and to use the new constant
constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
away (Andres).
v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Added more functions.
v3:
- Simplify most of the functions (Caio).
v4:
- Updated to renamed enum values (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
v2:
- Add support for rounding modes for each floating point bit size.
v3:
- Commit e68871f6a4 ("spirv: Handle constants and types before
execution modes") changed when the execution modes are handled,
which affects the result of the floating point constants when the
rounding mode is set in the execution mode. Moved the handling of
the rounding modes before we handle the constants.
v4:
- Rename vtn_decoration "literals" to "operands" (Andres).
- Simplify execution mode parsing util function (Caio).
- Extend the comment about the timing of the handling of the rounding
modes (Caio).
v5:
- Correct extension name (Caio).
- Rename shader info member (Andres).
- Rename float controls enum (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
SPIR-V 1.5 incorported the SPV_EXT_shader_viewport_index_layer but
splitting into the two capabilities above. Just handle them as we
support the extension already.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: by J.Ekstrand suggestion moved lowering of large
constants after lowering of copy_deref is done.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
A filed of nir_variable.location may be equel to -1.
That may cause copying to invalid address of list-node,
making some internal fields corrupted.
Patch fixes segfault during freeing context due to
corrupted address of ralloc_header.destructor.
v2: copy data if var is constant (Connor Abbott)
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b6d4753568 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Perform all the NIR linking steps in order. Change iris and i965 to
use it. Suggested by Alejandro.
v2: Add gl_nir_linker_options struct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
The parameter lists were not being created nor filled since i965
doesn't use them. In Gallium they are used for uniform handling, so
add a way to fill them.
The gl_uniform_storage struct got two new fields that let us go
- from a Parameter to the matching UniformStorage and,
- from the variable to the *first* UniformStorage
without relying on names -- since they are optional for ARB_gl_spirv.
Later patches will make use of them.
v2: Do not fill parameters for i965. (Timothy)
Use uint32_t for the new attributes. (Marek)
v3: Serialize the new fields. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Don't use the UNMAPPED_UNIFORM_LOC (-1) to set the unsigned
max_uniform_location. Those unmapped uniforms don't have to be
accounted at this point.
Fixes: 7a9e5cdfbb ("nir/linker: Add gl_nir_link_uniforms()")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Currently the praser for s expressions assumes that newlines will be \n,
resulting in incorrect parsing on windows, where the newline is \r\n.
This patch just adds \r? to the regular expression used to parse the s
expressions, which fixes at 1 test on windows.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The dead_cf pass calls into the CF manipulation helpers which attempt to
keep NIR's SSA form sane. However, when the only break is removed from
a loop, dominance gets messed up anyway because the CF SSA clean-up code
only looks at phis and doesn't consider the case of code becoming
unreachable. One solution to this would be to put the loop into LCSSA
form before we modify any of its contents. Another (and the approach
taken by this pass) is to just run the repair_ssa pass afterwards
because the CF manipulation helpers are smart enough to keep all the
use/def stuff sane; they just don't always preserve dominance
properties.
While we're here, we clean up some bogus indentation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
NIR currently assumes that unreachable blocks are trivially dominated by
everything. However, when considering well-formed SSA, there is no path
from any block to an unreachable block. Therefore, we can break any
use-def chains where the use is in an unreachable block. This removes
any dependencies on code created by uses in unreachable blocks and lets
DCE do a better job of cleaning it up.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already bail and don't split the vars but we were passing a NULL to
_mesa_hash_table_search which is not allowed.
Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 02bc4aabb48 ('nir/lower_io_to_vector: allow FS outputs to be vectorized')
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This has lower_io_to_vector try to turn variables into arrays of 4-sized
vectors when possible and fall back to the old approach when that isn't
possible.
This is so that lower_io_to_vector can guarantee that only one variable is
used for each fragment shader output.
v2: handle dual-source blending
v3: don't try to merge structs and non-32-bit types in get_flat_type()
v3: fix per-vertex inputs
v3: fix and cleanup location advancement in get_flat_type() and it's
calling code
v4: prioritize the original mode over the flat mode
v4: don't create flat variables to merge only one variable
v5: don't skip an entire slot when encountering structs in the old mode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2: handle dual-source blending
v3: use a higher MAX_SLOTS
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
For loops which condition is false on the first iteration
iteration count was falsely calculated under the assumption
that loop's condition is true until it becomes false, meaning
it's true at least one time.
Now such loops are reported as having 0 iteration.
Similar to the fix e71fc7f2 done in NIR.
Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.
This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit
value, then Store it.
These cases started to appear when we changed Anvil to use derefs for
shared memory.
v2: Use `bit_size` in a couple of places we were missing. (Jason)
Reassign `value` instead of `src[0]`. (Jason)
Fixes: 024a46a407 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The precision for a function return type is now stored in
ir_function_signature. This will later be useful to implement mediump
to float16 lowering. In the meantime it is also useful to catch errors
where a function is redeclared with a different precision.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's only correct when 'a' is an integral greater or equal to 0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a hang in shadertoy for radeonsi where a buffer was initialized with:
value -= value
with value being undefined.
In this case LLVM replace the operation with an assignment to NaN.
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: Update several of the comments. Drop some redundant uses of
ASSERT_UNION_OF_OTHERS_MATCHES_UNKNOWN_*_SOURCE source. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
One shader from Metro Last Light and the rest from Rochard. In the
Rochard cases, something like:
min(1.0, max(pow(saturate(x), y), z))
was transformed to
saturate(max(pow(saturate(x), y), z))
because the result of the pow must be >= 0.
The Metro Last Light case was similar. An instance of
min(pow(abs(x), y), 1.0)
became
saturate(pow(abs(x), y))
v2: Fix some comments. Suggested by Caio.
v3: Fix setting is_intgral when the exponent might be negative. See
also Mesa MR !1778.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280670 -> 16280659 (<.01%)
instructions in affected programs: 1130 -> 1119 (-0.97%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.72% max: 1.43% x̄: 1.03% x̃: 0.97%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.19% -0.86%
Instructions are helped.
total cycles in shared programs: 367168430 -> 367168270 (<.01%)
cycles in affected programs: 10281 -> 10121 (-1.56%)
helped: 10
HURT: 1
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.31% max: 2.43% x̄: 1.79% x̃: 1.70%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 3.10% max: 3.10% x̄: 3.10% x̃: 3.10%
95% mean confidence interval for cycles value: -20.06 -9.04
95% mean confidence interval for cycles %-change: -2.36% -0.32%
Cycles are helped.
I discovered this while looking at a shader that was hurt by some other
work I'm doing. When I examined the changes, I was confused that one
instance of a comparison that was used in a discard_if was (incorrectly)
eliminated, while another instance used by a bcsel was (correctly) not
eliminated. I had to use NIR_PRINT=true to see exactly where things
when wrong.
A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
Strike Suit Zero were impacted.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280659 -> 16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.
total cycles in shared programs: 367168270 -> 367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.
Found by inspection. I tried really, really hard to make a test case
that would trigger this problem, but I was unsuccesful. It's very hard
to get an instruction to produce a ne_zero result without ne_zero
sources. The most plausible way is using bcsel. That proves
problematic because bcsel interprets its sources as integers, so it
cannot currently be used to "clean" values for floating point
instructions.
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Fixes piglit tests (new in piglit!110):
- fs-underflow-fma-compare-zero.shader_test
- fs-underflow-mul-compare-zero.shader_test
v2: Add back part of comment accidentally deleted. Noticed by
Caio. Remove is_not_zero function as it is no longer used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: fa116ce357 ("nir/range-analysis: Range tracking for ffma and flrp")
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All Gen7+ platforms** had similar results. (Ice Lake shown)
total instructions in shared programs: 16278465 -> 16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.
total cycles in shared programs: 367135159 -> 367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.
total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0
total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0
LOST: 0
GAINED: 1
** On Broadwell, a shader was hurt for spills / fills instead of
helped.
No changes on any earlier platforms.
Fix the a / b ordering in some compares. Delete duplicate patterns.
Add a table explaining things. While I was cleaning this up, I managed
to confuse myself. The table helped sort that out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This didn't fix bug #111308, but it was found will trying to find the
actual cause of that bug.
Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test
No shader-db changes on any Intel platform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293 ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit a lie. The actual bug was introduced about
26,000 commits earlier in 371c4b3c48 ("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
The helper check_node_type() is only used when DEBUG is set (in the
function below), but ASSERTED macro uses NDEBUG. So just guard the
helper with #ifdef. If we see more such cases we might consider a
ASSERTED-like macro for the DEBUG case.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>