This opcode is the same as the f2f16 opcode except that it comes with
a promise that it is safe to optimise it out if the result is
immediately converted back to a 32-bit float again. Normally this
would be a lossy conversion and so it would be visible to the
application, but if the conversion is generated as part of the mediump
lowering process then this removal doesn’t matter. The opcode is
eventually replaced with a regular f2f16 in the late optimisations so
the backends don’t need to handle it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
I noticed that we can do better for these kinds of comparisons while
working on the lowering for iadd_sat@64 and isub_sat@64. This
eliminated 11 instruction from the fs-addSaturate-int64.shader_test.
My hope is that this will improve the run-time of int64 tests on Ice
Lake. I have no data to support or refute this.
Unsurprisingly, no changes on shader-db.
v2: Condition the min and max patterns with nir_lower_minmax64.
Suggested by Caio. Very long discussion in the MR. :)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rearranged and expand the comment about the optimizations applied to
the lowering. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_usub_sat64 flag that only applies to the 64-bit
version of the nir_op_usub_sat instruction.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_hadd64 flag that only applies to the 64-bit versions
of the instructions.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq
should only return Inf when fsqrt returns 0.
The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.
An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max. That's more explicit, but we may also want to
have specific bit encodings of float values later. I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.
total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.
Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958. However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).
This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Fix the a / b ordering in some compares. Delete duplicate patterns.
Add a table explaining things. While I was cleaning this up, I managed
to confuse myself. The table helped sort that out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This didn't fix bug #111308, but it was found will trying to find the
actual cause of that bug.
Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test
No shader-db changes on any Intel platform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293 ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit a lie. The actual bug was introduced about
26,000 commits earlier in 371c4b3c48 ("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
This just eliminates tautological / contradictory compares that are used
for bcsel and other non-if-statement cases. If-statements are not
affected because removing flow control can cause the i965 instrution
scheduler to create some very long live ranges resulting in unncessary
spilling. This causes some shaders to fall of a performance cliff.
Since many small if-statements are already flattened to bcsel, this
optimization covers more than 68% of the possible cases (2417 shaders
helped for instructions on Skylake vs. 3554).
v2: Reorder and add whitespace to make the relationship between the
patterns more obvious. Suggested by Caio.
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333474 -> 16322028 (-0.07%)
instructions in affected programs: 438559 -> 427113 (-2.61%)
helped: 1765
HURT: 0
helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4
helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82%
95% mean confidence interval for instructions value: -6.87 -6.10
95% mean confidence interval for instructions %-change: -4.30% -3.84%
Instructions are helped.
total cycles in shared programs: 367608554 -> 367511103 (-0.03%)
cycles in affected programs: 8368829 -> 8271378 (-1.16%)
helped: 1541
HURT: 129
helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39
helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17%
HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10
HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60%
95% mean confidence interval for cycles value: -64.90 -51.81
95% mean confidence interval for cycles %-change: -3.89% -3.36%
Cycles are helped.
total spills in shared programs: 8867 -> 8868 (0.01%)
spills in affected programs: 18 -> 19 (5.56%)
helped: 0
HURT: 1
total fills in shared programs: 21900 -> 21903 (0.01%)
fills in affected programs: 78 -> 81 (3.85%)
helped: 0
HURT: 1
All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10829877 -> 10829247 (<.01%)
instructions in affected programs: 30240 -> 29610 (-2.08%)
helped: 177
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3
helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94%
95% mean confidence interval for instructions value: -3.93 -3.18
95% mean confidence interval for instructions %-change: -3.04% -2.32%
Instructions are helped.
total cycles in shared programs: 154036580 -> 154035437 (<.01%)
cycles in affected programs: 352402 -> 351259 (-0.32%)
helped: 96
HURT: 28
helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6
helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46%
HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4
HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23%
95% mean confidence interval for cycles value: -13.40 -5.03
95% mean confidence interval for cycles %-change: -1.62% -0.53%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When 'x' is the result of a scmp op:
x != 0.0 or x == 1.0: passthrough
x == 0.0 or x != 1.0: invert
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add simple fdot2 optimizations that are missing.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For backends that don't have a 'fdph' instructions
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This version has less ops for the same precision.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This is to allow optimizations in nir_opt_algebraic not otherwise possible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>