this fixes mipmapping with saturate by saturating the coord param while
passing an additional param (partial derivatives or lod) that uses the
unsaturated coord value
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8756>
If there are no helper invocations required during the
execution of the shader, we can assume that there also
are no helper invocations active.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9058>
load_helper_invocation is an Input Builtin, for which the
value should not change during the execution of a shader.
This new pass inserts an is_helper intrinsic before any
demote() instruction and re-uses its value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9058>
We already throw out any variables which may have a complex use so we
just need to make sure that our mode checks don't assert if we have a
deref which may_be but not must_be nir_var_function_temp.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
We can't move the shuffle to a new block so this only works if the
shuffle and the bcsel are in the same block. Fortunately, in the
motivating case, this is true.
Also, we have to be careful around discard. We could try really hard to
just avoid moving them past discard but we choose to simply bail if we
see a discard instead.
Fixes: 4ff4d4e569 "nir/opt_intrinsic: Optimize bcsel(b, shuffle..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
There are a bunch of cases where we can pretty quickly determine that
the high bits don't matter. In these cases, delete the casts.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8872>
Facilites the gl_SamplePosition lowering on Bifrost, where the sample
positions are accessed directly in a packed in-memory format.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8774>
Useful for determining whether certain optimizations are legal for a
compute shader (e.g. optimizing workgroup size in the driver).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6312>
The base mask previously used was 0xffffffff. This is not correct (but
should still work) for 16-bit and 8-bit values, but it means the high
32-bits of 64-bit values will get chopped off.
Instead of just restricting the pattern to 32-bits (as was done before
00b28a50b2), this extends the optimization in two ways:
1. Make it correct for other bit sizes.
2. Make it work for arbitrary shift counts.
This has the added benefit of reducing the number of patterns actually
added (7 previously, 4 now).
The "Reassociate for improved CSE" part is just reverted to its
pre-00b28a50b2c behavior. I doubt that pattern is likely to have much
impact outside 32-bits.
This change fixes the piglit tests
tests/spec/arb_gpu_shader_int64/fs-shl-of-shr-int64.shader_test and
tests/spec/arb_gpu_shader_int64/fs-iand-of-iadd-int64.shader_test.
All of the shaders helped in shader-db are vertex shaders on platforms
with vector-oriented vertex processing. The shaders contain ((x >> 16)
<< 16). These platforms set lower_extract_word, so the optimization
that transforms (x >> 16) to extract_u16 doesn't trigger. With only ~60
shaders involved, I didn't bother trying to add extract_XYZ versions of
these patterns to try to get those cases.
Fixes: 00b28a50b2 ("nir/algebraic: trivially enable existing 32-bit patterns for all bit sizes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Haswell and earlier Intel GPUs had simlar results. (Haswell shown)
total instructions in shared programs: 16397554 -> 16397496 (<.01%)
instructions in affected programs: 7961 -> 7903 (-0.73%)
helped: 58
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 1.89% x̄: 0.99% x̃: 0.78%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.13% -0.85%
Instructions are helped.
total cycles in shared programs: 1035483770 -> 1035483504 (<.01%)
cycles in affected programs: 75922 -> 75656 (-0.35%)
helped: 44
HURT: 2
helped stats (abs) min: 2 max: 12 x̄: 6.14 x̃: 2
helped stats (rel) min: 0.05% max: 1.67% x̄: 0.87% x̃: 0.72%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -7.28 -4.29
95% mean confidence interval for cycles %-change: -1.03% -0.63%
Cycles are helped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8852>
Basically every pass in NIR uses nir_ssa_def_rewrite_uses which calls
nir_instr_rewrite_src which is fairly complex because it handles all
sorts of non-SSA cases. Since we already know a priori that every
source written by nir_ssa_def_rewrite_uses is SSA, we can check new_src
once at the top of the function and cut out all that complexity.
While we're at it, we expose a new SSA-only nir_ssa_def_rewrite_uses_ssa
helper which takes an SSA def which avoids the one SSA check. It's also
more convenient 90% of the time.
Compile time as tested by Rhys Perry <pendingchaos02@gmail.com>
Difference at 95.0% confidence
-797.166 +/- 418.649
-0.566174% +/- 0.296441%
(Student's t, pooled s = 325.459)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8790>
It is currently a bitset on top of a uint64_t but there are already
more than 64 values. Change to use BITSET to cover all the
SYSTEM_VALUE_MAX bits.
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
All uses are passing variables of either nir_var_shader_in or
nir_var_shader_out modes. Note that currently there are more than 64
system values, so the uint64_t wouldn't be enough anyway.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
The function is very convenient for lowering any type of instruction
that can be easily filtered, but so far instructions that didn't return
a value were siletly ignored.
Fix this by
- not requiring a return value in the instruction
- add a new special return value from the lowering implementation
function to indicated that an instruction that doesn't have a
return value must be removed, and
- don't try to collect and replace uses in this case.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8177>
This reverts the code back to the form it was before, but with an
explicitly sized float32 instead of float, now that all producers are
switched over.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989>
In theory, we could also verify this against the sampler type for
sampler derefs, but there are a number of complications there:
- SPIR-V 1.4 lets you override the signedness of integer samplers
per-instruction. So the base type may not match.
- mediump/RelaxedPrecision samplers may get lowered to f16 in the
instruction or may not. So the bitsize may not match.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989>
This happens with nir_texop_samples_identical, and we need to keep
things consistent and (soon) keep the validator happy when expanding
booleans once we switch that to having a dest_type of bool1.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989>
These are all pretty trivial because we can just split the op into one
subgroup op per half of the value. There's some question as to whether
these belong in lower_int64 or lower_subgroups but, on Intel, they key
decider of whether or not we need the lowering is based on whether or
not we have hardware int64 support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>