nir_lower_int_to_float() does this at the end of compilation, no need to
do it up front.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
NIR has fdiv, and all the NIR backends have to have lower_fdiv set
appropriately already since various passes (format conversions,
tgsi_to_nir, nir_fast_normalize(), etc.) might generate one.
This causes softpipe and llvmpipe to now do actual divides, since
lower_fdiv is not set there. Note that llvmpipe's rcp implementation is a
divide of 1.0 by x, so now we're going to be just doing div(x, y) instead
of mul(x, div(1.0, y)).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
It's way more concise to write as nir_builder calls.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
Should have been in 3a42e92a4f ("glsl: Drop the dead MOD_TO_FLOOR path.")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
The drivers not setting it were:
- nv30, which gets lowering using NIR's lower_fsat flag.
- r300, which gets lowering using NIR's lower_fsat flag.
- a2xx, which has was getting it optimized back to fsat anyway.
This drops the check for the cap from gallium nine. While nine does have
a non-nir path, I think it's safe to assume that if you have SM3
texturing, you can do fsat.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
This will be used for radeonsi to map common I/O location to fixed
slots agreed by different shader stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16418>
All uniform linking is now done via nir based linker not via this
code so we drop that from its name. We also drop a bunch of unused
parameters.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16880>
As per the comment this was meant to tidy things up after varying
linking but varying linking has been moved into a nir based linker
so this extra call is no longer needed.
This optimisation pass is still called in the regular glsl ir
optimisation loop.
No shader-db change on Iris (BDW).
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16880>
according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
opencl_c_h is defined only for llvm < 15.
Fixes: bcc2df4890 ("clc: speed up compilation by not relying on opencl-c.h")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16808>
This is useful for drivers which wish to consume XFB information. These
hopefully-uncontroversial hunks are extracted from the much more controversial
"st,nir,radeons: Move nir_lower_io_passes to si_nir_lower_io" by Jason.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
These will be used to facilitate transform feedback lowering for Panfrost,
although other backends could use the sysvals in the future.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Calling this directly in the linker code allows us to place it between
the varying linker and uniform linker calls which allows for better
optimisation/removal of uniforms.
Also in a later patch it allows us to insert a new nir based
lower_const_arrays_to_uniforms() call after the gl_nir_link_opts()
call. This is important because it allows the linking opts to
move constant arrays to later stages if possible before
lower_const_arrays_to_uniforms() turns them into uniforms.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6541
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
Some backends lower constant arrays to uniforms in GLSL IR. These
create so called hidden uniforms. Since we know these are added
per stage it is safe to remove them if we detect they are dead.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
The remap tables are used with the GL API so there is no need to
add hidden uniforms to them. Also when we switch to lowering some
constant arrays to uniforms in NIR in a following patch there
will no longer be enough room in the tables as we assign their
size in the GLSL IR linker not the NIR linker currently.
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
Doing this in NIR should give better results, but also allows us to
stop calling more GLSL IR optimisations passes.
v2: Skip 8bit and 16bit type that would require further processing
I believe this is an existing bug in the GLSL IR pass also.
v3: rebuild constant initialisers as we want to call this pass
after nir has already lowered them and performed optimisations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
Per spec RelaxedPrecision cannot be applied to bool types, however
3DMark Wild Life does it:
OpDecorate %171 RelaxedPrecision
...
%171 = OpLogicalAnd %bool %169 %170
Fixes crash in 3DMark Wild Life on Android.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16746>
For the NIR XFB gathering as well as all the Vulkan drivers, buffer
strides in nir_xfb_info are in bytes. When Marek started using
nir_xfb_info for GLSL on radeonsi, he copied directly from the GLSL
struct which has strides in dwords. This inconsistency didn't show up
until I went through and started us using the NIR passes for GL drivers
directly without going through the GLSL structs. We could change the
nir_xfb_buffer_info field to be in dwords to be consistent with
shader_info but that would mean changing all the Vulkan drivers but, for
now, it's easier to always use bytes in nir_xfb_info.
Fixes: 2a22885a45 ("st,nir: Use nir_shader::xfb_info in nir_lower_io_passes")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
A lot of the fields get fully overwritten but outputs/buffers_written
are both bitfields that we set one bit at a time.
Fixes: 7c5dc0b11a ("glsl/nir: Populate nir_shader::xfb_info after linking varyings")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
Since we now depend on C11, we know that we have support for the C99
math functionality. So let's drop the c99_math.h compatibility wrapper,
and just include <math.h> directly.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16812>
A 1D texture operation may need to do a mov to turn a reference to a
channel of an SSA value into a scalar value to be passed as the texture
coordinate (since texture srcs can't do swizzles). Seen in
amnesia-the-dark-descent/low/46.shader_test() for example, where a 1D
texture is used to remap each of r,g,b from a previous texture result.
Besides, the nir_op_is_vec() case will (perhaps surprisingly) look through
a mov, anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
This function allows to only scalarize instructions down to a desired
vectorization width. nir_lower_alu_to_scalar() was changed to use the
new function with a width of 1.
Swizzles outside vectorization width are considered and reduce
the target width.
This prevents ending up with code like
vec2 16 ssa_2 = iadd ssa_0.xz, ssa_1.xz
which requires to emit shuffle code in backends
and usually is not beneficial.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.
This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
NIR already has the necessary lowering, and the GLSL lowering violates
GLSL IR validation rules. Once quadop lowering was turned off, the IR
validation at the end of the compile path on DEBUG builds caught the
problem.
In order to move the lowering to NIR, though, we need to make sure that
drivers supporting these functions actually have the lowering flag set.
xfails added for t860, where apparently this tickles a variety of existing
64-bit bugs in the backend.
Fixes: #6461
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16437>
We want to be able to carry this along with the shader instead of always
having to re-generate it from scratch. A new nir_gather_xfb_info()
helper is also added which, instead of returning it, adds it to the
shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16750>
During certain control-flow manipulation passes, we go out-of-SSA
temporarily in certain areas of the code to make control-flow
manipulation easier. This can result in registers being in phi sources
temporarily. If two sub-passes run before we get a chance to do
clean-up, we can end up doing some out-of-SSA and then a bit more
out-of-SSA and trigger this case. It's easy enough to handle.
Fixes: a620f66872 ("nir: Add a couple quick-and-dirty out-of-SSA helpers")
Fixes: 79a987ad2a ("nir/opt_if: also merge break statements with ones after the branch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6370
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16111>
The previous logic would just set the block to the instructions
original location if we couldn't evict it from a loop.
For now we only push const loads to a later block inside ifs
but we can add more heuristics later. This change helps a
hand full of shaders but also stops a CTS regression caused
by excess spilling after a series I'm working on to disable
more of the GLSL IR optimisation passes.
Shader-db results iris (BDW):
total instructions in shared programs: 17529759 -> 17529749 (<.01%)
instructions in affected programs: 15929 -> 15919 (-0.06%)
helped: 5
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.06% max: 0.15% x̄: 0.11% x̃: 0.12%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for instructions value: -3.34 0.49
95% mean confidence interval for instructions %-change: -0.14% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 861109994 -> 861099681 (<.01%)
cycles in affected programs: 7027698 -> 7017385 (-0.15%)
helped: 95
HURT: 72
helped stats (abs) min: 1 max: 7995 x̄: 138.54 x̃: 9
helped stats (rel) min: <.01% max: 15.96% x̄: 0.54% x̃: 0.11%
HURT stats (abs) min: 1 max: 474 x̄: 39.56 x̃: 12
HURT stats (rel) min: <.01% max: 1.17% x̄: 0.20% x̃: 0.11%
95% mean confidence interval for cycles value: -159.05 35.54
95% mean confidence interval for cycles %-change: -0.45% 0.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 17606 -> 17605 (<.01%)
spills in affected programs: 323 -> 322 (-0.31%)
helped: 1
HURT: 0
total fills in shared programs: 22599 -> 22598 (<.01%)
fills in affected programs: 1348 -> 1347 (-0.07%)
helped: 1
HURT: 0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14940>