If a pass returning boolean progress reports no change, we shouldn't need
to re-validate. If a pass breaks the NIR but also fails to report
progress correctly, it would be up to the next pass to catch that.
This should hopefully help with test timeouts on
KHR-GL33.texture_swizzle.functional since switching softpipe to
nir-to-tgsi and enabling NIR validation in CI (27s to 20s on my system).
Suggested-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7239>
The way the ALIGN_POT macro works, an alignment of 0 may cause
ALIGN_POT(x, 0) to return 0 for any x. Throw in an assert to guard
against this case.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
nir-to-tgsi will use this to release release temporaries for SSA storage
back to ureg's linear register allocation once they're dead.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
I wanted it for the per-instruction live intervals metadata, and it's not
much to store in general. Make the ip explicitly 32-bit, on suggestion by
jekstrand.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
live_index had two things going on: 0 meant the instr was an undef and
always dead, and otherwise ssa defs had increasing numbers by instruction
order. We already have a field in the instruction for storing instruction
order, and ssa defs don't need that number to be contiguous (if you want a
compact per-ssa-def number, use ssa->index after reindexing).
We don't use ssa->index for this, because reindexing those would change
nir_print, and that would be rude to people trying to track what's
happening in optimization passes.
This openend up a hole in nir_ssa_def, so we move nir_ssa_def->index
toward the end to shrink the struct from 64 bytes to 56.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
This fixes issues in backends such as ACO which rely on always
getting this intrinsic to know the correct vertex and primitive
count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213>
When we replace the original instruction with per-channel operations,
the new instruction should inherint the semantics of the original
instruction.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6721>
This is the same as nir_get_buffer_size but geared towards UBOs instead
of SSBOs. The new intrinsic is useful in Vulkan backends that need to
add bound checks on buffer accesses to honor the robust buffer access
feature.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In practice we found that we need this for v3d (specifically for cube
map arrays, as they don't support the default value for wrap_i, so a
sampler object is needed to override that value).
It is worth to note that the main reason behind this auxiliar method
was to identify those cases that we didn't have a sampler object
available for Vulkan. So far, we found that we have a sampler object
coming from nir always for that operation.
Fixes cube map array tests like the following:
dEQP-VK.glsl.texture_functions.query.texturequerylod.usamplercubearray_fragment
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This optimizes atomics with a uniform offset so that only one atomic
operation is done in the subgroup.
For shaders which do a very large amount of atomics, this can
significantly improve performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
locations are important for these because they provide info about how
many block indices each ubo takes up
UBO arrays have nonzero values here. all non-array UBOs have either 0
for the base or nonzero for an io lowered block at an offset,
but only arrays need to be changed here because they're the only ones
with absolute values, whereas all the others are relative.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6272>
In particular, OpenCL needs to allow shader_temp and function_temp
through because they're 100% real pointers.
Fixes piglit CL calls.cl
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7092>
Before 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero") this
optimization didn't need the use of umax/umin. VC4 HW supports only signed
integer max/min operations.
lower_umin and lower_umax are added to allow enabling previous optimizations
behaviour for this cases.
Fixes: 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7083>
After each end_primitive and at the end of the shader before emitting
set_vertex_and_primitive_count, we check if the primitive that is being
emitted has enough vertices or not, and we adjust the vertex and
primitive counters accordingly.
As a result, if the backend uses this option, the backend compiler
will not have to worry about discarding the unneeded vertices
and primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Add an option to nir_lower_gs_intrinsics so that it can also track
the number of emitted vertices per primitive, not just the total
vertex count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Add an option to nir_lower_gs_intrinsics which tells it to track
the number of emitted primitives, not just vertices. Additionally,
also make it per-stream.
Also rename the set_vertex_count intrinsic to
set_vertex_and_primitive_count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Some GPUs can sample biplanar formats like NV12 natively, returning
the YUV values. Add a lowering type that uses that for sampling and
relies on existing colorspace conversions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6693>
This was observed with the intel vulkan driver when running
dEQP-VK.spirv_assembly.instruction.compute.float32.comparison_1.modfstruct
with ubsan enabled.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>