This changes the trace rays logic to always use
VkTraceRaysIndirectCommand2KHR and implements
vkCmdTraceRaysIndirect2KHR. I renamed the
load_sbt_amd to sbt_base_amd and moved the SBT
load lowering from ACO to NIR.
Note that we can not just upload one pointer to
all the trace parameters because that would
be incompatible with traceRaysIndirect.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
Found this while working on traceRaysIndirect2.
I don't think this is relevant for now at least
since we don't use the pass in RADV.
Fixes: 938c9d9 ("nir: Add a ray launch size addr intrinsic")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
Add a new cull_mask system value that is exposed
by the ray_cull_mask capability of
SPV_KHR_ray_cull_mask.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
The mesh shader task ring is a buffer in VRAM which we will use to
store some mesh shader outputs that don't fit into LDS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16737>
This will be used for radeonsi to map common I/O location to fixed
slots agreed by different shader stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16418>
according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
This is useful for drivers which wish to consume XFB information. These
hopefully-uncontroversial hunks are extracted from the much more controversial
"st,nir,radeons: Move nir_lower_io_passes to si_nir_lower_io" by Jason.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
These will be used to facilitate transform feedback lowering for Panfrost,
although other backends could use the sysvals in the future.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Doing this in NIR should give better results, but also allows us to
stop calling more GLSL IR optimisations passes.
v2: Skip 8bit and 16bit type that would require further processing
I believe this is an existing bug in the GLSL IR pass also.
v3: rebuild constant initialisers as we want to call this pass
after nir has already lowered them and performed optimisations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
For the NIR XFB gathering as well as all the Vulkan drivers, buffer
strides in nir_xfb_info are in bytes. When Marek started using
nir_xfb_info for GLSL on radeonsi, he copied directly from the GLSL
struct which has strides in dwords. This inconsistency didn't show up
until I went through and started us using the NIR passes for GL drivers
directly without going through the GLSL structs. We could change the
nir_xfb_buffer_info field to be in dwords to be consistent with
shader_info but that would mean changing all the Vulkan drivers but, for
now, it's easier to always use bytes in nir_xfb_info.
Fixes: 2a22885a45 ("st,nir: Use nir_shader::xfb_info in nir_lower_io_passes")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
Since we now depend on C11, we know that we have support for the C99
math functionality. So let's drop the c99_math.h compatibility wrapper,
and just include <math.h> directly.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16812>
A 1D texture operation may need to do a mov to turn a reference to a
channel of an SSA value into a scalar value to be passed as the texture
coordinate (since texture srcs can't do swizzles). Seen in
amnesia-the-dark-descent/low/46.shader_test() for example, where a 1D
texture is used to remap each of r,g,b from a previous texture result.
Besides, the nir_op_is_vec() case will (perhaps surprisingly) look through
a mov, anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
This function allows to only scalarize instructions down to a desired
vectorization width. nir_lower_alu_to_scalar() was changed to use the
new function with a width of 1.
Swizzles outside vectorization width are considered and reduce
the target width.
This prevents ending up with code like
vec2 16 ssa_2 = iadd ssa_0.xz, ssa_1.xz
which requires to emit shuffle code in backends
and usually is not beneficial.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.
This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
We want to be able to carry this along with the shader instead of always
having to re-generate it from scratch. A new nir_gather_xfb_info()
helper is also added which, instead of returning it, adds it to the
shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16750>
During certain control-flow manipulation passes, we go out-of-SSA
temporarily in certain areas of the code to make control-flow
manipulation easier. This can result in registers being in phi sources
temporarily. If two sub-passes run before we get a chance to do
clean-up, we can end up doing some out-of-SSA and then a bit more
out-of-SSA and trigger this case. It's easy enough to handle.
Fixes: a620f66872 ("nir: Add a couple quick-and-dirty out-of-SSA helpers")
Fixes: 79a987ad2a ("nir/opt_if: also merge break statements with ones after the branch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6370
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16111>
The previous logic would just set the block to the instructions
original location if we couldn't evict it from a loop.
For now we only push const loads to a later block inside ifs
but we can add more heuristics later. This change helps a
hand full of shaders but also stops a CTS regression caused
by excess spilling after a series I'm working on to disable
more of the GLSL IR optimisation passes.
Shader-db results iris (BDW):
total instructions in shared programs: 17529759 -> 17529749 (<.01%)
instructions in affected programs: 15929 -> 15919 (-0.06%)
helped: 5
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.06% max: 0.15% x̄: 0.11% x̃: 0.12%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for instructions value: -3.34 0.49
95% mean confidence interval for instructions %-change: -0.14% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 861109994 -> 861099681 (<.01%)
cycles in affected programs: 7027698 -> 7017385 (-0.15%)
helped: 95
HURT: 72
helped stats (abs) min: 1 max: 7995 x̄: 138.54 x̃: 9
helped stats (rel) min: <.01% max: 15.96% x̄: 0.54% x̃: 0.11%
HURT stats (abs) min: 1 max: 474 x̄: 39.56 x̃: 12
HURT stats (rel) min: <.01% max: 1.17% x̄: 0.20% x̃: 0.11%
95% mean confidence interval for cycles value: -159.05 35.54
95% mean confidence interval for cycles %-change: -0.45% 0.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 17606 -> 17605 (<.01%)
spills in affected programs: 323 -> 322 (-0.31%)
helped: 1
HURT: 0
total fills in shared programs: 22599 -> 22598 (<.01%)
fills in affected programs: 1348 -> 1347 (-0.07%)
helped: 1
HURT: 0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14940>
On some implementations eg. AMD RDNA2 the driver can generate a
more optimal code path knowing whether outputs are indexed using the
local invocation index or not.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16736>
1. Lowers NV_mesh_shader TASK_COUNT output to launch_mesh_workgroups.
2. Removes all code after launch_mesh_workgroups, enforcing the
fact that it's a terminating instruction.
3. Ensures that task shaders always have at least one
launch_mesh_workgroups instruction, so the backend doesn't
need to implement a special case when the shader doesn't have it.
4. Optionally, implements task_payload using shared memory when
task_payload atomics are used.
This is useful when the backend is otherwise not capable of
handling the same atomic features as it can for shared memory.
If this is used, the backend only has to implement the basic
load/store operations for task_payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16720>
The new intrinsic launches mesh shader workgroups
from a task shader, with explicit task_payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16720>
It's meaningful for this intrinsic and so does not add noise to the
lowering pass.
(Although dual-source writes must be to RT 0, depth and stencil
writes, which store_combined_output_pan is also used for, can still be
done with MRT enabled.)
Fixes: 5c168f09eb ("nir: Eliminate store_combined_output_pan BASE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
This should make it clear whether a validation failure happens in RADV or
zink.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12448>
If a user wants to skip printing the shader if no changes were made
without declaring a dummy variable for the progress.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12448>
Some older drivers don't support GLSL versions with the concept of flat
varyings and also don't support integers. Here we add a new setting to
make sure we don't use the optimisation that sets varyings to flat.
This setting helps us avoid marking varyings as flat and therefore
potentially having them changed to ints via varying packing.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6500
Fixes: 7647023f3b ("glsl: enable the use of the nir based varying linker")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16573>
Some drivers don't support these indirects and therefore require
loop unrolling if a shader uses a loop induction variable to
access a sampler array.
Here we add a new nir shader compiler option that drivers can set,
this will be the equivalent of the EmitNoIndirectSampler setting
used in the GLSL IR unrolling pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
v2: Don't lower ffloor@64 to ffract@64 when both ops are
to be lowered. Settle on ffloor in opt_algebraic because
in can be lowered to other ops in lower_double_ops.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>(v1)
Jason Ekstrand <jason.ekstrand@collabora.com> (v1)
Reviewed-by: Emma Anholt <emma@anholt.net> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16431>
These varyings cannot be packed by the GLSL linkers packing pass
so we need to skip this lowering until later when we can properly
handle them.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15731>