We need this, because on Intel task payload starts with private header,
followed by user-accessible data.
Fixes: 37e78803d7 ("intel/compiler: use nir_lower_task_shader pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
Based on nir_create_passthrough_tcs and d3d12_make_passthrough_gs, this
creates a passthrough geometry shader that can be used by drivers that
needs to emulate some graphics features in the geometry shader.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19987>
We cannot fully use the vectorizer outside of this pass because once
stack load/store operations have been lower to global load/store, the
robustness rule applies to those as they would to application
load/store.
But this is all internal and we know it doesn't require out of bound
checking. So doing the vectorizing here is the best solution. We just
have to teach the vectorizer about our intrinsics.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>
We can determice scopes/ranges of the use of ray queries and use this information to combine ray queries.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>
Fixes issue with "is helper invocation" that in recent SPIR-V is mapped to
a volatile Load. The CSE was catching the loads before they were transformed
in the new is_helper_invocation intrinsic (that is not reorderable).
Fixes: 729df14e45 ("nir: Handle volatile semantics for loading HelperInvocation builtin")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19432>
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: the results when
setting this flag "imprecise", they're completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer divison... but in practice, non-constant highp integer
division is pretty rare.
After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.
Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
The intel backend compiler is not dealing with the scratch loads
emitted by this pass very well. There are 2 reasons for this :
- all loads are at the top of the shader
- the loads are global load intrinsics (cannot be differentiated
from ssbo loads for example)
This leads the backend to generate ridiculous amount of spills.
To help a bit (actually quite a lot), we can move the scratch loads in
the blocks where they're needed, using the dominance information.
Quite often that also ends up moving loads in a block that might not
be reached by all the lanes, so we're potentially avoiding some loads.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
Based on si_create_passthrough_tcs() as that seemed the most generic of
the various different backend driver implementations. Uses the
load_tess_level_outer_default and load_tess_level_inner_default
intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values,
so driver will somehow need to implement those to load the values set
by pipe_context::set_tess_state() or similar.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
The new enum is called nir_selection_control_divergent_always_taken,
and it's almost the same as nir_selection_control_flatten.
The main difference between the two is that "flatten" represents
a choice made by the application but "divergent_always_taken" may
be applied by the compiler stack when it thinks this is beneficial.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
Lowered fp64 ops can blow up the loop bodies while still being
suitable for unrolling.
Allow for using different parameters to unroll loops with
soft fp64.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18863>
This lets us vectorize gl_TessLevel{Inner,Outer} writes, using a pass
developed for RADV. Not all backends are prepared to handle this, so
we make it optional.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
SPIRV and GLSL are reasonable at converting ALU ops to mediump, but
variable storage would be wrapped in a 2f32/2mp on store/load, and if
nir_vars_to_ssa doesn't make that storage go away then you'd have extra
conversions. For compute shader shared mem, you'd waste memory too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18259>
AGX and zink both want all of these lowered, but nir_to_tgsi will want
only demote (and terminate if it was possible from GLSL but it's not)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15932>
This is almost always a nir_instr and updating the src of a nir_if will
have to work slightly differently in the future.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12910>
Also make the code cleaner and simplier.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17334>
No driver doesn't use this option.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17757>
Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
When point smoothing is enabled then this lowering pass will
modifies the alpha component of every write to fragment output.
Anti-aliased points get rounded with respect to their radius instead
of square.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15117>