Derefs have index-based access semantics, which means we don't need
custom intrinsics to encode an index instead of a byte offset.
Remove the "masked" store intrinsics and just emit the pair of atomics
directly. This massively reduces duplication between scratch, shared,
and constant, while also moving more things into nir so more optimizations
can be done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
There's a few changes in here that are very inter-related.
First, we stop lowering load_deref on shader_temp to load_ptr_dxil,
and just leave it as load_deref. In order for that to work, we need
the derefs to be in a shape that's acceptable to DXIL, so the only
current producer of shader_temp loads (the CLC frontend) needs to
run some lowering passes on them first.
The DXIL backend is augmented to just write out deref indices while
walking a deref chain, which will get combined in the load op into
a GEP instruction. For non-mesh/raytracing shaders, these are required
to be single-level scalar arrays, but the complexity here is preparation
for when we don't need to do that anymore.
Additionally, the const lookups are changed from using a hash table
to just putting an index on the variable.
All of this together is enough to enable the authored-forever-ago test
which uses indirect array access into a const packed struct. The
load_ptr_dxil handling didn't deal with packed structs / unaligned
accesses, but now that we're in a logical address space with derefs
instead of physical, there's no alignment to deal with anymore and
the fact that it's packed goes out the window.
This removes one custom DXIL intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
I was seeing u2u64 still in my final shader after pack/unpack were
lowered, which sounds to me like some other optimizations are missing
for detecting the post-lowering pack/unpack patterns, but let's at
least add some patterns for the simple cases.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
This pass handles 16-component 8-bit loads, 8-component 16-bit loads,
and 2-component 64-bit loads. The number of components for the fallback
case doesn't need to be 4.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
The var splitting pass can rearrange the variables as long as their
position in memory doesn't matter. For block-arranged variables,
or things like memcpys or casts, the layout matters, but atomics
don't imply anything about the layout of the overall variable, so
don't treat them as "complex" for this use case.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Idiomatic DXIL has constants contained within global variables rather
than a big blob of data. Doing this allows us to have 16-bit and 64-bit
data as well, where normally bitcasts would be disallowed on variable
GEP chains.
Unfortunately, DXIL validation requires SOA to be turned into AOS,
which means we need to split structs. We want to be able to run this
on nir_var_mem_constant variables which have constant initializers,
so add a bit of logic to handle that case, and relax the mode validation.
There's nothing special about the modes it was set up to handle.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
While this is a generic bit twiddling ALU instruction, it's especially useful
for address calculations, since the architecture's tiled textures use Morton
coding within the tiles.
This will be used when lowering image_texel_address on AGX, as part of the image
atomics implementation. I don't know if there's any other neat uses I could
detect with opt_algebraic, this doesn't seem like an operation a shader would
open-code... Maybe useful for BVH building or something...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
The find-remove-use pattern is quite natural for texture lowering :)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
I have this in the AGX compiler but I want to use it in more places.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
Adding `nir_rematerialize_derefs_in_use_blocks_impl`
solves some cases when 'opt_dead_cf()' generates
a phi instruction for the first argument
of the `deref_store` intrinsic.
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin's avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6742
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22983>
This is a piece of cake with unified atomics :-) This will let us do our
addressing math tricks nice and easily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23529>
Use the string itself as a key for searching -- and the internal
allocated name as a key when storing.
Because record_key_hash doesn't consider the name field, which is
the only used field for a SUBROUTINE type, the hash key was always
the same for all types. Using the name fixes this.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23277>
For VK_KHR_fragment_shader_barycentric, AMD needs to know the primitive
topology in the fragment shader but with fast-link GPL this is unknown
at compile time and it needs to be passed dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16742>
This only used by vulkan drivers and depends on vulkan util, so do the move to decouple
nir from vulkan utils
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23444>
This move is done for decouple glsl_types.h from src/mesa/*
This is achieved by move gl_texture_index from src/mesa/main/menums.h to src/compiler/shader_enums.h
And move ATOMIC_COUNTER_SIZE,MAX_VERTEX_STREAMS from src/mesa/main/config.h to src/compiler/shader_enums.h
Move include main/[config|menums].h into glsl/glsl_parser_extras.h from glsl_types.h
As now glsl_types.h should not include headers from src/mesa/*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23420>
As glsl_types.cpp also called is_gl_identifier, so move it into glsl_types.h,
this will help the decouple glsl_types.h from src/compiler/glsl/*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23420>
In many cases the raw value is not really helpful,
since we only work with enums and the raw value is
already printed for indices without special printing.
If an index benefits from having special printing AND the
raw value, we can include the printing of the raw value
as part of its handler.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23375>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Via Coccinelle patch:
@@
expression a, b, c;
@@
-a.src = nir_src_for_ssa(b);
-a.src_type = c;
+a = nir_tex_src_for_ssa(c, b);
@@
expression a, b, c;
@@
-a.src_type = c;
-a.src = nir_src_for_ssa(b);
+a = nir_tex_src_for_ssa(c, b);
Plus manual fixups, including...
* a few identity swizzles changed to nir_trim_vector in TTN and prog-to-nir to
fix the Coccinelle-botched formatting, and similarly a pointless nir_channels
* collapsing a now-pointless temp in vtn
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
This makes texture instructions a lot less annoying to construct, especially in
cases where the deref-based helpers don't work.
I only converted core NIR, not the drivers. Since it was by hand.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
There should not be any isub at this point due to lowerings that happened
ages before getting to late algebraic.
shader-db:
DG2
total instructions in shared programs: 23103769 -> 23103767 (<.01%)
instructions in affected programs: 65 -> 63 (-3.08%)
helped: 1 / HURT: 0
total cycles in shared programs: 842348074 -> 842347714 (<.01%)
cycles in affected programs: 28572 -> 28212 (-1.26%)
helped: 3 / HURT: 0
One compute shader in Assassin's Creed Odyssey was affected.
fossil-db:
DG2
Instructions in all programs: 196400668 -> 196400676 (+0.0%)
helped: 8 / HURT: 5
Cycles in all programs: 13931740724 -> 13931758003 (+0.0%)
helped: 8 / HURT: 7
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This condition can occur in the wild (more specifically in RT shader
call lowering), and it is handled correctly.
Cc: mesa-stable
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22536>
nir_repair_ssa_impl may insert phi nodes for any deref, but if the deref mode is uniform, validation fails.
To fix this, rematerialize the derefs in the blocks they are used to avoid generating phi nodes for them.
Cc: mesa-stable
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22536>