Use the local variable in the assertions, move them out the critical region.
No behavior change.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
The element type passed is different than the array type and it is not
a "base type" in the glsl_type sense, so pick a name that reflects that.
Also stick to a single name for the array_size.
Just renames, no behavior change.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
These are mostly just obvious patterns that somebody will eventually
want to add.
DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0
total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)
Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See
piglit!819.
DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown)
total instructions in shared programs: 20570063 -> 20570033 (<.01%)
instructions in affected programs: 452 -> 422 (-6.64%)
helped: 30 / HURT: 0
total cycles in shared programs: 902118723 -> 902118781 (<.01%)
cycles in affected programs: 1762 -> 1820 (3.29%)
helped: 0 / HURT: 29
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819969 -> 152819500 (-0.00%)
Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00%
Totals from 469 (0.07% of 662497) affected shaders:
Instrs: 7644 -> 7175 (-6.14%)
Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
The needs of this pass are ever so slightly more than what
nir_opt_algebraic can do. :( Specifically, it needs to be able to look
at the relationship of constant values used in an expression tree.
v2: Add nir_mov_alu to handle swizzles on the original sources.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
It is now set by all relevant drivers and not checked anywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
True for all backends supporting barriers. This lets us collapse lots of code,
since scoped_barriers are based on the SPIR-V definition.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
Derefs have index-based access semantics, which means we don't need
custom intrinsics to encode an index instead of a byte offset.
Remove the "masked" store intrinsics and just emit the pair of atomics
directly. This massively reduces duplication between scratch, shared,
and constant, while also moving more things into nir so more optimizations
can be done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
There's a few changes in here that are very inter-related.
First, we stop lowering load_deref on shader_temp to load_ptr_dxil,
and just leave it as load_deref. In order for that to work, we need
the derefs to be in a shape that's acceptable to DXIL, so the only
current producer of shader_temp loads (the CLC frontend) needs to
run some lowering passes on them first.
The DXIL backend is augmented to just write out deref indices while
walking a deref chain, which will get combined in the load op into
a GEP instruction. For non-mesh/raytracing shaders, these are required
to be single-level scalar arrays, but the complexity here is preparation
for when we don't need to do that anymore.
Additionally, the const lookups are changed from using a hash table
to just putting an index on the variable.
All of this together is enough to enable the authored-forever-ago test
which uses indirect array access into a const packed struct. The
load_ptr_dxil handling didn't deal with packed structs / unaligned
accesses, but now that we're in a logical address space with derefs
instead of physical, there's no alignment to deal with anymore and
the fact that it's packed goes out the window.
This removes one custom DXIL intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
I was seeing u2u64 still in my final shader after pack/unpack were
lowered, which sounds to me like some other optimizations are missing
for detecting the post-lowering pack/unpack patterns, but let's at
least add some patterns for the simple cases.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
This pass handles 16-component 8-bit loads, 8-component 16-bit loads,
and 2-component 64-bit loads. The number of components for the fallback
case doesn't need to be 4.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
The var splitting pass can rearrange the variables as long as their
position in memory doesn't matter. For block-arranged variables,
or things like memcpys or casts, the layout matters, but atomics
don't imply anything about the layout of the overall variable, so
don't treat them as "complex" for this use case.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Idiomatic DXIL has constants contained within global variables rather
than a big blob of data. Doing this allows us to have 16-bit and 64-bit
data as well, where normally bitcasts would be disallowed on variable
GEP chains.
Unfortunately, DXIL validation requires SOA to be turned into AOS,
which means we need to split structs. We want to be able to run this
on nir_var_mem_constant variables which have constant initializers,
so add a bit of logic to handle that case, and relax the mode validation.
There's nothing special about the modes it was set up to handle.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
While this is a generic bit twiddling ALU instruction, it's especially useful
for address calculations, since the architecture's tiled textures use Morton
coding within the tiles.
This will be used when lowering image_texel_address on AGX, as part of the image
atomics implementation. I don't know if there's any other neat uses I could
detect with opt_algebraic, this doesn't seem like an operation a shader would
open-code... Maybe useful for BVH building or something...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
The find-remove-use pattern is quite natural for texture lowering :)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
I have this in the AGX compiler but I want to use it in more places.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
Adding `nir_rematerialize_derefs_in_use_blocks_impl`
solves some cases when 'opt_dead_cf()' generates
a phi instruction for the first argument
of the `deref_store` intrinsic.
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin's avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6742
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22983>
This is a piece of cake with unified atomics :-) This will let us do our
addressing math tricks nice and easily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23529>
Use the string itself as a key for searching -- and the internal
allocated name as a key when storing.
Because record_key_hash doesn't consider the name field, which is
the only used field for a SUBROUTINE type, the hash key was always
the same for all types. Using the name fixes this.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23277>
For VK_KHR_fragment_shader_barycentric, AMD needs to know the primitive
topology in the fragment shader but with fast-link GPL this is unknown
at compile time and it needs to be passed dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16742>
This only used by vulkan drivers and depends on vulkan util, so do the move to decouple
nir from vulkan utils
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23444>