Adding `nir_rematerialize_derefs_in_use_blocks_impl`
solves some cases when 'opt_dead_cf()' generates
a phi instruction for the first argument
of the `deref_store` intrinsic.
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin's avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6742
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22983>
This is a piece of cake with unified atomics :-) This will let us do our
addressing math tricks nice and easily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23529>
For VK_KHR_fragment_shader_barycentric, AMD needs to know the primitive
topology in the fragment shader but with fast-link GPL this is unknown
at compile time and it needs to be passed dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16742>
This only used by vulkan drivers and depends on vulkan util, so do the move to decouple
nir from vulkan utils
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23444>
In many cases the raw value is not really helpful,
since we only work with enums and the raw value is
already printed for indices without special printing.
If an index benefits from having special printing AND the
raw value, we can include the printing of the raw value
as part of its handler.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23375>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Via Coccinelle patch:
@@
expression a, b, c;
@@
-a.src = nir_src_for_ssa(b);
-a.src_type = c;
+a = nir_tex_src_for_ssa(c, b);
@@
expression a, b, c;
@@
-a.src_type = c;
-a.src = nir_src_for_ssa(b);
+a = nir_tex_src_for_ssa(c, b);
Plus manual fixups, including...
* a few identity swizzles changed to nir_trim_vector in TTN and prog-to-nir to
fix the Coccinelle-botched formatting, and similarly a pointless nir_channels
* collapsing a now-pointless temp in vtn
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
This makes texture instructions a lot less annoying to construct, especially in
cases where the deref-based helpers don't work.
I only converted core NIR, not the drivers. Since it was by hand.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
There should not be any isub at this point due to lowerings that happened
ages before getting to late algebraic.
shader-db:
DG2
total instructions in shared programs: 23103769 -> 23103767 (<.01%)
instructions in affected programs: 65 -> 63 (-3.08%)
helped: 1 / HURT: 0
total cycles in shared programs: 842348074 -> 842347714 (<.01%)
cycles in affected programs: 28572 -> 28212 (-1.26%)
helped: 3 / HURT: 0
One compute shader in Assassin's Creed Odyssey was affected.
fossil-db:
DG2
Instructions in all programs: 196400668 -> 196400676 (+0.0%)
helped: 8 / HURT: 5
Cycles in all programs: 13931740724 -> 13931758003 (+0.0%)
helped: 8 / HURT: 7
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This condition can occur in the wild (more specifically in RT shader
call lowering), and it is handled correctly.
Cc: mesa-stable
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22536>
nir_repair_ssa_impl may insert phi nodes for any deref, but if the deref mode is uniform, validation fails.
To fix this, rematerialize the derefs in the blocks they are used to avoid generating phi nodes for them.
Cc: mesa-stable
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22536>
This is a prepare step to remove depends on p_defines.h in src/util/*
This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX with MESA_PRIM_COUNT
replace SHADER_PRIM_ with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_
This patch only replace code only
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
To add the const offset to the base index.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23254>
This introduces new intrinsics nir_intrinsic_load_barycentric_coord_xxx
with 3-components instead of expanding the existing ones that are
supposed to interpolate input varyings, while BaryCoord is a sysval
on most hardware.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23254>
Now that radeonsi support pass desc to ssbo atomic ops,
we can use ssbo atomic instead. aco does not implement
nir_buffer_atomic_add either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23096>
It's one of the weirder parts of our shader interface's interactions with
the GL API, so let's try to make it a little cleaner.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23111>
The DXIL backend would like to distinguish between casts to 16-bit
that must cast, vs those that may. If a shader only ever produces
16-bit types from mediump casts and ALU ops on those values, then
the resulting shader can be annotated with DXIL's min-precision
qualifier, basically telling the driver to use 16-bit precision if
it's faster for them. If it uses concrete 16-bit casts, or loads/
stores to externally-visible memory, then it must use the "native"
16-bit flag, which is not supported on all hardware.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>
Otherwise this can cause optimizations to fight resulting in infinite
optimization loops with opt_algebraic, constant_folding, and copy_prop.
Fixes: 368be872 ("nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23192>
Since 2d6233d0 ("nir: Check all sizes in nir_alu_instr_is_comparison"),
nir_alu_instr_is_comparison already returns true for comparisons with 32bit
result.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23287>
The resouce_intel intrinsic doesn't not result in an actual
instruction, it's just a wrapper around another value, usually a
load_const.
Allowing this intrinsic to be moved anywhere means it's going to be
closer to the value it wraps, enabling opt_gcm to move a load_ubo
using this resource_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Intel HW has multiple ways to access resources like UBO/SSBO/images :
- binding tables : a small ~240 heap of surfaces
- bindless surfaces : a 64Mb heap of surfaces up to Gfx12+, 4Gb on Gfx12.5+
- surfaces : a 4Gb heap on Gfx12.5+ (mostly unused at the moment,
only available through the LSC)
For samplers, we have 2 options since Gfx11+ :
- samplers indexed from the Dynamic State Heap (4Gb)
- samplers indexed from the Bindless Sampler Heap (4Gb)
Additionally our whole push constant promotion mechanism is based
around binding table indices. This is problematic if you want to also
promote to push constants things that would be accessed through the
bindless heap.
To solve this issue, we introduce a new intrinsic that will cary a
block index that is not based off the binding table index nor the
bindless table offset.
We will also use this intrinsic to identify whether the buffer/surface
index in load_ubo/load_ssbo/store_ssbo/etc... is relative to the
binding table or the bindless heap.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Some instruction we would like to keep around because they carry
additional information in their indices.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
This adds a little extra work, since now dominance is computed and
blocks that don't just have then-return or else-return are looked at.
However it means that nir_lower_returns can now keep phis up to date
by inserting undefs without causing some phis to become non-trivial.
This ends up obviating a couple of tests for lower_returns.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22913>
sample_mask_agx maps to the AGX instruction used to write out a sample mask.
api_sample_mask_agx is a system value that returns the value of glSampleMask
(or its Vulkan equivalent), used to lower glSampleMask (etc).
This is distinct from sample_mask_in, which we map to the hardware thing and
AND with this as a lowering.
sample_positions_agx is a system value returning the sample positions in a
packed fixed-point format matching the hardware register, used to lower
gl_SamplePositions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23040>
This will be used for RADV/ACO in the future, and I don't want to and
don't have to deal with 16-bit.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22636>
Now that we have nir_fsub_imm, let's use it to save some typing!
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
nir_ffma_imm has several variants that allows specific arguments to be
immediates. Use them for simplicity.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
When doing nir_fsub(b, x, imm), we can negate the immediate value, and
replace the fsub with nir_fadd_imm() and get the same result. This makes
the code a bit shorter and easier to read.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>