It's just v_fma with fixed DPP8 and a built-in s_waitcnt_expcnt, so it can
mostly be handled as a pure VALU instruction.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
Both GLSL & SPIR-V leave the result undefined for shift >= bitsize. But SM5
says:
"This instruction performs a component-wise shift of each 32-bit
value in src0 left by an unsigned integer bit count provided by
the LSB 5 bits (0-31 range) in src1, inserting 0."
Better not to hard-code the wrong behavior in NIR.
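For example (a minimal sketch, not the actual patch; b, src0 and src1 are a
hypothetical nir_builder and operands), a D3D frontend that wants the SM5
masking can emit it explicitly instead of relying on ishl_imm:
   /* SM5: only the low 5 bits of the shift count are used */
   nir_ssa_def *count = nir_iand_imm(b, src1, 31);
   nir_ssa_def *result = nir_ishl(b, src0, count);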
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e227bb9fd5 ("nir/builder: add ishl_imm helper")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21720>
It's unnecessary and already checked elsewhere, like in
vkCmdBindDescriptorSets(). This improves performance of vkoverhead
test #76 (descriptor_1image) by +18%, matching PRO performance on my
Threadripper 1950X. This should also slightly improve texel and buffer
descriptor performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20699>
We can't add a 64-bit immediate with the hardware iadd. What we can do is add
a 32-bit immediate, derived as the low 32 bits of a 64-bit nir_ssa_def.
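A minimal sketch of that derivation (offset_src is a hypothetical nir_src
holding the 64-bit constant; this is not the actual backend code):
   /* low 32 bits of the 64-bit constant, which the hardware iadd can encode */
   uint32_t imm = (uint32_t)nir_src_as_uint(offset_src);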
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
If we manage to fold in a left shift that's bigger than the hardware can do, we
should at least avoid generating a useless right shift to feed the hardware,
rather than bailing completely.
For motivation, this form of address arithmetic is encountered when indexing
into arrays with large power-of-two element sizes (array-of-structs).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Optimize address arithmetic of the form
   base + u2u64((index << shift) + const)
into hardware operands
   base, (index << (shift - format_shift)) + const'
which (if format_shift = shift) can be simply
   base, index + const'
rather than the current naive translation
   base, ((index << shift) + const) >> format_shift
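As a worked example with hypothetical numbers, take a 4-byte format
(format_shift = 2), shift = 4 and const = 32:
   naive:     base, ((index << 4) + 32) >> 2
   optimized: base, (index << 2) + 8
and if shift were 2 as well, the optimized operands would simply be
base, index + 8.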
This saves at least one pointless shift. We can't do this optimization with
nir_opt_algebraic, because explicitly optimizing "(a << #b) >> #b" to "a" isn't
sound due to overflow: with 32-bit a = 0x80000001 and b = 1, (a << 1) >> 1 is 1,
not a. But there's no overflow issue here, which is what this whole pass is
designed around.
For motivation, this address arithmetic implements "dynamically indexing into an
array inside of a C structure", where the const is the offset of the array
relative to the structure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Once we've matched a summand, commit to it. This avoids needlessly checking the
second source if the first matched, and removes some indentation/funny control
flow.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
otherwise this may be a surface that was never drawn to, or one that already
had its layout corrected, in which case a deferred barrier is not only
unnecessary, it might be broken
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21739>
Wa_1209978020 is a clone of Wa_18012201914. Update references to
refer to the originating bug, and use generated helpers to ensure it
is applied to future platforms as needed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21741>
Rather than control_barrier. This avoids the need to handle control_barrier at
all for backends that set use_scoped_barrier. This effectively matches what
spirv_to_nir emits, so Vulkan-capable compilers should be ok.
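As a rough sketch (the scopes and semantics here are illustrative, not taken
from the patch), a GLSL compute shader barrier() can be emitted with the
nir_builder helper as:
   nir_scoped_barrier(b, .execution_scope = SCOPE_WORKGROUP,
                      .memory_scope = SCOPE_WORKGROUP,
                      .memory_semantics = NIR_MEMORY_ACQ_REL,
                      .memory_modes = nir_var_mem_shared);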
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>
Now that we handle scoped barriers with execution scope during
NIR -> Backend IR translation, this lowering is not needed anymore.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>
For GLSL, we want to optimize code like
   memoryBarrierBuffer();
   controlBarrier();
into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is letting them
combine these barriers. We already have an Intel-specific pass for combining
memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.
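For illustration only (the exact scopes and semantics depend on the API and the
built-ins involved), the pair above would end up as a single intrinsic roughly
like:
   scoped_barrier (execution_scope=WORKGROUP, memory_scope=DEVICE,
                   memory_semantics=ACQ_REL, memory_modes=ssbo)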
This avoids a code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>