If the hardware does not support INSTANCE_INDEX natively, it will be
lowered to load_instance_id + base_instance. Otherwise, INSTANCE_ID
will be lowered to load_instance_index - base_instance.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32158>
The semantics of this newly introduced system value match
Vulkan's InstanceIndex exactly, and are equivalent to
instance_id + base_instance.
Some hardware, such as Mali Valhall or later, only provides
instance id offset by base_instance. Introducing a new system
value to represent this, rather than handling the mismatch
when lowering to BIR lets us use NIR to eliminate redundant
arithmetic that would follow from mismatched semantics, e.g.
instance_id could be lowered to instance_index - base_instance,
so expressions such as instance_id + base_instance would be
optimized to a simple instance_index.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32158>
A recent algebraic opt made a function that used to inline
with llvmpipe CL not inline anymore. However that function
has a barrier in it.
Handling barriers from inside a callstack is hard for llvmpipe
coroutines, so just force functions with barriers to be inlined.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32204>
We have to honor drivers when they say that different interpolation
qualifiers can't be mixed in the same vec4, indicated by
nir_io_has_flexible_input_interpolation_except_flat not being set.
This is a prerequisite for enabling nir_opt_varyings for all drivers.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32174>
This was enabled by default in nir_opt_varyings, but vc4 can't handle
when shader outputs write Y but not X. Add an option for it and enable
it only for the driver that benefits from it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32174>
The main difference between load_global and load_global_constant is that
the latter can be reordered arbitrarily. If the access being lowered is
already tagged as being reorderable, then we can preserve that by using
the load_global_constant intrinsics instead of load_global. This gives
us more flexibility.
On Intel, this lets us use the load_global_constant_uniform_block_intel
intrinsic for doing convergent block loads in more cases. This nets us
significant reductions in spill/fills: Borderlands 3 on Lunarlake sees
spills/fills reduced by 53%. Alchemist sees a 13% reduction.
Improves performance of Borderlands 3 DX12 on Intel Battlemage by
around 44%. Improves Hogwarts Legacy by around 14%.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>
Due to LLVM ABI reasons the SPIRV-LLVM-Translator always uses pointers to
private memory for struct function parameters. This includes kernel entry
points.
However technically it's also legal to pass those parameters by value
according to the OpenCL SPIR-V Env spec.
One compiler making use of this is e.g. artic based on Thorin.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12149
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32141>
When (off_const > max) there is a wrap around uint when calling
try_extract_const_addition.
Exit early since folding doesn't make sense in this case.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32118>
As per the code comment added in this commit the nir produced from
glsl to nir doesn't always keep function declarations before the
code that calls them e.g. calls from within other function
implementations. The change in this commit works around this problem by
first cloning all function declarations in a first pass, then cloning
the implementations in a second pass once we have filled the remap
table.
Fixes: cbfc225e2b ("glsl: switch to a full nir based linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12115
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32100>
This instruction can be used as a breakpoint in shaders to enter a
trap if supported by the driver. It will be used to handle
NonSemantic.DebugBreak in SPIR-V.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32061>
ishl can wrap, amul cannot. so we need amul in the backend, or otherwise we
would need to introduce an ashl opcode instead. that doesn't seem better.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
Some drivers generally need int64 lowered, but prefer to do this lowering
themselves late, to have a chance to optimize targeted int64 patterns before
lowering the rest. This isn't currently possible since nir_lower_int64 takes no
options except what's const* in the shader, and frontends call nir_lower_int64
before passing the shader off to the driver. Add an option to defer int64
lowering. This is a bit ugly but the alternative is replumbing nir_lower_int64's
option handling cross-tree and no-thank-you-not-right-now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
This only worked when "a" was 16-bit because a pattern above replaced the
shift.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
It only supported scalar intrinsics because it was written before
nir_opt_vectorize_io existed. The introduction of nir_opt_vectorize_io
exposes this issue. The direct path has been tested. The indirect path
hasn't. That's fine because if we see a CLIP_DIST failure with indirect
in the future, this pass is likely the cause.
This is a prerequisite for enabling nir_opt_varyings for all gallium
drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31994>