A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.
Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion. In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that). This is needed because
different Convert operations define different interpretations for
integers, regardless their original type.
In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.
This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
I was surprised this had any affect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.
Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.
v2: Add 64-bit too. Suggested by Alyssa.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0
total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)
Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
this is a cleaned up version of the lowering originally written for asahi, moved
to common code so it can be shared with an upcoming Vulkan implementation (not
honeykrisp).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34507>
When texture_index and sampler_index are over 32, we can't really check
for them in a single 32-bit word. This happens among other things when
Panfrost uses preload shaders on v9 and later. Otherwise, we trigger
undefined behavior.
We're already doing this for textures in one case, let's be consistent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
In commit 292ac71a4a ("nir/lower_tex: handle deref casts"), we avoided
using texture_index when a texture instruction contained a variable
deref. There's no good reason why this should be done to some of the
lowering, but not all.
So let's fix up code-paths that were added after this change to do the
same.
The first two patches here crossed paths with the commit that introduced
texture_mask, so it's not strange that the change was missed. The last
one seems to have just copied what was done around it, propagating the
issue.
Fixes: 880b00dc59 ("nir/lower_tex: Add support for lowering YUYV formats")
Fixes: 1358d93650 ("nir/lower_tex: Add support for lowering Y41x formats")
Fixes: 65d6f5aed2 ("nir: add options to lower y_vu, yv_yu, yx_xvxu and xy_vxux")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
__OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
macros.h,u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
On Valhall we can optimize lower waits, which waits for both readers and
writers, into resource_waits which only wait for writers, allowing
threads accessing read-only resources to execute concurrently.
Let's use that on LD_TILE instructions so we can optmize the read-only
case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
In order to dynamically load the content of the tile buffer, we need
to know the target (color, depth or stencil) and the conversion to
apply. Let's define the load_input_attachment_{target,conv}_pan
intrinsics so we can dissociate the logic lowering input attachment
loads into load_converted_output_pan, and the part optimizing the shader
when input attachment map is passed at compile time.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
Needed if we want to lower multisample input attachment loads to
tile buffer loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
This allows us to pass a dynamic render target which will be needed
to support VK_KHR_dynamic_rendering_local_read.
While at it, we also enable support for depth/stencil tile loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
This pulls out the logicop_func variable from the options struct, so we can
modify it in the next commit in a central place. It then refactors out the
format variable from the options struct since we end up duplicating
options->format[rt] a zillion times and passing in both an options struct and a
logicop func override is confusing so this will just make everything neater and
self-contained next commit.
no functional change.
Cc'd to make the next commit cherrypickable.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34426>
The vtn_ssa_value for a cmat is not backed by a nir_def, but by a nir_variable, so
can't be used directly when calling a function. In most cases the cmat is used by
reference so code will take the value of deref for it (which is a `nir_def`).
When passing a cooperative matrix to a function by value, let the caller pass the deref
value, and the callee copy to a new local variable from that deref.
Fixes: b98f87612b ("spirv: Implement SPV_KHR_cooperative_matrix")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34364>
Require that the function returns false before we get to the end if there
is any linker error.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34307>
__LINE__ can be inconsistent when using different compilers. This patch
changes the test runner to do a simple string find/replace of the test
source file instead of looking for the line where the reference string
starts.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33980>
We need to remember which streamout buffers and streams were enabled,
even if the shader doesn't actually write any outputs to them,
because the API requires that we count vertices created by this shader
towards queries against those streams.
That information can be gathered by nir_gather_xfb_info_with_varyings
from the original NIR I/O variables that we get from the frontend,
but it isn't included in any intrinsics so would be otherwise lost here.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
This fixes some upcoming CTS tests that attempt bias usage when
it is not valid per spec.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34285>
This pass was originally based on a similar pass from Intel but it's
grown support for some fancy stuff like fp64 -> fp16 conversion
splitting with proper rounding.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34126>
Intel HW only has support for non-uniform offsets for TG4 operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
After 8fa70cfcfd ("spirv: Use the right bit-size for spec constant
ops") the bit-size will already be adjusted based on the sources, and
this will take care of Convert operations too.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34119>