We can optimize the VUE layout in cases where all shaders are compiled
together and some outputs are unused. So we need to have consistent
clip/cull_distance_mask with the VUE.
Previously we could have a VUE without ClipDistance present in the
header and yet have a non zero clip_distance_mask. This would trip the
HW into taking into account a VUE field that doesn't exist.
Here we set the clip/cull_distance_mask to 0 if the associated output
is not written by the shader. The written outputs are always
consistent with what's in the VUE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2d396f6085 ("intel: prepare VUE layout for more than 2 layouts")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13685
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>
For unaligned invocations, don't launch two COMPUTE_WALKER, instead we
can mask off excessive invocations in the shader itself at nir level and
launch one additional workgroup.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>
The logical send lowering already resolves sources when constructing
the send payload, so prior to that lowering, we don't need to apply
any special restrictions here.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
The logical send lowering already resolves sources when constructing
the send payload, so prior to that lowering, we don't need to apply
any special restrictions here.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
The goal here was to avoid propagating source modifiers, unusual
regions, and other things that couldn't be used as a send source.
A few patches ago ("brw: Properly resolve non-sendable sources in a few
logical opcodes") we fixed the logical send lowering to handle these
by resolving them when constructing the send payload. So now prior
to lowering, we don't need to treat these opcodes specially.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
This introduces enums for SHADER_OPCODE_SEND[_GATHER] sources, similar
similar to what we've done for most of the newer logical opcodes. This
allows us to use actual names for sources rather than remembering their
order, or leaving ourselves comments like /* ex_desc */ all over. It
will also make it easier to add or reorder sources in the future.
While we're at it, we also standardize on the number of sources.
Previously, we allowed SHADER_OPCODE_SEND to have either 3 (monosend) or
4 (split send) sources, but this is mostly for haphazard historical
reasons. We now specify all sources every time, eliminating the need
for careful inst->source checks before accessing the last source.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
Sources decorated with source modifiers, immediates, or particular
stride combinations may not be directly usable as SEND operands. We
have to resolve them to an ordinary VGRF first.
Most opcodes do this as part of broader payload construction, but these
send directly because the messages are very simple.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
The is_send() helper is just a wrapper around inst->is_send_from_grf()
now, so we can combine the two. Trim the name from is_send_from_grf()
to is_send(), as it's shorter, and also matches is_math().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
We used to have inst->mlen set on various virtual opcodes, but these
days the only instructions that should have inst->mlen set are
SHADER_OPCODE_SEND and SHADER_OPCODE_SEND_GATHER, which are already
covered in inst->is_send_from_grf(). So we don't need to check for mlen
specifically.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
Explicitly list FS_OPCODE_INTERPOLATE_AT_* as allowed, as they were
already allowed by the default case. Interlock, memory fence, and
barrier were disallowed and remain so. Uniform pull constant load
was allowed and remains so. SHADER_OPCODE_SEND and SEND_GATHER get
explicit handling.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
Every case but SHADER_OPCODE_SEND and SHADER_OPCODE_BARRIER will be
lowered to SEND before register allocation happens. And the barrier
send has a null destination, so the restriction doesn't apply.
Note that this hack is for Gfx9 only, so we don't need to worry about
Xe3's SHADER_OPCODE_SEND_GATHER feature.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
We used to have other opcodes as well, but we've since transitioned
entirely to logical send lowering prior to register allocation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
A number of places emit monolithic sends, where the second payload is
empty. Some places were using a BAD_FILE register, while others were
specifying the hardware ARF null register. Switch to BAD_FILE for
consistency - this is usually what we do for "source isn't present".
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34040>
The vulkan runtime doesn´t store this parameter in the dynamic state
(since it's not a dynamic state). Just capture it at compile time and
leave on the wm_prog_data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36665>
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
These are identical, pull them into a helper so we only have one place
to fix bugs.
(Marked fixes because the next two patches depend on the refactor.)
Fixes: ec2e8bc33f ("intel/compiler: Avoid copy propagating large registers into EOT messages")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36577>
Setting variable names currently always uses ralloc, but the new
nir_variable_* helpers will mostly eliminate ralloc/malloc in a later
commit.
This just updates all places that touch nir_variable names to use the new
helpers.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
Despite the information in "Overview of Memory Access" (57046), the L3
seems to be smarter on Xe2+. See 4aa3b2d3ad ("anv: LNL+ doesn't need
the special flush for sparse").
The behavior is the same both with vm_bind and TR-TT.
v2: Add some comments (Caio).
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
The comment was pasted from the commit message that added it. Remove
the parts that only make sense in the commit message, not in the final
code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
The residencyNonResidentStrict property requires that writes to
unbound memory be ignored and reads return zero. We need this
property, otherwise vkd3d will claim we don't support DX12.
If a shader writes to a variable associated with an unbound memory
region (i.e., mapped to a null tile), reads it back (in the same
shader) and expects the value be 0 instead of what is wrote, it has to
use the 'volatile' access qualifier to the variable associated with
the access, otherwise the compiler will be allowed to optmize things
and use the non-zero value. This is explained in the "Accessing
Unbound Regions" section of the Vulkan spec.
Our hardware adds an extra problem on top of the above. BSpec page
"Overview of Memory Access" (47630, 57046) says:
"If a read from a Null tile gets a cache-hit in a
virtually-addressed GPU cache, then the read may not return
zeroes."
So, when we detect this type of access, we have to turn off the
caching.
There's a proposed Vulkan CTS test that does exactly the above.
No shaders on shader_db seem to be using 'volatile'.
v2:
- Reorder commit order
- Rewrite commit message
v3:
- Rework the patch after Caio pointed out the interaction with
'coherent'.
- Remove previous R-B tags due to the patch differences.
v4:
- Rework the patch and commit message again after further
discussions.
v5:
- Check for atomic first so we don't regress DG2 atomic tests.
Fixes future test: dEQP-VK.sparse_resources.buffer.ssbo.read_write.sparse_residency_non_resident_strict
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
The GLSL spec says (among other things):
"When a volatile variable is read, its value must be re-fetched from
the underlying memory, even if the shader invocation performing the
read had previously fetched its value from the same memory. When a
volatile variable is written, its value must be written to the
underlying memory, even if the compiler can conclusively determine
that its value will be overwritten by a subsequent write."
The SPIR-V spec says (among other things):
"Accesses to volatile memory cannot be eliminated, duplicated, or
combined with other accesses."
So in this commit we make sure that both writes and reads marked as
volatile can't be affected by CSE.
v2: Reorder patches in the series.
Credits-to: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Iván Briano <ivan.briano@intel.com> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
We seem to be ignoring the 'volatile' keyword coming from the shaders.
Record this in MEMORY_LOGICAL_FLAGS so we can use it later.
Credits-to: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.
We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.
Via Coccinelle patch:
@@
expression a, b;
@@
-nir_def_rewrite_uses_after(a, b, b->parent_instr)
+nir_def_rewrite_uses_after_def(a, b)
Followed by a bunch of sed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
NIR is going to use exec_node/list without the C++ code, and may switch to
a different linked list implementation in the future.
GLSL is going to use ir_exec_node/list, which we want to keep private
for GLSL, so that we can change it easily.
Thus, it's better to fork the C++ version of list.h for Intel.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36425>