Intrinsic to get the SIMD width, which not always the same as subgroup
size. Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.
Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument. The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero). This optimization moved together
with lowering of the SIMD.
This is a preparation for letting the drivers call it before the
brw_compile_cs() step.
No shader-db changes in BDW, SKL, ICL and TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
We can remove a bunch of conditional code at key comparison time by
computing a bitmask of used key bits at ir3_shader creation time. This
also gives us a nice place to put additional key simplification to reduce
how many variants we create (like skipping rastflat if we don't read
colors in the FS, or skipping vclamp_color if we don't write colors).
It does mean walking the whole key to AND it, but the key is just 28 bytes
so far so that seems pretty fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
This commit introduces a new function nir_assign_linked_io_var_locations
which is intended to help with assigning driver locations to shaders
during linking, primarily aimed at the VS->TCS->TES->GS stages.
It ensures that the linked shaders have the same driver locations,
and it also packs these as close to each other as possible.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
We can't remove volatile writes and we can't combine them with other
volatile writes so all we can do is clear the unused bits.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
For deref_store, we can still delete invalid stores that write to
statically OOB data. For everything, we need to make sure that we kill
aliases of destinations even if it's volatile.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
With the mask value 0x80000000, the other operand must be 32-bit. This
fixes failures in
dEQP-VK.subgroups.ballot_mask.ext_shader_subgroup_ballot.*.gl_subgroupgemaskarb_*
tests from Vulkan 1.2.2 CTS.
Checking one of the tests, it appears that the tests are doing 64-bit
iand with 0x0000000080000000, then comparing the result with zero.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2834
Fixes: 88eb8f190b ("nir/algebraic: Simplify logic to detect sign of an integer")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4770>
The new option replaces the two other _split lowering options, since
there's no need for separate options.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
I spent over an hour trying to debug a problem if a condition on a
variable not being applied. The problem turned out to be
"a(is_not_negative" instead of "a(is_not_negative)". This commit would
have detected that problem and failed to build.
v2: Just add $ to the end of the existing regex, and it will fail to
match a malformed string. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4720>
This commit completely rewrites the way we extract a structured CFG from
SPIR-V. The new approach is different in a few ways:
1. It does a breadth-first search instead of depth-first. This means
that we've visited the merge node for a construct before we visit
any of the nodes inside the construct. This makes it easier to
validate things like loop and switch nesting.
2. We record more information in the CFG. Earlier commits added a
parent pointer to vtn_cf_node but we now record all of the merge and
other special blocks for each CFG node. This lets us validate
things more precisely.
3. It makes heavy use of merge blocks for walking the CFG. Previously,
we sort of used them as hints for trying to guess the CFG structure
but things got dicey whenever a merge was missing. We had some
heuristics for how to handle short-circuiting if statements but it
was a bunch of special cases.
Now, we make them a fundamental part of walking the CFG. When we
encounter a control-flow construct, we add the body components of
the construct to the BFS work list and then jump to the merge block
if one exists to continue scanning the current CFG nesting level.
If no merge block exists, we assume that means that control-flow
never re-converges in a normal way and that the only way to get back
to normality is with a direct jump such as a loop break or continue.
This should make things far more robust when trying to deal with the
more creative placement (or lack thereof) of merge instructions.
Reviewed-by: Alan Baker <alanbaker@google.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
Closes: #2760
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4446>
Thanks to VK_EXT_subgroup_size_control, we can end up with
gl_SubgroupSize being as low as 8 on Intel.
Fixes: d10de25309 "anv: Implement VK_EXT_subgroup_size_control"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4694>
When we originally wrote spirv_to_nir we didn't have a good scalar value
union to handily use so we rolled our own thing for spec constants. Now
that we have nir_const_value, we can use that and simplify a bunch of
the spec constant logic.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675>
We were accidentally asserting that the value had to be a vtn_ssa_value
which isn't true if it, for instance, comes from a spec constant.
Fixes: fb282a68bc "spirv: Implement OpConvertPtrToU and OpConvertUToPtr"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675>
This fixes find_and_update_named_uniform_storage() for subroutines
and also updates num_shader_uniform_components for non opaque
uniforms.
The following patch will ensure this type of bug won't happen again.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4721>
This fixes things when the last input/output is a struct or matrix.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670>
We have shader->num_inputs for "last used input + 1" already, which
respects struct/matrix varyings.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670>
So far only the singed versions are defined.
v2: Make umad24 and umul24 non-driver specific (Eric Anholt)
v3: Take care of nir_builder and automatic lowering of the
opcodes if they are not supported by the backend.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
nir_opt_load_store_vectorize has an is_strided_vector() function that
looks for types with weird explicit strides. It does so by comparing
the explicit stride against the type-size-derived typical stride.
This had a subtle bug. Simple vector types (vec2/3/4) have no explicit
stride, so glsl_get_explicit_stride() returns 0. This never matches the
typical stride for a vector, so is_strided_vector() would return true
for basically any vector type, causing the vectorizer to bail.
I found this by looking at a compute shader with scalar SSBO loads at
offsets 0x220, 0x224, 0x228, 0x22c. nir_opt_load_store_vectorize would
properly vectorize the first two into a vec2 load, but would refuse to
extend it to a vec3 and ultimately vec4 load because is_strided_vector()
saw a vec2 and freaked out.
Neither ACO nor ANV do load/store vectorization before lowering derefs,
so this shouldn't affect them. However, I'd like to fix this bug to
avoid the trap for anyone who decides to in the future. In a branch
where anv used this lowering, this cut an additional 38% of the send
messages in the shader by properly vectorizing more things.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255>
In SPIRV of compute shader in Aztec Ruins benchmark there is:
OpControlBarrier %uint_1 %uint_1 %uint_0
// ControlBarrier(Device, Device, rdcspv::MemorySemantics(0));
which is an incorrect translation of glsl barrier().
GLSLang, prior to c3f1cdfa, emitted the OpControlBarrier with
Device instead of Workgroup for execution scope.
2365520c covers similar case but isn't applied when execution_scope
is SpvScopeDevice.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2742
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4660>
This moves the fi_types to a new mesa_private.h and removes the
imports.c file. The vast majority of this patch is just removing
pound includes of imports.h and fixing up the recursive includes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
As of the previous commit, they are never used.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
Version 4.4 of the GLSL spec changed the definition of noise*() to
always return zero and earlier versions of the spec allowed zero as a
valid implementation.
All drivers, as far as I can tell, unconditionally call lower_noise()
today which turns ir_unop_noise into zero. We've got a 10-year-old
comment in there saying "In the future, ir_unop_noise may be replaced by
a call to a function that implements noise." Well, it's the future now
and we've not yet gotten around to that. In the mean time, the GLSL
spec has made doing so illegal.
To make things worse, we then pretend to handle the opcode in
glsl_to_nir, ir_to_mesa, and st_glsl_to_tgsi even though it should never
get there given the lowering. The lowering in st_glsl_to_tgsi defines
noise*() to be 0.5 which is an illegal implementation of the noise
functions according to pre-4.4 specs. We also have opcodes for this in
NIR which are never used because, again, we always call lower_noise().
Let's just kill the whole opcode and make builtin_builder.cpp build a
bunch of functions that just return zero.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
Copied from anv, replaced state with passing model/range directly.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
If we have a clip dist float[1] compact followed by a tess factor
float[2] we don't want to overlap them, but the partial check
only happens for non-compact vars.
This fixes some issues seen with my sw vulkan layer with
dEQP-VK.clipping.user_defined.clip_distance*
v2: v1 failed with clip/cull mixtures, since in that
case the cull has a location_frac to follow after the clip
so only reset if we get a location_frac of 0 in a subsequent
clip var
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4635>