The maximum number of descriptor sets is indeed 32 but without
the sign bit.
The maximum number of bindings for RADV is way larger, keep it
as 32-bit.
Fixes: 96e6ef80d9 ("nir: pack the rest of nir_variable::data")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Add a NIR instrinsic that represent a memory barrier in SPIR-V /
Vulkan Memory Model, with extra attributes that describe the barrier:
- Ordering: whether is an Acquire or Release;
- "Cache control": availability ("ensure this gets written in the memory")
and visibility ("ensure my cache is up to date when I'm reading");
- Variable modes: which memory types this barrier applies to;
- Scope: how far this barrier applies.
Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
Semantics" are split into two different attributes so we can use
variable modes for the former.
NIR passes that took barriers in consideration were also changed
- nir_opt_copy_prop_vars: clean up the values for the mode of an
ACQUIRE barrier. Copy propagation effect is to "pull up a load" (by
not performing it), which is what ACQUIRE restricts.
- nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
pending writes for the modes of an RELEASE barrier. Dead writes
effect is to "push down a store", which is what RELEASE restricts.
- nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for
the modes. This is conservative, but since this is a GL-specific
pass, doesn't make a difference for now.
v2: Fix the scoped barrier handling in copy propagation. (Jason)
Add scoped barrier handling to nir_opt_access and
nir_opt_combine_writes. (Rhys)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstsrand <jason@jleksrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This is like nir_texop_tex, but signals that the sampling coordinates
are immutable during the shader stage, in a way that allows the HW
that supports pre-dispatching sampling operations to pre-fetch
the result prior to scheduling the shader stage.
This is introduced to support the feature in Freedreno. Adreno HW
from a4xx supports it.
A NIR pass introduced later in this series will detect sampling
operations that are eligible for pre-dispatch, and replace
nir_texop_tex by this new op, to tell the backend to enable
pre-fetch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Added more functions.
v3:
- Simplify most of the functions (Caio).
v4:
- Updated to renamed enum values (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.
This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
There are no remaining users in-tree.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
ACO depends on LCSSA phis for divergent booleans to work correctly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-depedent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.
v2:
- Turn into a generic NIR pass (Eric).
- Make the pass work before lower I/O so we can use the deref variable
to inspect if we are writing to gl_PointSize (Eric).
- Make the pass take the range to clamp as parameter and allow it
to clamp to both sides of the range or just one side.
- Make the pass report progress.
v3:
- Fix copyright header (Eric)
- use fmin/fmax instead of bcsel to clamp (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).
v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: use glsl_type_size_align_func
v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp
v2: use align() instead of util_align_npot()
v2: pack arrays a bit tighter
v2: rename mem_* to field_*
v2: don't attempt to handle when struct offsets are already set
v2: use column_type() instead of recreating it
v2: use a branch instead of |= in nir_lower_to_explicit_impl()
v2: assign locations to variables and update shared_size and num_shared
v2: allow the pass to be used with nir_var_{shader_temp,function_temp}
v4: rebase
v5: add TODO
v5: small formatting changes
v5: remove incorrect assert in get_explicit_type()
v5: rename to nir_lower_vars_to_explicit_types
v5: correctly update progress when only variables are updated
v5: rename get_explicit_type() to get_explicit_shared_type()
v5: add comment explaining how get_explicit_shared_type() is different
v5: update cast strides
v6: update progress when lowering nir_var_function_temp variables
v6: formatting changes
v6: add more detailed documentation comment for get_explicit_shared_type
v6: rename get_explicit_shared_type to get_explicit_type_for_size_align
v7: fix comment in nir_lower_vars_to_explicit_types_impl()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These were left after a rebase and happen to make
NIR_INTRINSIC_SWIZZLE_MASK == NIR_INTRINSIC_SRC_ACCESS, which is how it
was noticed.
Fixes: 6f20643b47 ("nir: Allow qualifiers on copy_deref and image instructions")
Cc: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:
foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...
that wasn't recognized by the previous pass.
In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.
All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:
Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For backends that don't have a 'fdph' instructions
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This version has less ops for the same precision.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instructions as replacement. It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>