This involves determining the variables referenced by intrinsics, setting and
using the access qualifier correctly and considering that images and buffers
can alias.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6483>
ACCESS_NON_WRITEABLE means that the memory is read-only, not the variable,
so we don't have to check for aliasing.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6483>
ACCESS_NON_WRITEABLE means that the memory is read-only, not the variable,
so we have to check for aliasing first.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6483>
It shouldn't matter whether an access/variable is coherent or not, just
that it's not written. The coherent qualifier doesn't mean anything with
read-only memory.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6483>
If ACCESS_NON_UNIFORM is not specified, we can assume the resource is
uniform. This requires nir_lower_non_uniform_access to remove that flag.
A few Detroit: Become Human shaders use a index sourced from a fragment
input which is expected to be uniform.
shader-db (Navi):
Totals from 8 (0.01% of 127638) affected shaders:
SGPRs: 224 -> 384 (+71.43%)
VGPRs: 208 -> 112 (-46.15%)
CodeSize: 5360 -> 5344 (-0.30%); split: -1.49%, +1.19%
Instrs: 1036 -> 1028 (-0.77%); split: -1.93%, +1.16%
VMEM: 1320 -> 608 (-53.94%)
SMEM: 384 -> 336 (-12.50%); split: +14.58%, -27.08%
VClause: 24 -> 16 (-33.33%)
SClause: 48 -> 56 (+16.67%)
PreSGPRs: 124 -> 216 (+74.19%)
PreVGPRs: 168 -> 88 (-47.62%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5201>
In particular, if we have an index or bindless handle we were passing
the original handle which, technically, is uniform within the context of
the if. However, we can save the back-end compiler some effort if we
pass it the result of the read_first_invocation().
(Rebased by Kenneth Graunke and Rhys Perry.)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7592>
In theory, I don't think this is a functional change. We should
generate the same code before and after.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7592>
This introduces a new flag in shader_info to know if a fragment
shader uses sample shading, even if there is no inputs.
During NIR linking, constants varyings are optimized and the
per-sample interpolation info (ie. the sample qualifier) might
be removed if nir_shader_gather_info() is called again.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7876>
As we now reuse the enums to remain within 64 values, we need to get
the proper name using the stage.
v2: Use enum type for parameter (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7795>
v2: Fixup comment about bits in nir_intrinsics.py
v3: Use varying for primitive shading rate builtin (samuel)
v4: Reoder switch alphabetically
Make divergence of frag_shading_rate an option
v5: Remove stage check for frag_shading_rate in divergence (Samuel)
v6: s/frag_shading_rate_per_subgroup/single_frag_shading_rate_per_subgroup/ (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7795>
In commit 00b28a50b2, Marek extended
a number of optimizations that had been 32-bit specific to work on
other bit-sizes.
Most optimizations preserve the data type across the transformation.
In other words, an optimization which generates e.g. fp64 operations
only does so when the source expression also contains fp64 operations.
These transformations are fine with respect to lowering, because we
will lower away all expressions that would trigger the search portion
of the expression, and so we'd never apply those rules.
However, a few of the rules create new operations that run afoul of
lowering passes. For example,
('bcsel', a, 1.0, 0.0) => ('b2f', a)
where the result is a double would simply be a selection between two
different 64-bit constants. The replacement expression, on the other
hand, involves a nir_op_b2f64 ALU operation. If we're run after
nir_lower_doubles, then it may not be legal to generate such an
expression anymore (at least without running lowering again, which we
don't do today).
Regressions due to this are blocking the 20.3 release, so for now, we
take the easy route and simply disallow those few rules when doing full
softfp64 lowering, which fixes the immediate problem. But it doesn't
solve the long-term problem in an extensible manner.
In the future, we may want to add a `lowered_alu_ops` bitfield to the
NIR shader, and as lowering passes are run, mark them as taboo. Then,
we could have each algebraic transformation track which operations it
creates in the replacement expression. With both of those in place,
nir_replace_instr could compare the transformation's list of ALU ops
against `lowered_alu_ops` and implicitly skip rules that generate
forbidden ALU operations.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3504
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7841>
MSVC C++ can't do designated initializers without /std:c++latest. These
helpers will likely be removed soon anyway, so just don't use the
intrinsic builders here.
This should also fix the GCC7 build, which doesn't implement non-trivial
designated initializers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: c9bcad2573 ("nir: add generated intrinsic builders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7808>
Fix this error:
error C4576: a parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: c9bcad2573 ("nir: add generated intrinsic builders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7808>
Make it consistent with nir_intrinsics.py, the unlabelled indices just
before it and the intrinsic builders.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6587>
If bit_size_src is not -1, then it's the index of the source the
destination bit size can be expected to match. This will be useful for
generating intrinsic builders
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6587>
NIR can't CSE the read_first_invocation intrinsics, so we can end up
creating iand(read_first_invocation(a) == a, read_first_invocation(a) == a)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3813>
txf_ms takes an integer LOD, not a float.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7653>
Otherwise, search_phi_bcsel() will be called with a buf_size that is
slightly lower than it has to be.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7748>
It should only recurse if there's enough space to add the phi sources.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7748>
They're not really "push" anymore but that's because there is no such
thing as push constants in bindless shaders on Intel. They should be
fast enough, though. There is some room for debate here as to whether
we want to do the pull in NIR or push it into the back-end. The
advantage of doing it in the back-end is that it'd be easier to use
MOV_INDIRECT for indirect push constant access rather than falling back
to a dataport message.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is a little bit more work than executeCallable() because we also
have to set up the MemRay data structure which the ray traversal
hardware uses to keep its state.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Each callable ray-tracing shader shader stage has to perform a return
operation at the end. In the case of raygen shaders, it retires the
bindless thread because the raygen shader is always the root of the call
tree. In the case of any-hit shaders, the default action is accep the
hit. For callable, miss, and closest-hit shaders, it does a return
operation. The assumption is that the calling shader has placed a
BINDLESS_SHADER_RECORD address for the return in the first QWord of the
callee's scratch space. The return operation simply loads this value
and calls a btd_spawn intrinsic to jump to it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The Intel bindless thread dispatch model is very simple. When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs. These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned. Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke. When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers. The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>