This came up while reviewing
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398 ... Possibly
this intrinsic should be renamed to load_smem_constant_amd for consistency with
load_global_constant. But if we're not going to convey constantness in the
intrinsic name, let's at least document the restriction, because NIR's optimizer
relies on it.
(I didn't inspect every call site, but it looks like load_smem_amd is just used
for descriptor loads so there's no bug to fix.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29743>
The semantics of discard differ between GLSL and HLSL and
their various implementations. Subsequently, numerous application
bugs occurred and SPV_EXT_demote_to_helper_invocation was written
in order to clarify the behavior. In NIR, we now have 3 different
intrinsics for 2 things, and while demote and terminate have clear
semantics, discard still doesn't and can mean either of the two.
This patch entirely removes nir_intrinsic_discard and
nir_intrinsic_discard_if and replaces all occurences either with
nir_intrinsic_terminate{_if} or nir_intrinsic_demote{_if} in the
case that the NIR option 'discard_is_demote' is being set.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
Add the BASE index to the load/store_ssbo_ir3 intrinsic to store an
immediate offset. This offset is encoded in the corresponding fields of
isam.v/ldib.b/stib.b.
One extra optimization is implemented: whenever the regular offset is
also a constant, the total offset (regular plus immediate) is aligned
down to a multiple of the max immediate offset and this is used as the
regular offset while the immediate is set to the remainder. This ensures
that the register used for the regular offset can often be reused among
multiple contiguous accesses.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28664>
The ldc_nv and ldcx_nv intrinsics correspond to the index and bindless
forms of NVIDIA's LDC instruction, respectively. ldc_nv is pretty much
load_ubo without some of the unnecessary constant bits while ldcx_nv
takes a 64-bit bindless handle instead of an index. The other two give
us a little control over register allocation at the NIR level to ensure
that LDCX handles are placed in uniform registers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
We'll use the delta for an upcoming internal printf mechanism, where
the PARAM_IDX will be the base printf reloc identifier and the BASE
will be the string id.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
This will allow a driver to use a single table of printf strings
across all shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
It can't be reordered globally, since its value is control-flow dependent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
There seems to be a hardware issue where fragment shaders with side effects get
skipped if depth testing with NEVER. Add a workaround for this case where we
discard programmatically instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
needed for correct flatshading with fans, without falling back on software input
assembly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
As this is intended to be used also by VC4, change the suffix to
something more convenient, like tlb_color_brcm.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29119>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
These don't really do anything. They're just a source and user of SSA
defs.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28301>
I'm not measuring a significant perf difference in
-bshading:shading=phong:model=bunny -bideas -brefract so this seems Good Enough
For Me.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
unfortunately, shader stage merging is bogus when coherent images are used, so
we need an unmerged path. i'd rather not maintain two paths, so let's just
stop merging. as a bonus this makes ESO a lot easier, and lets us reuse the same
VS for both VS->GS and VS->TCS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).
This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.
This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
These will be needed to implement some tessellation dynamic
states within the TCS as opposed to using an epilog.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28408>
This is just a little handle to tell the back-end where to do the copy.
Ideally, we'd have a NIR intrinsic that does the copy but we need to be
able to copy any number of registers up to 34 and NIR intrinsics just
aren't that flexible.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
Instead of the previous fragile attempt to handle sparse_resident_and
by crawling deref chains, we now insert an is_sparse_resident_zink
intrinsic immediately after the tex or sparse_load intrinsic and define
Zink's sparse resident codes to always be 0/1. Then sparse_resident_and
becomes iand and is_sparse_texels_resident becomes != 0 and everything
is well-defined and robust.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28123>
This intrinsic helps to read the W coordinate stored in the QPU register
when initializing the input data for the fragment shaders.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>