There is native support for D3D-style untyped UAVs, which are an unsized
array of "records."
This will be needed for acceleration structures, because normal SSBO
descriptors aren't large enough to cover all the 128-byte instance
descriptors for the maximum number of instances (2**24).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28447>
this is required for vtn_bindgen2 where we don't know the buffer size until
the driver-specific code paths, but we need to lower printf (to hash format
strings) in common code. so defer the buffer size decision to an intrinsic.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>
This is needed for noperspective lowering, where we need to multiply the
varying value by gl_FragCoord.w at the same barycentric as the varying.
Normal nir_load_frag_coord_zw instructions are lowered to the new
intrinsic on bifrost with the pan_lower_frag_coord_zw pass.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32127>
load_attribute_pan is a panfrost-specific intrinsic for loading
vertex attributes. Takes explicit vertex and instance IDs which
we need in order to implement vertex attribute divisor with
non-zero base instance on v9+.
Passes which are used by panvk are modified to be aware of
load_attribute_pan.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32039>
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.
The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.
ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added on option to configure that
behavior rather than removing it.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
Like read_first_invocation but using getlast. Note that I intentionally
used the name of the ir3 instruction in the name as its semantics are
tricky to exactly describe otherwise.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
The semantics of this newly introduced system value match
Vulkan's InstanceIndex exactly, and are equivalent to
instance_id + base_instance.
Some hardware, such as Mali Valhall or later, only provides
instance id offset by base_instance. Introducing a new system
value to represent this, rather than handling the mismatch
when lowering to BIR lets us use NIR to eliminate redundant
arithmetic that would follow from mismatched semantics, e.g.
instance_id could be lowered to instance_index - base_instance,
so expressions such as instance_id + base_instance would be
optimized to a simple instance_index.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32158>
Since we check for loop-invariance, we don't have to unconditionally
flag LCSSA phis as divergent in presence of divergent breaks.
This ensures consistency, with or without LCSSA form.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
These are like shuffle_{xor,up,down} except they expect a dynamically
uniform index. This is necessary since the ir3 shfl instruction does not
work with a divergent index.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31501>
If the backend does not implement this too, or some other future transform
modifiess the phi so that this isn't the case (replace the phi with a
bcsel or replace undef with zero), then it will not actually be uniform.
This keeps it enabled to some degree for RADV/ACO.
fossil-db (navi31):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 195008 -> 195282 (+0.14%)
CodeSize: 1012592 -> 1015884 (+0.33%)
Latency: 3892826 -> 3898843 (+0.15%); split: -0.00%, +0.15%
InvThroughput: 460681 -> 460964 (+0.06%)
Copies: 13508 -> 13516 (+0.06%)
Branches: 5244 -> 5412 (+3.20%)
PreVGPRs: 5092 -> 5096 (+0.08%)
VALU: 116177 -> 116197 (+0.02%)
SALU: 23449 -> 23785 (+1.43%)
fossil-db (navi21):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 164471 -> 164981 (+0.31%)
CodeSize: 883988 -> 888420 (+0.50%)
Latency: 4074287 -> 4082043 (+0.19%)
InvThroughput: 783783 -> 784276 (+0.06%); split: -0.00%, +0.06%
Branches: 5262 -> 5430 (+3.19%)
PreVGPRs: 5100 -> 5104 (+0.08%)
VALU: 116375 -> 116381 (+0.01%)
SALU: 23589 -> 23925 (+1.42%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30211>
When the non_uniform flag is not set, the result is never divergent.
v2: Remove redundant assignment to is_divergent. Suggested by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
This is needed for cl_khr_mipmap_image, specifically the OpenCL C
function get_image_num_mip_levels.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30834>
Adds a new instruction type that stores metadata that might be useful
for debugging purposes. Passes must ignore these instructions when
making decisions.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
bunch of vendor intrinsics, plus some standard intrinsics used in weird shader
stages.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
For preambles, we don't actually care which invocation we get, so we
don't have to enable helper invocations when the preamble uses "getone."
Introduce a new intrinsic with the right semantics and plumb it through.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29914>
The ldc_nv and ldcx_nv intrinsics correspond to the index and bindless
forms of NVIDIA's LDC instruction, respectively. ldc_nv is pretty much
load_ubo without some of the unnecessary constant bits while ldcx_nv
takes a 64-bit bindless handle instead of an index. The other two give
us a little control over register allocation at the NIR level to ensure
that LDCX handles are placed in uniform registers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
This will allow a driver to use a single table of printf strings
across all shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
As this is intended to be used also by VC4, change the suffix to
something more convenient, like tlb_color_brcm.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29119>