Robust against separable shaders, and still makes sense for lowered I/O drivers,
whereas just counting FS variables and expecting them to match with the VS is...
questionable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: antonino <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26888>
we can replace varyings with point sprites, we just need to fix up .zw
appropriately. do that with some bcsels, ALU is cheap.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26963>
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.
v3: Fix float16 destination DPAS on DG2.
v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.
v5: Rebase on !26323.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
In the wise words of Mike Blumenkrantz, "I hate gl_PointSize and so can you".
The mesa/st lowering won't mesh well with vertex shader epilogues, and it falls
over in various circumstances. I am too tired to go against the grain, so let's
just pretend to be a normal gallium driver and trust in the rasterizer CSO,
lowering point size internally. This properly handles transform feedback without
any hacks, both GL and GLES behaviours, etc.
Fixes:
KHR-GL31.transform_feedback.capture_vertex_separate_test
gl-2.0-large-point-fs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
in prep for tessellation (which will share the IA lowering), and for multidraw
indirect (which greatly complicates IA lowering with geom/tess).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
The NIR intrinsics now take and return a barrier whenever one is
modified instead of modifying in-place. In NAK, we give the internal
instructions the same treatment and convert everything to use barrier
SSA values and RegRefs. In nak_from_nir, we move all barriers to/from
GPRs. We'll clean up the massive pile of OpBMov later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
This gives FS I/O the same treatment as we did for vertex attributes in
that we now have a NIR intrinsic which pretty closely matches the
hardware and we lower to that before going into NAK. This gives us a
bit more control in the NIR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26181>
We need both: decomposed primitives for transform feedback and regular
primitives for the sizing the index buffer.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
NVIDIA hardware doesn't take a vertex index for per-vertex I/O.
Instead, it takes an offset into the primitive. This has to be fetched
using a combination of SR_INVOCATION_INFO and the ISBERD instruction.
To keep things simple and allow for maximum CSE, we do the lowering in
NIR and patch the load/store_per_vertex_input/output intrinsic.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
New push consts loading consist of:
- Push consts are set for the entire pipeline via HLSQ_SHARED_CONSTS_IMM
array which could fit up to 256b of push consts.
- For each shader stage that uses push consts READ_IMM_SHARED_CONSTS
should be set in HLSQ_*_CNTL, otherwise push consts may get overwritten
by new push consts that are set after the draw.
- Push consts are loaded into consts reg file in a shader preamble via
stsc at the very start of the preamble.
OPC_PUSH_CONSTS_LOAD_MACRO is used instead of directly translating NIR
intrinsic into stsc because: we don't want to teach legalize pass how
to set (ss) between stores and loads of consts reg file, don't want for
stsc to be reordered, etc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25086>
We'll implement layer ID reads in the frag shader with a varying read, but if
the VS doesn't write the varying we need to return 0 per the spec. Add a sysval
to detect that case so we can handle it at runtime without keys.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated. There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:
- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.
Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
In Vulkan, UBOs are lowered by nir_lower_explicit_io, and the ubo_base_agx
sysval is unused (since it doesn't handle descriptor sets). That makes the UBO
lowering GL-only and hence belongs with the GL driver rather than the compiler.
This lets us delete the ubo_base_agx sysval.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24847>
Doing a descriptor crawl with binding tables requires a real binding table in
the shader, which won't work for VK or merged shader stages in GL. Instead,
let's lower anything that needs a crawl to bindless in the driver, so the
compiler code doesn't need to know anything about descriptor binding models.
That gets rid of the texture_base_agx sysval, which is problematic when there
are multiple descriptor sets worth of textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24847>
For merging shader stages, it will be useful to express a load from an explicit
GL "descriptor set", so we can represent things like UBO loads with merged
shaders where UBOs can come from either stage. To do so, we add an intrinsic
representing a load from the driver's uniform tables, indexed like "descriptor
sets" with "bindings".
In principle, a layered GL-on-Vulkan implementation would use literal descriptor
sets for each stage, so I feel comfortable with the analogy here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24847>
There are two new variable modes:
- nir_var_mem_node_payload
- nir_var_mem_node_payload_in
Also add a few more intrinsics and some shader info.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24512>
Push costants in Zink are not flat indexed like in vulkan drivers which
makes the `nir_intrinsic_load_push_constant` intrinsic inappropiate.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24401>
sed + ninja clang-format + fix up spacing for common code.
If you are unhappy that I did not manually change the whitespace of your driver,
you need to enable clang-format for it so the formatting would happen
automatically.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24428>
Non of the know etnaviv GPUs support this feature in hardware
and the binary blob provides sizes via uniforms too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
This intrinsic (vec2 tess_coord) is generally useful for non-r600 backends.
Promote it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24159>
Read-after-write hazards require special handling on AGX, since image loads are
implemented with texturing. Add intrinsics to handle these hazards.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24148>
Note the writemask handling is chosen for consistency with the rest of NIR. In
every other instance, writemask=w requires a vec4 source. This is hardcoded into
nir_validate and nir_print as what it means to have a writemask.
More importantly, consistency with how register writemasks currently work.
nir_print hides it, but r0.w = fneg ssa_1.x is actually a vec4 instruction with
source ssa_1.xxxx. As a silly example nir_dest_num_components(that) = 4 in the
old model. I realize this is quite strange coming from a scalar ISA, but it's
perfectly natural for the class of vec4 hardware for which this was designed. In
that hardware, conceptually all instructions are vec4`, so the sequence "fneg
ssa_1 and write to channel w" is implemented as "fneg a vec4 with ssa_1.x in the
last component and write that vec4 out but mask to write only the w channel".
Isn't this inefficient? It can be. To save power, Midgard has scalar ALUs in
addition to vec4 ALUs. Those details are confined to the backend VLIW scheduler;
the instruction selection is still done as vec4. This mechanism has little in
common with AMD's SALUs. Midgard has a wave size of 1, with special hacks for
derivatives.
As a result, all backends consuming register writemasks are expecting this
pattern of code. Changing the store to take a vec1 instead of a vec4 would
require changing every backend to reswizzle the sources to resurrect the vec4. I
started typing a branch to do this yesterday, but it made a mess of both Midgard
and nir-to-tgsi. Without any good reason to think it'd actually help
performance, I abandoned the idea. Getting all 15 backends converted to the
helpers is enough of a challenge without forcing 10 backends to reswizzle their
sources too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>