Commit graph

456 commits

Author SHA1 Message Date
Alyssa Rosenzweig
c6c8262ce1 asahi: implement pipeline stats as a checkbox
real impl is blocked on uapi to plumb thru hw perf counters.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:30 +00:00
Asahi Lina
b89da92a5e agx: compiler: Add fence_helper_exit_agx barrier
This is used by the helper program on exit.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:29 +00:00
Asahi Lina
b07dbf7b0f nir: Add AGX-specific helper opcodes
These opcodes are used by the helper program to fetch the current
operation info and core ID.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:29 +00:00
Alyssa Rosenzweig
311070f7af nir: add active_subgroup_invocation_agx sysval
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:29 +00:00
Alyssa Rosenzweig
5dc0f5ccba asahi: implement VBO robustness
GL semantics. GLES (weaker) and VK (stronger) semantics are left as a todo, with
explanations given. Enabled always to deal with null VBOs, this should be
optimized once we have soft fault.

This necessitates a rework of VBO keys, but hopefully for the best.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:29 +00:00
Alyssa Rosenzweig
9753cd44f7 asahi: Implement skeleton for tessellation
This implements a rough skeleton of what's needed for tessellation. It contains
the relevant lowerings to merge the VS and TCS, running them as a compute
kernel, and to lower the TES to a new VS (possibly merged in with a subsequent
GS). This is sufficient for both standalone tessellation and tess + geom/xfb
together. It does not yet contain a GPU accellerated tessellator, simply falling
back to the CPU for that for now. Nevertheless the data structures are
engineered with that end goal in mind, in particular to be able to tessellate
all patches in parallel without needing any prefix sums etc (using simple
watermark allocation for the heap).

Work on fleshing out the skeleton continues in parallel. For now, this does pass
the tests and lets the harder stuff get regression tested more easily. And
merging early will ease rebase.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:28 +00:00
Alyssa Rosenzweig
2d37d1b704 asahi: lower poly stipple
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
2024-02-14 21:02:28 +00:00
Connor Abbott
6a744ddebc ir3: Initial support for pushing globals with ldg.k
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott
45c71803f9 tu: Add more info to ldg inline uniform path
This will let us push the ldg into the preamble.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Job Noorman
60413e11c2 ir3: optimize subgroup operations using brcst.active
Follow the blob and optimize subgroup operation using brcst.active and
getlast when supported.

The transformation consists of two parts. First, a NIR transform
replaces subgroup operations with a sequence of new brcst_active_ir3
intrinsics followed by a new [type]_clusters_ir3 intrinsic (where type
can be reduce, inclusive_scan, or exclusive_scan).

The brcst_active_ir3 intrinsic is lowered directly to a brcst.active
instruction. The other intrinsics get lowered to a new macro
(OPC_SCAN_CLUSTERS_MACRO) which later gets emitted as a loop (using
getlast/getone) that iterates all clusters and produces the requested
scan result.

OPC_SCAN_CLUSTERS_MACRO has a number of optional arguments. First, since
the exclusive scan result is not a natural by-product of the loop but
has to be calculated explicitly, its destination is optional. This is
necessary since adding it unconditionally will produce unused
instructions that won't be DCE'd anymore at this point. Second, when
performing 32b MUL_U reductions (that expand to multiple instructions),
an extra scratch register is necessary.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6387
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26950>
2024-02-02 19:49:22 +00:00
Faith Ekstrand
48ebfeba34 nak: Add a source barrier intrinsic
This just inserts a GPU stall until the given source is available.  We
need this in order to properly implement shader clock.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27303>
2024-01-26 16:55:50 +00:00
Georg Lehmann
1cb5bf7009 nir: add ballot_relaxed and as_uniform intrinsics
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27116>
2024-01-19 20:13:33 +00:00
Faith Ekstrand
82fe981e35 nir,spirv: Add support for SPV_NV_shader_sm_builtins
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27154>
2024-01-18 20:20:06 +00:00
Alyssa Rosenzweig
8ddd89ffa5 nir,zink: Redefine flat_mask in terms of I/O locations
Robust against separable shaders, and still makes sense for lowered I/O drivers,
whereas just counting FS variables and expecting them to match with the VS is...
questionable.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: antonino <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26888>
2024-01-10 14:30:14 +00:00
Alyssa Rosenzweig
97f9f7ab0a asahi: implement point sprites w/o shader key
we can replace varyings with point sprites, we just need to fix up .zw
appropriately. do that with some bcsels, ALU is cheap.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26963>
2024-01-10 08:44:38 -04:00
Ian Romanick
6b14da33ad intel/fs: nir: Add nir_intrinsic_dpas_intel
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.

v3: Fix float16 destination DPAS on DG2.

v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

v5: Rebase on !26323.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:43 -08:00
Lionel Landwerlin
f53748c481 nir: fixup nir_printf intrinsic description
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26505>
2023-12-12 11:11:10 +00:00
Alyssa Rosenzweig
c43c90a5fa asahi: rewrite pointsize handling
In the wise words of Mike Blumenkrantz, "I hate gl_PointSize and so can you".

The mesa/st lowering won't mesh well with vertex shader epilogues, and it falls
over in various circumstances. I am too tired to go against the grain, so let's
just pretend to be a normal gallium driver and trust in the rasterizer CSO,
lowering point size internally. This properly handles transform feedback without
any hacks, both GL and GLES behaviours, etc.

Fixes:

   KHR-GL31.transform_feedback.capture_vertex_separate_test
   gl-2.0-large-point-fs

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
2023-12-09 12:08:39 -04:00
Alyssa Rosenzweig
5987e47a29 asahi: rework GS input assembly
in prep for tessellation (which will share the IA lowering), and for multidraw
indirect (which greatly complicates IA lowering with geom/tess).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
2023-12-09 12:08:39 -04:00
Marek Olšák
7d2faa88ab nir,radeonsi: add FLAGS into load_vector_arg_amd to record color input usage
This will be needed for gathering color usage from lowered PS.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26307>
2023-12-09 00:05:27 +00:00
Faith Ekstrand
eda940c855 nak: Make barriers SSA-friendly
The NIR intrinsics now take and return a barrier whenever one is
modified instead of modifying in-place.  In NAK, we give the internal
instructions the same treatment and convert everything to use barrier
SSA values and RegRefs.  In nak_from_nir, we move all barriers to/from
GPRs.  We'll clean up the massive pile of OpBMov later.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
2023-12-05 18:59:40 +00:00
Mary Guillemard
60544cae07 nir: Add a ldtram_nv intrinsic
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26224>
2023-11-18 02:46:47 +00:00
Daniel Schürmann
88afbbba11 nir: optimize open-coded quadVote* directly to new nir_quad intrinsics
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218>
2023-11-17 09:45:40 +00:00
Connor Abbott
1cfb0ae92c nir: Add quad vote intrinsics
Both Intel and AMD have special hardware support for these.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218>
2023-11-17 09:45:40 +00:00
Faith Ekstrand
618bdb8571 nak: Rework FS input interpolation
This gives FS I/O the same treatment as we did for vertex attributes in
that we now have a NIR intrinsic which pretty closely matches the
hardware and we lower to that before going into NAK.  This gives us a
bit more control in the NIR.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26181>
2023-11-14 16:38:03 +00:00
Faith Ekstrand
eb0d9a1b88 nir: Add nvidia barrier intrinsics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
2023-11-14 00:48:14 +00:00
Mary Guillemard
0aa4148978 nir: Add AGX-specific doorbell and stack mapping opcodes
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:55 +00:00
Alyssa Rosenzweig
d0a4a8cda0 nir: Add intrinsics for lowering bindless textures/samplers
Needed for merged stages to work properly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:54 +00:00
Alyssa Rosenzweig
33e80918de nir: Add intrinsics for lowering GS
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:54 +00:00
Alyssa Rosenzweig
b65636ca40 nir/lower_gs_intrinsics: Count decomposed primitives too
We need both: decomposed primitives for transform feedback and regular
primitives for the sizing the index buffer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:54 +00:00
Alyssa Rosenzweig
f157a3de4e nir/lower_gs_intrinsics: Include primitive counts
Generic GS lowering needs this, we already calculate it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:54 +00:00
Mary Guillemard
5308378a35 nir: Add NVIDIA-specific geometry shader opcodes
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
2023-10-24 22:21:18 +00:00
Faith Ekstrand
1fa7c37a36 nir: Add NVIDIA-specific I/O intrinsics
NVIDIA hardware doesn't take a vertex index for per-vertex I/O.
Instead, it takes an offset into the primitive.  This has to be fetched
using a combination of SR_INVOCATION_INFO and the ISBERD instruction.
To keep things simple and allow for maximum CSE, we do the lowering in
NIR and patch the load/store_per_vertex_input/output intrinsic.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
2023-10-24 22:21:18 +00:00
Faith Ekstrand
8188842fdc nir: Add a range to most I/O intrinsics
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
2023-10-24 22:21:18 +00:00
Faith Ekstrand
a2b799c53c nir: Add an load_barycentric_at_offset_nv intrinsic
NVIDIA hardware takes the offset as two 4.12 fixed-point values packed
into a single 32-bit value.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
2023-10-24 22:21:18 +00:00
Faith Ekstrand
5984265d45 nir: Add a load_sysval_nv intrinsic
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
2023-10-24 22:21:18 +00:00
Rhys Perry
4c3677094e aco,nir: add export_row_amd intrinsic
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25040>
2023-10-24 21:36:06 +00:00
Bas Nieuwenhuizen
a29cd20d17 nir: Add AMD cooperative matrix intrinsics.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683>
2023-10-24 13:24:18 +00:00
Rhys Perry
ad5be40303 nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525>
2023-10-04 18:53:43 +00:00
Danylo Piliaiev
a5f0f7d4b1 turnip,ir3: Implement A7XX push consts load via preamble
New push consts loading consist of:
- Push consts are set for the entire pipeline via HLSQ_SHARED_CONSTS_IMM
  array which could fit up to 256b of push consts.
- For each shader stage that uses push consts READ_IMM_SHARED_CONSTS
  should be set in HLSQ_*_CNTL, otherwise push consts may get overwritten
  by new push consts that are set after the draw.
- Push consts are loaded into consts reg file in a shader preamble via
  stsc at the very start of the preamble.

OPC_PUSH_CONSTS_LOAD_MACRO is used instead of directly translating NIR
intrinsic into stsc because: we don't want to teach legalize pass how
to set (ss) between stores and loads of consts reg file, don't want for
stsc to be reordered, etc.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25086>
2023-10-04 15:51:54 +00:00
Georg Lehmann
289b369597 nir: make quad intrinsic dst bit size match src0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25501>
2023-10-03 12:49:28 +00:00
Alyssa Rosenzweig
10b9c2fa36 nir: Support arrays in block_image_store_agx
For layered rendering, runs once per layer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
f4042afd57 nir: Add layer_id_written_agx sysval
We'll implement layer ID reads in the frag shader with a varying read, but if
the VS doesn't write the varying we need to return 0 per the spec. Add a sysval
to detect that case so we can handle it at runtime without keys.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Caio Oliveira
3105d516d0 nir: Add new intrinsics for Cooperative Matrix
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23825>
2023-09-28 07:35:02 +00:00
Samuel Pitoiset
1ce80653b2 nir: rename atomic_add_gs_invocation_count_amd to make it more generic
It will be re-used to implement mesh/tash shader invocations queries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25331>
2023-09-26 07:50:15 +00:00
Connor Abbott
4282386311 nir/spirv: Add inverse_ballot intrinsic
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated.  There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:

- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.

Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
2023-09-20 14:41:18 +00:00
Karol Herbst
513cd29eda nir: make num_workgroups 32 bit only
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24905>
2023-08-30 07:04:33 +00:00
Karol Herbst
1b22b67199 nir: make workgroup_id 32 bit only
No backend supports 64 bit values natively anyway.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24905>
2023-08-30 07:04:33 +00:00
Alyssa Rosenzweig
5189bae50c asahi: Move UBO lowering into GL driver
In Vulkan, UBOs are lowered by nir_lower_explicit_io, and the ubo_base_agx
sysval is unused (since it doesn't handle descriptor sets). That makes the UBO
lowering GL-only and hence belongs with the GL driver rather than the compiler.
This lets us delete the ubo_base_agx sysval.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24847>
2023-08-23 15:06:55 +00:00
Alyssa Rosenzweig
1d77fb967d nir,asahi: Remove texture_base_agx
Doing a descriptor crawl with binding tables requires a real binding table in
the shader, which won't work for VK or merged shader stages in GL. Instead,
let's lower anything that needs a crawl to bindless in the driver, so the
compiler code doesn't need to know anything about descriptor binding models.
That gets rid of the texture_base_agx sysval, which is problematic when there
are multiple descriptor sets worth of textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24847>
2023-08-23 15:06:55 +00:00