Commit graph

55 commits

Author SHA1 Message Date
Qiang Yu
3507cdc59c ac/nir: legacy vs/gs use nir_xfb_info to replace pipe_stream_output_info
pipe_stream_output_info is built from nir_xfb_info, why not just use
nir_xfb_info directly.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20015>
2022-11-29 03:28:42 +00:00
Samuel Pitoiset
505290dc44 ac/nir,radv: rework and fix NGG queries enables for VS/TES
XFB queries need to be enabled with NGG streamout and VS/TES.
Previously, the NGG lowering code relied on has_prim_query for XFB.

This fixes failures with RADV_PERFTEST=ngg_streamout on GFX10.3 with
the vkd3d-proton testsuite. Vulkan CTS is missing TES tests with XFB
queries apparently.

Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19493>
2022-11-07 14:54:53 +00:00
Rhys Perry
140cefe95a ac/nir: lower gfx11 vertex parameter exports
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19228>
2022-10-31 14:33:43 +00:00
Rhys Perry
93fb84237f ac/nir: add ac_nir_lower_ngg_options
These signatures were getting ridiculous.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19340>
2022-10-27 13:31:40 +00:00
Qiang Yu
e536d0fe4b ac/nir/ngg,radv: move LDS layout calculation out of nir ngg lowering
Use lds base load intrinsics in nir ngg lowering to get layout, left
its calulation to driver.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
2022-10-27 07:35:01 +00:00
Qiang Yu
54eea0e393 ac/nir/ngg: pass primitive_id_location as param for nogs lower
radeonsi need to use packed driver location for all outputs,
while radv need to use VARYING_SLOT_*. To meet both drivers'
needs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
2022-10-27 07:35:01 +00:00
Rhys Perry
1c005e72f4 ac/nir: add legacy streamout and GS copy shader helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19302>
2022-10-25 17:35:08 +00:00
Qiang Yu
188a7f9226 ac/nir/ngg: add query param to ac_nir_lower_ngg_gs
radeonsi may disable it. gfx_level will also be used by latter
vertex param export when gfx11.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>
2022-10-25 12:58:43 +00:00
Qiang Yu
97e1613b0e ac/nir/ngg: use nir_load_provoking_vtx_in_prim_amd in ngg lower
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19166>
2022-10-20 06:53:56 +00:00
Qiang Yu
074f3216f2 ac/nir/ngg: support gs streamout
Port from radeonsi.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17654>
2022-09-16 08:51:28 +00:00
Qiang Yu
5ec79f9899 ac/nir/ngg: nogs support streamout
Port from radeonsi.

Works on both GFX11 and GFX10. Although GFX10 can do atomic
GDS add on all threads, now we just disable the NGG streamout
for GFX10, so it's OK.

There's a difference for the GFX11 implementation with radeonsi
that we do all 4 buffer/stream info calc on a single thread.
It's just because this is simple, we need to update GDS on a
single thread anyway, and streamout is not that performance
critical to loss a small amount of instruction. We may change
to a better implementation when using register based streamout.

When streamout enabled, ES threads need to save all vertex
attributes to LDS besides position. This is because we don't
know where in the streamout buffer to export the attributes to
and wheter there are space in the streamout buffer.

Streamout is done in primitives, so we need to check if there
is space and where the current primitive should be written to
by GDS atomic add, then in GS threads do the streamout.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17654>
2022-09-16 08:51:28 +00:00
Qiang Yu
f75452918b ac/nir/ngg: support clipdist culling
Port from radeonsi.

Besides vertex position based primitive culling, clipdist
attribute can also be used to cull a primitive. Normally
it's used by fixed-pipeline, but when NGG we can treate it
as a culling condition to filter out invisible primitive
before fixed-pipeline.

There are two kinds of clipdist:
1. user define a clip plane explicitly by glClipPlane(),
   fixed-pipeline calculate with vertex position to get
   clipdist, then cull. This is the legacy way.
2. Now GLSL define gl_ClipDistance/gl_CullDiatance so that
   user can calculate clipdist in any way he like.

This implementation support both way.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
2022-08-26 05:50:30 +00:00
Qiang Yu
1bdeb961bd ac/nir/ngg: add gs culling
Port from radeonsi.

Cull primitive after GS thread and before final vertex/primitive
export. GS culling is like VS/TES culling which read out saved
vertex positions of a primitive from LDS then call the primitive
culling algorithm to check whether it's visiable or not, only
passed primitives will be exported.

Unlike the VS/TES culling that read vertex index of a primitive
from VGPRs as shader args, GS will set a primitive complete flag
for each last vertex of a primitive in LDS, so that vertex thread
know the previous 1/2/3 vertex can form a primitive and do primitive
culling.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
2022-08-26 05:50:30 +00:00
Qiang Yu
db0e9d3cab ac/nir/ngg: support line culling
Port from ac_llvm_cull.c

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
2022-08-26 05:50:30 +00:00
Qiang Yu
f1f2c931a7 ac/nir/cull: support caller react when primitive is rejected
Make accept_func optional, and return accpect result for caller
react when primitive is rejected.

This is for GS culling.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
2022-08-26 05:50:30 +00:00
Timur Kristóf
c721f751f2 ac/nir/ngg: Move LDS store of accepted flag into the inner branch.
For primitives which are rejected based on only W and face, this
will reduce the number of executed branches.

Fossil DB stats on Navi 21:

Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160330564 -> 160086644 (-0.15%)
Instrs: 30477385 -> 30477916 (+0.00%); split: -0.00%, +0.00%
Latency: 139802763 -> 139587915 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 21198444 -> 21184261 (-0.07%); split: -0.07%, +0.00%
SClause: 749811 -> 749810 (-0.00%)
Copies: 2701482 -> 2762930 (+2.27%); split: -0.00%, +2.28%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
2022-08-05 22:10:28 +00:00
Marek Olšák
4f622d62d0 ac/nir: add ac_nir_lower_resinfo
Emulating image_get_resinfo should be faster than using the hw.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17693>
2022-08-03 17:44:15 +00:00
Qiang Yu
e9f1f115fa ac/nir: add triangle_strip_adjacency_fix to gs input lower
From radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16788>
2022-06-27 11:32:43 +08:00
Qiang Yu
109eb378e5 ac/nir: change es output lower param to esgs_itemsize
radeonsi may add extra dword to the stride, so let's pass it
directly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16788>
2022-06-27 11:32:34 +08:00
Qiang Yu
8b5e8b2af7 ac/nir: remove unused param num_reserved_es_outputs from gs input lower
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16788>
2022-06-27 11:32:30 +08:00
Qiang Yu
d00845faf4 ac/nir: add no_input_lds_space param to hs output lower
This is used by radeonsi to save some lds space when all LS output
is passed by register.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Qiang Yu
ae9b02b4d0 ac/nir: add wave_size parameter to ac_nir_lower_hs_outputs_to_mem
Used by radeonsi and radv to reflect true wave size used, not minimal size.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Qiang Yu
18d51831a8 ac/nir: add pass_tessfactors_by_reg param to hs output lower
radeonsi won't emit tess factor in the lower pass, need to keep
the output for llvm backend to pass it as parameter. This is used
by radeonsi for an optimization to save LDS write.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Qiang Yu
6ccb9634de ac/nir: use nir_intrinsic_load_hs_out_patch_data_offset_amd in tess lower
radeonsi load this from SGPR arg, can't use static value because TCS output
and TES input may not match (TCS output is not a key for TES) and
determined in runtime.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Qiang Yu
2ba6d2b107 ac/nir: remove unused parameter in tes input lower
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Qiang Yu
3aa70d92ce radv: no need to do gs_alloc_req for newer chips in ngg vs/tes
Copy from radeonsi.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17130>
2022-06-27 02:12:13 +00:00
Samuel Pitoiset
fe57fe1fd8 ac/nir/ngg: count the number of generated primitives for VS and TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15639>
2022-06-09 08:02:39 +00:00
Timur Kristóf
b664279755 ac/nir/ngg: Use mesh shader scratch ring when outputs don't fit LDS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16737>
2022-06-08 08:43:51 +00:00
Timur Kristóf
f7f2770e72 ac/nir: Add remappability to tess and ESGS I/O lowering passes.
This will be used for radeonsi to map common I/O location to fixed
slots agreed by different shader stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16418>
2022-06-07 01:40:14 +00:00
Qiang Yu
6a95452ddf ac/nir: use nir_intrinsic_load_lshs_vertex_stride_amd
For radeonsi which pass this value by argument.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16418>
2022-06-07 01:40:14 +00:00
Timur Kristóf
c69b771e35 radv, ac/nir: Fix multiview layer export for mesh shaders.
Unfortunately, radv_lower_multiview is not suitable for mesh shaders
because it can't know the mapping between API mesh shader
invocations and output primitives.

Additionally, when lowering view id to layer, it must be created
as a per-primitive PS input.

Fixes: d32656bc65
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16504>
2022-05-31 07:58:29 +00:00
Marek Olšák
39800f0fa3 amd: change chip_class naming to "enum amd_gfx_level gfx_level"
This aligns the naming with PAL.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pellou-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16469>
2022-05-13 14:56:22 -04:00
Timur Kristóf
7de3034897 ac/nir: Add I/O lowering for task and mesh shaders.
Task shaders store their output payload to VRAM where mesh
shaders read from. There are two ring buffers:

1. Draw ring: this is where mesh dispatch sizes and
the ready bit are stored.

2. Payload ring: this is where the optional payload
is stored (up to 16K per task workgroup).

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
2022-05-12 00:29:51 +00:00
Timur Kristóf
212f183c1f ac/nir: Remove now-superfluous ac_nir_lower_tess_to_const.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13155>
2022-05-10 17:16:03 +00:00
Timur Kristóf
719678f891 ac/nir: Add ac_nir_load_arg helper for shader arguments.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13155>
2022-05-10 17:16:03 +00:00
Marek Olšák
11c28d9798 ac: add ac_nir_optimize_outputs, a NIR version of ac_optimize_vs_outputs
ac_optimize_vs_outputs is an LLVM IR pass, and it will be replaced by this.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
2022-04-22 22:21:11 +00:00
Rhys Perry
61ac5acca3 radv,ac/nir: lower global access to _amd global access intrinsics
fossil-db (Sienna Cichlid):
Totals from 400 (0.30% of 134621) affected shaders:
VGPRs: 18696 -> 18688 (-0.04%)
CodeSize: 2031348 -> 1946640 (-4.17%)
Instrs: 374703 -> 360226 (-3.86%)
Latency: 4200727 -> 4108628 (-2.19%); split: -2.20%, +0.01%
InvThroughput: 1059935 -> 1029441 (-2.88%); split: -2.88%, +0.00%
VClause: 5777 -> 5771 (-0.10%)
SClause: 11890 -> 10891 (-8.40%); split: -8.57%, +0.17%
Copies: 34035 -> 33259 (-2.28%); split: -2.98%, +0.70%
Branches: 11108 -> 11100 (-0.07%); split: -0.08%, +0.01%
PreSGPRs: 15999 -> 15942 (-0.36%); split: -0.44%, +0.08%
PreVGPRs: 16994 -> 16970 (-0.14%)

fossil-db (Polaris10):
Totals from 400 (0.29% of 135668) affected shaders:
SGPRs: 23799 -> 22919 (-3.70%); split: -4.30%, +0.61%
VGPRs: 18480 -> 18472 (-0.04%)
CodeSize: 2090316 -> 2041592 (-2.33%)
Instrs: 395461 -> 385747 (-2.46%); split: -2.46%, +0.00%
Latency: 5045768 -> 5020196 (-0.51%); split: -0.53%, +0.02%
InvThroughput: 2694320 -> 2689886 (-0.16%); split: -0.23%, +0.07%
VClause: 5982 -> 5968 (-0.23%)
SClause: 12064 -> 10823 (-10.29%); split: -10.33%, +0.04%
Copies: 48233 -> 48322 (+0.18%); split: -0.47%, +0.65%
PreSGPRs: 16409 -> 16358 (-0.31%); split: -0.39%, +0.08%

fossil-db (Pitcairn):
Totals from 400 (0.29% of 135668) affected shaders:
SGPRs: 22431 -> 22215 (-0.96%); split: -2.60%, +1.64%
VGPRs: 18776 -> 18560 (-1.15%); split: -1.21%, +0.06%
CodeSize: 2104440 -> 2017708 (-4.12%)
MaxWaves: 2363 -> 2367 (+0.17%)
Instrs: 413099 -> 397446 (-3.79%)
Latency: 5507707 -> 5450251 (-1.04%); split: -1.12%, +0.07%
InvThroughput: 2838867 -> 2786903 (-1.83%); split: -1.83%, +0.00%
VClause: 10334 -> 10097 (-2.29%)
SClause: 12346 -> 11005 (-10.86%); split: -10.89%, +0.02%
Copies: 54034 -> 52065 (-3.64%); split: -3.99%, +0.35%
PreSGPRs: 17916 -> 17857 (-0.33%); split: -0.40%, +0.07%
PreVGPRs: 16917 -> 16893 (-0.14%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Marek Olšák
116a05c721 ac: move ac_exp_param.h to ac_nir.h
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266>
2022-01-05 12:46:31 +00:00
Timur Kristóf
7aa42e023a ac/nir/ngg: Lower NV mesh shaders to NGG semantics.
Lower mesh shader outputs to shared memory.

At the end of the shader, read the outputs from shared memory
and export their values as NGG expects.

We allocate separate shared memory (LDS) areas for per-vertex,
per-primitive outputs, primitive indices, primitive count.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13580>
2021-12-31 13:05:09 +00:00
Samuel Pitoiset
b52aaea630 radv: remove unnecessary ac_nir_ngg_config output struct
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13134>
2021-10-04 08:55:19 +00:00
Samuel Pitoiset
52e91f7640 radv: move ngg passthrough determination earlier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13134>
2021-10-04 08:55:19 +00:00
Samuel Pitoiset
2ce78a30ff move: move ngg lds bytes determination earlier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13134>
2021-10-04 08:55:19 +00:00
Samuel Pitoiset
90858dd718 radv: move ngg early prim export determination earlier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13134>
2021-10-04 08:55:19 +00:00
Rhys Perry
24501b5452 radv: move ngg culling determination earlier
Co-Authored-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13134>
2021-10-04 08:55:19 +00:00
Timur Kristóf
a7f2faea46 ac/nir: Emit edge flag instructions conditionally.
They are not needed by RADV but will be needed by RadeonSI.

Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 56917 (44.24% of 128647) affected shaders:
VGPRs: 1982664 -> 1975936 (-0.34%); split: -0.43%, +0.09%
CodeSize: 152790880 -> 149510316 (-2.15%); split: -2.15%, +0.00%
MaxWaves: 1617984 -> 1621900 (+0.24%)
Instrs: 29272825 -> 28907038 (-1.25%); split: -1.26%, +0.01%
Latency: 128744182 -> 127565678 (-0.92%); split: -1.14%, +0.22%
InvThroughput: 20125915 -> 19805168 (-1.59%); split: -1.63%, +0.03%
VClause: 521312 -> 519804 (-0.29%); split: -0.77%, +0.48%
SClause: 688861 -> 688897 (+0.01%); split: -0.04%, +0.05%
Copies: 3205421 -> 3177799 (-0.86%); split: -1.68%, +0.82%
Branches: 1181457 -> 1183147 (+0.14%); split: -0.03%, +0.17%
PreVGPRs: 1626681 -> 1595406 (-1.92%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12998>
2021-09-23 16:57:56 +02:00
Timur Kristóf
f4a65e5628 ac/nir/nggc: Only repack arguments that are needed.
Don't repack everything, only what is actually used.
The goal of this commit is primarily to remove unnecessary
LDS stores and loads. In addition to that, it also gets rid of
a few VALU instructions and reduces VGPR use.

Fossil DB stats on Sienna Cichlid with NGGC on:

Totals from 6951 (5.40% of 128647) affected shaders:
VGPRs: 206056 -> 205360 (-0.34%); split: -0.79%, +0.45%
CodeSize: 12344568 -> 12269312 (-0.61%); split: -0.62%, +0.01%
MaxWaves: 211206 -> 212196 (+0.47%)
Instrs: 2319459 -> 2308483 (-0.47%); split: -0.50%, +0.03%
Latency: 7220829 -> 7164721 (-0.78%); split: -1.21%, +0.43%
InvThroughput: 1051450 -> 1049191 (-0.21%); split: -0.36%, +0.15%
VClause: 25794 -> 25445 (-1.35%); split: -1.97%, +0.61%
SClause: 39192 -> 39277 (+0.22%); split: -0.21%, +0.43%
Copies: 315756 -> 313404 (-0.74%); split: -1.17%, +0.42%
Branches: 127878 -> 127879 (+0.00%); split: -0.00%, +0.00%
PreVGPRs: 168029 -> 160162 (-4.68%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12246>
2021-09-01 14:45:14 +00:00
Timur Kristóf
8341af5109 radv, aco, ac/nir: Tweak position export scheduling for NGG culling.
The result is about +5-ish fps in Doom Eternal.

It turns out that the location of position exports matters more
than we thought, and it's actually better to keep them at the bottom
for culling shaders rather than schedule it up to the top.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00
Timur Kristóf
fc1fabbabf ac/nir: Analyze culling shaders to remember which inputs are used when.
These will be useful for some optimizations.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00
Timur Kristóf
e97f0463a8 ac/nir: Implement NGG deferred attribute culling in NIR.
Culling is traditionally done by the rasterizer, but that
can be a bottleneck when an app creates a large number
of primitives. Eg. a lot of tiny triangles reduce the
rasterziation efficiency.

NGG makes it possible for the shader to check primitives
and delete those that it can prove are not needed.

After this is done, we have to repack the surviving invocations
so they remain compact. This also saves bandwidth, because
some memory loads are only executed by those vertices that
survived the culling.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00
Timur Kristóf
651a3da1b5 ac/nir: Add a NIR port of ac_llvm_cull.
The algorithms were originally implemented by Marek Olšák,
hence the copyright to AMD.

This commit just ports the LLVM based implementation to NIR,
using the new intrinsics added earlier.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00