Commit graph

84 commits

Author SHA1 Message Date
Timur Kristóf
7056714f50 aco: Set config->lds_size when TES or VS is running on HW ESGS.
This doesn't fix anything, just reports the LDS size used by
merged ESGS shaders, such as vertex_geometry_gs and
tess_eval_geometry_gs.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
2020-04-29 11:51:04 +00:00
Timur Kristóf
baa46878d4 aco: Calculate workgroup size of legacy GS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
2020-04-29 11:51:04 +00:00
Timur Kristóf
fdbb296853 aco: Remember VS/TCS output driver locations.
Instead of relying on calling shader_io_get_unique_index repeatedly,
remember the which output driver location corresponds to which
varying slot.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
2020-04-29 11:51:04 +00:00
Timur Kristóf
ab07c4ea70 aco: Use context variables instead of calculating TCS inputs/outputs.
VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.

It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
2020-04-29 11:51:04 +00:00
Timur Kristóf
fd0248c37b radv: Refactor calculate_tess_lds_size and get_tcs_num_patches.
Previously these functions needed the bit mask of the TCS outputs
and patch outputs written, and concluded the number of outputs
from that.

Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
2020-04-29 11:51:04 +00:00
Rhys Perry
eeccb1a941 aco: lower 8/16-bit integer arithmetic
dEQP-VK.spirv_assembly.type.* passes with the features and extensions
enabled.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4707>
2020-04-24 20:03:59 +01:00
Rhys Perry
deea4b7c5a aco: vectorize global loads/stores
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
2020-04-24 18:52:54 +00:00
Rhys Perry
b77d638e1b aco: add and use RegClass::get() helper
Eventually, we'll probably want to replace the current
RegClass(type, size) constructor with this.

This has a functional change in that get_reg_class() now creates v1/v2
instead of v4b/v8b.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
2020-04-24 18:52:54 +00:00
Samuel Pitoiset
c4ca9e66dd aco: fix exporting the viewport index if the fragment shader needs it
It's like the layer, it has to be exported via the pos and also
as a varying if the fragment shader reads it.

Fixes dEQP-VK.draw.shader_viewport_index.fragment_shader_*

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4564>
2020-04-17 16:23:24 +00:00
Daniel Schürmann
637f45f390 aco: setup subdword regclasses for ssa_undef & load_const
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4492>
2020-04-10 07:19:27 +00:00
Samuel Pitoiset
67b567d0d0 aco: implement nir_op_b2f16/nir_op_i2f16/nir_op_u2f16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Timur Kristóf
c633edad72 aco/ngg: Implement NGG VS and TES.
When NGG is used, vertex and tess eval shaders are executed on the
hardware NGG geometry stage. There is a series of steps they
must perform:

* Request GS space using GS_ALLOC_REQ
* Export the primitive
* Finally, export the normal VS outputs

In this commit, two modes are implemented:

* "late" which matches what the RADV LLVM backend currently does
* "early" which is an optimized version as seen in radeonsi

Vulkan doesn't allow the shader to write the edge flags, so we can
currently always use the "early" mode.

Exporting the primitive ID is also supported by having the GS threads
write that into LDS and reading them from LDS in the ES threads.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Timur Kristóf
c5ed0883fc aco/ngg: Setup NGG VS and TES stages.
ngg_vertex_gs and ngg_tess_eval_gs work very similarly to
vertex_vs and tess_eval_vs, but they run on the HW NGG GS stage.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Rhys Perry
8dd6a51e80 aco: remove divergence check in sanitize_if()
We also need to do this if a side ends in a divergent break.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4461>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4461>
2020-04-06 18:39:29 +00:00
Daniel Schürmann
23ac24f5b1 aco: add missing conversion operations for small bitsizes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
d223e4e8de aco: don't vectorize 8/16bit load/store_ssbo
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
f01bf51a2b aco: refactor regClass setup for subdword VGPRs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Samuel Pitoiset
2f424c83e0 aco: only break SMEM clauses if XNACK is enabled (mostly APUs)
According to LLVM, it seems only required for APUs like RAVEN, but
we still ensure that SMEM stores are in their own clause.

pipeline-db (VEGA10):
Totals from affected shaders:
SGPRS: 1775364 -> 1775364 (0.00 %)
VGPRS: 1287176 -> 1287176 (0.00 %)
Spilled SGPRs: 725 -> 725 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 65386620 -> 65107460 (-0.43 %) bytes
Max Waves: 287099 -> 287099 (0.00 %)

pipeline-db (POLARIS10):
Totals from affected shaders:
SGPRS: 1797743 -> 1797743 (0.00 %)
VGPRS: 1271108 -> 1271108 (0.00 %)
Spilled SGPRs: 730 -> 730 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 64046244 -> 63782324 (-0.41 %) bytes
Max Waves: 254875 -> 254875 (0.00 %)

This only affects GFX6-GFX9 chips because the compiler uses a
different pass for GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>
2020-04-01 17:50:31 +00:00
Jason Ekstrand
16a80ff18a aco: Implement b2b32 and b2b1
The implementations here just clone i2b32 and i2b1.  This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Timur Kristóf
0f847b18bc aco: Don't store LS VS outputs to LDS when TCS doesn't need them.
Totals:
Code Size: 254764624 -> 254745104 (-0.01 %) bytes

Totals from affected shaders:
VGPRS: 12132 -> 12112 (-0.16 %)
Code Size: 573364 -> 553844 (-3.40 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
0f35b3795d aco: Fix workgroup size calculation.
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.

As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.

Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes

Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
99ad62ff27 aco: Extract setup_tcs_info to a separate function.
Will be required by the workgroup size calculation.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
50634ad4a0 aco: Change isel inputs/outputs to a flat array.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e4a1b246a4 aco: Treat outputs of the previous stage as inputs of the next stage.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
17c779ab9e aco: Skip 2nd read of merged wave info when TCS in/out vertices are equal.
When TCS has an equal number of input and output, it means that the
number of VS and TCS invocations (LS and HS) are the same; and that
the HS invocations operate on the same vertices as the LS.

When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code accross the two halves.

Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)

Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e2b1d749b1 aco: Fix handling of tess factors.
There is no need to check whether they are written using indirect
indices, because all tess factors should be written to VMEM only
at the end of the shader.

No pipeline db changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Rhys Perry
17c7f4e30e aco: fix boolean undef regclass
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4285>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4285>
2020-03-23 19:43:09 +00:00
Rhys Perry
9d56ed199b aco: emit IR in IF's merge block instead if the other side ends in a jump
Fixes NIR such as:
if (divergent) {
   a = sgpr()
} else {
   break;
}
use(a)

Previously we would have emitted:
if (divergent) {
   a = sgpr()
}
if (!divergent) {
   break;
}
use(a)

But "a" isn't available at it's use. Now we emit:
if (divergent) {
}
if (!divergent) {
   break;
}
a = sgpr()
use(a)

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 1936 -> 1936 (0.00 %)
VGPRS: 1264 -> 1264 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 159408 -> 159152 (-0.16 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 81 -> 81 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2557
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
ee9e0d1eca aco: set late kill for v_interp_p1_f32 for some APUs
Apparently needed for Stoney Ridge, Kabini and Mullins APUs.

gfx702 also has 16-bank LDS and https://llvm.org/docs/AMDGPUUsage.html
lists some dGPUs under there. Those GPUs seem to be Hawaii actually
(gfx701) and we don't seem to have gotten any interpolation related bugs
reported with them so far.

The late kill flag was tested by running pipeline-db with
ACO_DEBUG=validatera while setting late kill for SMEM buffer loads,
emit_vop2_instruction() and texture instructions. I also tested with
just setting the flag for v_interp_p1_f32.

As far as I know, the only other thing we have to consider for 16-bank LDS
is something to do with 16-bit interpolation. We don't do that yet.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
2020-03-16 16:09:02 +00:00
Timur Kristóf
0e8f4baede aco: Setup tessellation evaluation shader variables.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Timur Kristóf
7b7f196fbc aco: Implement tessellation control shader input/output.
Tessellation control shaders can have per-vertex inputs,
and both per-vertex and per-patch outputs. TCS can not only store,
but also load their outputs.

The TCS outputs are stored in RING_HS_TESS_OFFCHIP in VMEM, which
is where the TES reads them from. Additionally, the are also stored
in LDS to make sure they can be loaded fast when read by the TCS.

Tessellation factors are always just stored in LDS.
At the end of the shader, the first shader invocation reads these
from LDS and writes them to RING_HS_TESS_FACTOR in VMEM, and
additionally to RING_HS_TESS_OFFCHIP when they are read by
the Tessellation Evaluation Shader.

This implementation matches the memory layouts used by radv_nir_to_llvm.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Timur Kristóf
9016711273 aco: Setup correct HW stages when tessellation is used.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Timur Kristóf
754837f3b5 aco: Implement load_tess_coord.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Timur Kristóf
9ca2b254ca aco: Setup tessellation control shader variables.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Timur Kristóf
7b3316f3c9 aco: Extract setup_gs_variables into a separate function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:10 +00:00
Rhys Perry
b088a4b113 aco: only reserve sgprs for vcc if it's used
pipeline-db (Vega):

Totals:
SGPRS: 5186302 -> 5075616 (-2.13 %)
VGPRS: 3704580 -> 3704580 (0.00 %)
Spilled SGPRs: 144859 -> 144859 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 4124 -> 4124 (0.00 %) dwords per thread
Code Size: 247315944 -> 247315944 (0.00 %) bytes
LDS: 1311 -> 1311 (0.00 %) blocks
Max Waves: 674560 -> 674562 (0.00 %)

Totals from affected shaders:
SGPRS: 536992 -> 426306 (-20.61 %)
VGPRS: 356404 -> 356404 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 8498748 -> 8498748 (0.00 %) bytes
LDS: 8 -> 8 (0.00 %) blocks
Max Waves: 113832 -> 113834 (0.00 %)

There are some small code size changes in a few RotTR shaders and a small
increase in max_waves in two Detroit: Become Human shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3906>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3906>
2020-03-05 20:18:34 +00:00
Rhys Perry
5ea23ba659 aco: set exec_potentially_empty after continues/breaks in nested IFs
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
2020-01-29 18:02:27 +00:00
Samuel Pitoiset
3922d95b51 aco: implement VK_AMD_shader_explicit_vertex_parameter
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
2020-01-29 09:49:50 +00:00
Rhys Perry
7edcf4a59d aco: fix rebase error from GS copy shader support
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f8f7712666 ('aco: implement GS copy shaders')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3601>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3601>
2020-01-28 13:50:53 +00:00
Samuel Pitoiset
918f00eef8 aco: combine MRTZ (depth, stencil, sample mask) exports
Instead of emitting up to 3 for each different components (depth,
stencil and sample mask). This is needed to fix a hw bug on GFX6.

Totals from affected shaders:
SGPRS: 34728 -> 35056 (0.94 %)
VGPRS: 26440 -> 26476 (0.14 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1346088 -> 1344180 (-0.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 3922 -> 3915 (-0.18 %)
Wait states: 0 -> 0 (0.00 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3538>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3538>
2020-01-24 16:42:15 +00:00
Rhys Perry
b046f55086 aco: use nir_move_copies
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
f8f7712666 aco: implement GS copy shaders
v5: rebase on float_controls changes
v7: rebase after shader args MR and load/store vectorizer MR

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
de4ce66f5c aco: remove needs_instance_id
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
8bad100f83 aco: implement GS on GFX7-8
GS is the same on GFX6, but GFX6 isn't fully supported yet.

v4: fix regclass
v7: rebase after shader args MR

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
40bb81c9dd radv/aco,aco: implement GS on GFX9+
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
b5c9688516 aco: limit register usage for large work groups
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-10 12:10:37 +00:00
Rhys Perry
afe1a8ff5b aco: fix vgpr alloc granule with wave32
We still need to increase the number of physical vgprs

Totals from affected shaders:
SGPRS: 671976 -> 675288 (0.49 %)
VGPRS: 550112 -> 562596 (2.27 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 27621660 -> 27606532 (-0.05 %) bytes
Max Waves: 81083 -> 87833 (8.32 %)
Instructions: 5391560 -> 5389031 (-0.05 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Karol Herbst
b35e583c17 aco: use NIR_MAX_VEC_COMPONENTS instead of 4
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-21 11:00:16 +00:00
Timur Kristóf
637c5a1dd9 aco/wave32: Fix reductions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
21db083504 aco/wave32: Allow setting the subgroup ballot size to 64-bit.
Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00