Commit graph

206 commits

Author SHA1 Message Date
Samuel Pitoiset
9be4be515f aco: implement 16-bit nir_op_fsub/nir_op_fadd
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
b0b637ca17 aco: implement 16-bit nir_op_fabs/nir_op_fneg
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
acc5912786 aco: implement 16-bit nir_op_fmax/nir_op_fmin
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
66d5bfb09a aco: implement 16-bit nir_op_ffloor/nir_op_fceil
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
c097c9f20c aco: implement 16-bit nir_op_fsqrt/nir_op_frcp/nir_op_frsq
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
26ed9fb79e aco: implement 16-bit nir_op_ftrunc/nir_op_fround_even
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
ee96181ad9 aco: implement 16-bit nir_op_fexp2/nir_op_flog2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:05 +02:00
Samuel Pitoiset
b8486041df aco: implement 16-bit nir_op_ffract
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:04 +02:00
Samuel Pitoiset
a8b45d7034 aco: implement 16-bit nir_op_frexp_sig/nir_op_frexp_exp
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4452>
2020-04-10 08:05:04 +02:00
Timur Kristóf
64225c4f96 aco/ngg: Run GS_ALLOC_REQ on priority 3 for NGG VS and TES.
It is recommended to do this as quickly as possible.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Timur Kristóf
c633edad72 aco/ngg: Implement NGG VS and TES.
When NGG is used, vertex and tess eval shaders are executed on the
hardware NGG geometry stage. There is a series of steps they
must perform:

* Request GS space using GS_ALLOC_REQ
* Export the primitive
* Finally, export the normal VS outputs

In this commit, two modes are implemented:

* "late" which matches what the RADV LLVM backend currently does
* "early" which is an optimized version as seen in radeonsi

Vulkan doesn't allow the shader to write the edge flags, so we can
currently always use the "early" mode.

Exporting the primitive ID is also supported by having the GS threads
write that into LDS and reading them from LDS in the ES threads.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Timur Kristóf
d345bfe195 aco: Extract merged_wave_info_to_mask to its own function.
Currently we only use this at the beginning of merged shader parts,
but we are going to need to use	it with	some NGG code as well.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Timur Kristóf
b9cbdb6a45 aco: Extract uniform if handling to separate functions.
Currently we only use this for uniform ifs that come from NIR,
but we are going to need to use it with some NGG parts as well.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
2020-04-07 11:29:35 +00:00
Rhys Perry
20a4b1461b aco: zero-initialize Temp
Fixes dEQP-VK.transform_feedback.* crashes from accesses garbage
temporaries in emit_extract_vector().

Fixes: 85521061 ("aco: prepare helper functions for subdword handling")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4463>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4463>
2020-04-06 19:15:19 +00:00
Daniel Schürmann
1d293096d0 aco: use MUBUF to load subdword SSBO
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
8cfddc9199 aco: implement 8bit/16bit store_ssbo
Currently without alignment check, so that
we can only use the _byte and _short versions
and multi-component stores are split.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
3df0a41c75 aco: implement 8bit/16bit load_buffer
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
c70d014455 aco: implement storagePushConstant8 & storagePushConstant16
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
5718347c2b aco: implement vec2/3/4 with subdword operands
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
85521061d6 aco: prepare helper functions for subdword handling
- get_alu_src()
- emit_extract_vector()
- emit_split_vector()

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
fe08f0ccf9 aco: add byte_align_scalar() & trim_subdword_vector() helper functions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Daniel Schürmann
23ac24f5b1 aco: add missing conversion operations for small bitsizes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Jason Ekstrand
16a80ff18a aco: Implement b2b32 and b2b1
The implementations here just clone i2b32 and i2b1.  This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Timur Kristóf
0f847b18bc aco: Don't store LS VS outputs to LDS when TCS doesn't need them.
Totals:
Code Size: 254764624 -> 254745104 (-0.01 %) bytes

Totals from affected shaders:
VGPRS: 12132 -> 12112 (-0.16 %)
Code Size: 573364 -> 553844 (-3.40 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
798dd98d6e aco: When LS and HS invocations are the same, pass LS outputs in temps.
We know that in this case, the LS and HS invocations are working
on the exact same vertex, so it's safe to skip the LDS.

Totals:
VGPRS: 3960744 -> 3961844 (0.03 %)
Code Size: 254824300 -> 254764624 (-0.02 %) bytes
Max Waves: 1053748 -> 1053574 (-0.02 %)

Totals from affected shaders:
VGPRS: 26152 -> 27252 (4.21 %)
Code Size: 1496600 -> 1436924 (-3.99 %) bytes
Max Waves: 4860 -> 4686 (-3.58 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
0a91c086b8 aco: Extract store_output_to_temps into a separate function.
Will be used by LS output stores.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
0f35b3795d aco: Fix workgroup size calculation.
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.

As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.

Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes

Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
0ad65f2c55 aco: Zero-fill undefined elements in create_vec_from_array.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
50634ad4a0 aco: Change isel inputs/outputs to a flat array.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e4a1b246a4 aco: Treat outputs of the previous stage as inputs of the next stage.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e7d733fdab aco: Use more optimal sequence at the beginning of merged shaders.
It can be further optimized in the future, but
the new sequence already has a few advantages:

* Uses fewer instructions
* Uses even fewer instructions in wave32 mode
* Doesn't use the VALU at all

Totals from affected shaders (GFX10):
VGPRS: 43504 -> 43496 (-0.02 %)
Code Size: 2436000 -> 2423688 (-0.51 %) bytes
Max Waves: 8704 -> 8705 (0.01 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
17c779ab9e aco: Skip 2nd read of merged wave info when TCS in/out vertices are equal.
When TCS has an equal number of input and output, it means that the
number of VS and TCS invocations (LS and HS) are the same; and that
the HS invocations operate on the same vertices as the LS.

When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code accross the two halves.

Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)

Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
4ec48440a0 aco: Allow combining LDS loads when loading tess factors.
Previously the tess factors were loaded individually, but now they can
be loaded using a single LDS load instruction.

Note that the inner and outer tess factors are not yet combined.

Totals (GFX10):
Code Size: 254896008 -> 254879212 (-0.01 %) bytes

Totals from affected shaders (GFX10):
Code Size: 2028352 -> 2011556 (-0.83 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
ace3833293 aco: Allow combining TCS output VMEM stores.
Some copypasta may have stuck in the code.
This was left on false by mistake.

Totals (GFX10):
Code Size: 254939248 -> 254896008 (-0.02 %) bytes

Totals from affected shaders (GFX10):
VGPRS: 16196 -> 16212 (0.10 %)
Code Size: 1126332 -> 1083092 (-3.84 %) bytes
Max Waves: 2336 -> 2334 (-0.09 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e2b1d749b1 aco: Fix handling of tess factors.
There is no need to check whether they are written using indirect
indices, because all tess factors should be written to VMEM only
at the end of the shader.

No pipeline db changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
d3f6adcaed aco: Extract tcs_driver_location_matches_api_mask to separate function.
Also clear up should_write_tcs_output_to_lds a little bit.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Timur Kristóf
e0dff5fd86 aco: Create null exports in instruction selection instead of assembler.
This allows the passes after isel to assume that the exports are
always correct, and also allows to schedule these null exports later.
Additionally, it ensures that the correct exec mask is used for
these exports.

Totals from affected shaders (GFX10):
SGPRS: 84224 -> 84344 (0.14 %)
VGPRS: 23088 -> 23076 (-0.05 %)
Code Size: 882892 -> 894368 (1.30 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Rhys Perry
8d8c864beb aco: improve check for unreachable loop continue blocks
The old code would have previously caught:
loop {
   ...
   break
}
when it was meant to just catch:
loop {
   if (...)
      break
   else
      break
}

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
46e94fd854 aco: skip NIR in unreachable merge blocks
NIR removes most of this but undef instructions for loop header phis can
remain. These were harmless because ACO would DCE them itself.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
638cbc21a1 aco: handle when ACO adds new continue edges
Usually a loop ends with a uniform continue. If it doesn't and we end up
adding our own continue edges (because of continue_or_break or divergent
breaks at the end), we have to add extra operands to the loop header phis.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
f2c4878de9 aco: handle missing second predecessors at merge block phis
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
f1a2e1df78 aco: set has_divergent_branch for discards in loops
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>
2020-03-23 15:55:12 +00:00
Rhys Perry
2d14a8f237 aco: fix operand order for LS VGPR init bug workaround
Fixes: a952bf3946 ('aco: Fix LS VGPR init bug on affected hardware.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201>
2020-03-16 19:34:32 +00:00
Rhys Perry
ded7a8bb46 aco: fix instruction encoding for LS VGPR init bug workaround
Fixes: a952bf3946 ('aco: Fix LS VGPR init bug on affected hardware.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201>
2020-03-16 19:34:32 +00:00
Rhys Perry
ee9e0d1eca aco: set late kill for v_interp_p1_f32 for some APUs
Apparently needed for Stoney Ridge, Kabini and Mullins APUs.

gfx702 also has 16-bank LDS and https://llvm.org/docs/AMDGPUUsage.html
lists some dGPUs under there. Those GPUs seem to be Hawaii actually
(gfx701) and we don't seem to have gotten any interpolation related bugs
reported with them so far.

The late kill flag was tested by running pipeline-db with
ACO_DEBUG=validatera while setting late kill for SMEM buffer loads,
emit_vop2_instruction() and texture instructions. I also tested with
just setting the flag for v_interp_p1_f32.

As far as I know, the only other thing we have to consider for 16-bank LDS
is something to do with 16-bit interpolation. We don't do that yet.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
2020-03-16 16:09:02 +00:00
Timur Kristóf
61f2e8d9bb aco: Don't store TCS outputs to LDS when we're sure that none are read.
This allows us not to write an output to LDS, even if it has
an indirect offset.

No pipeline DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:11 +00:00
Timur Kristóf
9b36d8c23a aco: Only write TCS outputs to LDS when they are read by the TCS.
Note that tess factors are always read at the end of the shader,
so those are still always saved to LDS.

Totals from affected shaders:
VGPRS: 25244 -> 25164 (-0.32 %)
Code Size: 1768268 -> 1690804 (-4.38 %) bytes
Max Waves: 4947 -> 4953 (0.12 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:11 +00:00
Timur Kristóf
4dcca26945 aco: Store tess factors in VMEM only at the end of the shader.
This optimizes out several superfluous stores of the tess factors,
especially if the shader wrote those outputs multiple times.

Pipeline DB changes on GFX10:
Totals from affected shaders:
SGPRS: 30384 -> 29536 (-2.79 %)
Code Size: 2260720 -> 2214484 (-2.05 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:11 +00:00
Timur Kristóf
8c3ab49c6b aco: Don't generate an if when the first part of a merged HS or GS is empty.
In some cases (eg. in a few tessellation CTS tests) the VS part of
a merged HS is completely empty. Let's not generate a divergent if
in these cases. (LLVM also doesn't do it.)

No pipeline DB changes, only affects the CTS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:11 +00:00
Timur Kristóf
cec6a856e5 aco: Enable running TES as ES, including merged TES+GS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
2020-03-11 08:34:11 +00:00