Commit graph

542 commits

Author SHA1 Message Date
Timur Kristóf
61280bb4b6 aco/ngg: Allocate NGG GS space early for const vertex/primitive counts.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
e8a0409d01 aco/ngg: Use more efficient LDS layout to help reduce bank conflicts.
The LLVM backend has a trick which helps reduce LDS bank conflicts
by swizzling the LDS address where each vertex is emitted.
This commit implements the same thing for ACO.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
dd73719856 aco/ngg: Add shader query support to NGG GS.
In each GS thread, we calculate the number of "real" primitives that
were emitted (points, lines, triangles, not strips). Then we
accumulate the number of "real" primitives emitted by the
entire threadgroup in GDS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
df62c8fbea aco/ngg: Place workgroup barrier outside control flow for NGG GS.
Merged shaders have a workgroup barrier which makes sure that
the first half is completed in every wave before the 2nd half
is started.

This barrier is located in divergent control flow, so that waves
that don't have any invocations in the 2nd half can finish as early
as possible. This is problematic for NGG GS because it has more
workgroup barriers after the 2nd half.

So, for NGG GS we need to put the barrier outside
control flow because otherwise the waves that have 0 GS threads
won't be able to wait for the waves which have non-zero GS threads.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
1129575d5e aco/ngg: Implement NGG GS output.
We store emitted GS vertices in LDS.
Then, at the end of the shader, the emitted vertices are compacted
and each thread loads a single vertex from LDS in order to export
a primitive as needed, and the vertex attributes.

The reason this	is done is because there is an impedance mismatch
between	how API	GS and the NGG HW works. API GS can emit an arbitrary
number of vertices and primites	in each	thread,	but NGG	HW can only
export one vertex per thread.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
62b5012ec3 aco/ngg: Implement workgroup reduce / exclusive scan for NGG GS.
This function calculates two things at once:

1. The total number of vertices emitted by the threadgroup.
2. Exclusive scan of emitted vertex count accross the threadgroup.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
c29e288fb5 aco/ngg: Create LDS layout for NGG GS.
For NGG GS, we need to store the following in LDS:

1. The ESGS ring, similarly to legacy ESGS.
2. Emitted vertices from the GS threads.
3. Temporary space used by the workgroup scan.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
9c3d8404de aco/ngg: Allow NGG GS to create VS exports.
NGG GS need to use the same instructions to export vertex
attributes at the end.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
b67878f328 aco/ngg: Allow NGG GS to load per-vertex GS inputs.
They work the same way as in legacy GS, so we can reuse that.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
8f25d9f821 aco/ngg: Allow NGG GS to store ES outputs.
We can reuse the existing ES output code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
b57b1a06e4 aco/ngg: Clean up and reorganize NGG VS/TES code.
Make the NGG VS/TES code easier to follow, give better names to
some functions and make ngg_nogs_early_prim_export a variable.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
3645a3106a aco/ngg: Make primitive export packing less prone to error.
Use lshl_or instead of lshl_add, which makes it more robust in
handling -1 and -2 indices which will now just become null
exports, which is what we want.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
0bfe0495c1 aco/ngg: Refactor ngg_emit_prim_export in preparation for NGG GS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
b08ced08a2 aco/ngg: Refactor gs_alloc_req in preparation for NGG GS.
Previously, this function inferred the vertex and primitive counts
from the gs_tg_info shader argument, but in case of NGG GS, it will
need to be calculated in runtime.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
57d8799284 aco: Optimize thread_id_in_threadgroup when there is just one wave.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
5e31fb49a3 aco: Use thread_id_in_threadgroup helper for ES outputs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
924f816fe1 aco: Extract thread_id_in_threadgroup to a separate function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
b1964ad4d6 aco: Extract lanecount_to_mask to a separate function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Rhys Perry
c1d11bb92c aco: Add loop creation helpers.
Will be useful for NGG GS and probably testing. The helpers take care of
divergence but not creating correct phis.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
2be99012e9 nir: Add ability to count emitted GS primitives.
Add an option to nir_lower_gs_intrinsics which tells it to track
the number of emitted primitives, not just vertices. Additionally,
also make it per-stream.

Also rename the set_vertex_count intrinsic to
set_vertex_and_primitive_count.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Tony Wasserka
5f7810dcb2 aco/isel: Fix out-of-bounds write in visit_load_input
Shaders may read out components past the attributes provided by the
application, so the read mask can indicate a larger component count than
were actually reserved in the array.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>
2020-10-07 19:50:01 +00:00
Samuel Pitoiset
3c5eb1f761 aco: more uses of nir_get_io_offset_src()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7003>
2020-10-07 13:31:36 +02:00
Samuel Pitoiset
1211d05bef aco: bail out if the NIR IO base offset isn't zero
nir_io_add_const_offset_to_base takes care of this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7003>
2020-10-07 13:31:25 +02:00
Jason Ekstrand
9750164c09 nir: Rename get_buffer_size to get_ssbo_size
This makes it explicit that this intrinsic is only for SSBOs.  For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
2020-09-22 13:34:12 +00:00
Rhys Perry
f100cf0d30 aco: stop multiplying driver_location by 4
This didn't really serve any purpose, doesn't match how FS inputs are
currently done, and prevented us from using
nir_io_add_const_offset_to_base in the future.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Rhys Perry
fd872c3cf7 aco: remove dead indirect fs input loading
It's asserted that the visit_load_input code isn't reached. It also didn't
handle divergent indexing and this situation should have been lowered
anyway.

I think this used to be needed to pass a dEQP-VK.glsl.indexing.* test, but
it doesn't seem needed anymore.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Rhys Perry
7f51a0c670 aco: use nir's constant source helpers more
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Rhys Perry
430cc90071 aco: use nir_get_io_offset_src() in visit_load_input()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Rhys Perry
9bba79088d aco: use io semantics to get an intrinsic's slot
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Timur Kristóf
d58a1a87cc aco: Use NIR IO semantics for tess factor IO locations.
Previously we relied on looping over the NIR output variables
to remember the driver location of the tess factors, now use
the new NIR IO semantics instead.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
2020-09-22 12:38:43 +00:00
Samuel Pitoiset
05b6612b4e radv: do not lower UBO/SSBO access to offsets
Use nir_lower_explicit_io instead of lowering to offsets. Extra
(useless) additions are removed by lowering load_vulkan_descriptor
to vec2(src.x, 0).

fossils-db (Navi):
Totals from 18236 (13.21% of 138013) affected shaders:
SGPRs: 1172766 -> 1168278 (-0.38%); split: -0.89%, +0.50%
VGPRs: 940156 -> 952232 (+1.28%); split: -0.08%, +1.37%
SpillSGPRs: 30286 -> 31109 (+2.72%); split: -0.78%, +3.50%
SpillVGPRs: 1893 -> 1909 (+0.85%)
CodeSize: 87910396 -> 88113592 (+0.23%); split: -0.35%, +0.58%
Scratch: 819200 -> 823296 (+0.50%)
MaxWaves: 205535 -> 202102 (-1.67%); split: +0.05%, -1.72%
Instrs: 17052527 -> 17113484 (+0.36%); split: -0.32%, +0.67%
Cycles: 670794876 -> 669084540 (-0.25%); split: -0.38%, +0.13%
VMEM: 5274728 -> 5388556 (+2.16%); split: +3.10%, -0.94%
SMEM: 1196146 -> 1165850 (-2.53%); split: +2.06%, -4.60%
VClause: 381463 -> 399217 (+4.65%); split: -1.08%, +5.73%
SClause: 666216 -> 631135 (-5.27%); split: -5.44%, +0.18%
Copies: 1292720 -> 1289318 (-0.26%); split: -1.28%, +1.01%
Branches: 467336 -> 473028 (+1.22%); split: -0.67%, +1.89%
PreSGPRs: 766459 -> 772175 (+0.75%); split: -0.53%, +1.28%
PreVGPRs: 819746 -> 825327 (+0.68%); split: -0.05%, +0.73%

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6202>
2020-09-21 15:37:11 +00:00
Rhys Perry
ec2185c598 aco: keep track of temporaries' regclasses in the Program
A future change will switch the liveness sets to bit vectors, which don't
contain regclass information.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6733>
2020-09-21 13:47:28 +00:00
Rhys Perry
2228835fb5 radv,aco: fix reading primitive ID in FS after TES
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3530
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6760>
2020-09-21 11:54:53 +00:00
Rhys Perry
4ac4cdb5bf aco: fix incorrect assertion in emit_vop3a_instruction()
Fixes some float controls tests on Polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0b6448bbe7
   ('aco/isel: refactor emit_vop3a_instruction() to handle 2 operand instructions')

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6744>
2020-09-17 09:52:22 +00:00
Timur Kristóf
26299c87f8 aco: Add base argument to emit_mbcnt.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6699>
2020-09-14 12:19:24 +00:00
Timur Kristóf
f3780e7b8c aco: Clean up emit_mbcnt.
Make it less error-prone and more consistent with other helpers.
Pass the masks as a single argument rather than two.
In wave64 mode, split the argument into low and high halves in
emit_mbcnt rather than where it is called.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6699>
2020-09-14 12:19:24 +00:00
Timur Kristóf
efa1c760d1 aco: Fix emit_boolean_exclusive_scan in wave32 mode.
Use the lane mask instead of s2 for the register class.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6699>
2020-09-14 12:19:24 +00:00
Rhys Perry
834b449a46 aco: fix value numbering of reductions
Non-ssa definitions caused an assertion in value numbering.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6662>
2020-09-09 15:00:45 +00:00
Tony Wasserka
fefeaeef06 aco/isel: Compile all helper functions with static linkage
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Tony Wasserka
793dc668ea aco/isel: Move add_startpgm to aco_instruction_selection.cpp
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Tony Wasserka
47de553283 aco/isel: Move context initialization code to a dedicated file
aco_instruction_selection_setup.cpp (previously used as a header) has
been split into a header and an implementation file. The latter "only"
implements init_context and setup_isel_context, but since these files
carry a long trail of helper functions, this cleans up the isel header
a lot.

Reduces library size by 3.1% due to more functions being compiled with
static linkage. Makes aco_instruction_selection.cpp compile 3% faster.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Tony Wasserka
150de6358d aco/isel: Consistently use references for input parameters in emit_load
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Tony Wasserka
dab0af0616 aco/isel: Simplify nested branching code
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Tony Wasserka
757de68a43 aco/isel: Turn the function template emit_load into a proper function
Statically known values were encoded using template parameters previously,
causing specializations for each of the 5 sets of template arguments to be
generated. Since emit_load is not performance critical (the inner loop
never runs more often than twice), it's better for build time to use
runtime arguments everywhere.

Reduces build time of this file by 9% (17.3s -> 15.7s on my machine) and
reduces libaco's size by 2.6%.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
2020-09-08 20:13:51 +00:00
Daniel Schürmann
0b6448bbe7 aco/isel: refactor emit_vop3a_instruction() to handle 2 operand instructions
Only AC:O has been affected.

Totals from 4 (0.00% of 136546) affected shaders (RAVEN):
CodeSize: 16428 -> 16420 (-0.05%)
Instrs: 3294 -> 3292 (-0.06%)
Cycles: 14208 -> 14200 (-0.06%)
VMEM: 936 -> 978 (+4.49%)
VClause: 80 -> 77 (-3.75%)
Copies: 211 -> 209 (-0.95%)
PreVGPRs: 127 -> 126 (-0.79%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6635>
2020-09-08 16:20:44 +00:00
Daniel Schürmann
5b31056257 aco/isel: refactor code and remove unnecessary v_mov
Changes mainly due to avoided v_movs for fmin/fmax/fadd/fmul.

Totals from 12783 (9.36% of 136546) affected shaders (RAVEN):
SGPRs: 1097752 -> 1098264 (+0.05%); split: -0.09%, +0.14%
VGPRs: 856920 -> 850800 (-0.71%); split: -0.82%, +0.11%
SpillSGPRs: 49494 -> 49496 (+0.00%); split: -0.00%, +0.01%
CodeSize: 99997916 -> 99989948 (-0.01%); split: -0.04%, +0.03%
MaxWaves: 53895 -> 54448 (+1.03%)
Instrs: 19634960 -> 19632626 (-0.01%); split: -0.05%, +0.04%
Cycles: 1620601696 -> 1620900712 (+0.02%); split: -0.02%, +0.04%
VMEM: 3334181 -> 3299626 (-1.04%); split: +1.62%, -2.66%
SMEM: 865573 -> 865876 (+0.04%); split: +0.84%, -0.81%
VClause: 337100 -> 335224 (-0.56%); split: -0.88%, +0.32%
SClause: 696813 -> 697267 (+0.07%); split: -0.14%, +0.21%
Copies: 1549897 -> 1548023 (-0.12%); split: -0.52%, +0.40%
Branches: 682118 -> 682108 (-0.00%); split: -0.01%, +0.00%
PreSGPRs: 893524 -> 895129 (+0.18%); split: -0.00%, +0.18%
PreVGPRs: 790180 -> 783036 (-0.90%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6635>
2020-09-08 16:20:44 +00:00
Rhys Perry
6049dc1a9d aco: improve fsign selection
Idea from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284

fossil-db (Navi):
Totals from 4053 (2.95% of 137413) affected shaders:
SGPRs: 305810 -> 305906 (+0.03%); split: -0.01%, +0.04%
VGPRs: 249000 -> 249144 (+0.06%); split: -0.01%, +0.07%
CodeSize: 29967092 -> 29885768 (-0.27%); split: -0.27%, +0.00%
Instrs: 5749494 -> 5737971 (-0.20%); split: -0.20%, +0.00%
Cycles: 255028584 -> 254955444 (-0.03%); split: -0.04%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6583>
2020-09-08 12:17:43 +00:00
Samuel Pitoiset
73eb24ab31 aco: handle unaligned loads on GFX10.3
Same as GFX10.

Cc: 20.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6594>
2020-09-04 13:19:45 +00:00
Rhys Perry
8faf85f687 aco: fix byte_align_scalar for 3 dword vectors
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: fe08f0ccf9
   ('aco: add byte_align_scalar() & trim_subdword_vector() helper functions')

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4710>
2020-09-04 13:03:50 +00:00
Samuel Pitoiset
8076c7596d aco: fix wrong source position for constant with nir_op_cube_face_coord
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6480>
2020-08-28 08:03:55 +02:00