Commit graph

7753 commits

Author SHA1 Message Date
Rhys Perry
7d95f7510f aco/tests: test copy propagation with DPP instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601>
2021-08-31 16:58:20 +00:00
Rhys Perry
e27946ca11 aco: don't constant propagate to DPP instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601>
2021-08-31 16:58:20 +00:00
Samuel Pitoiset
640f15edd7 radv/llvm: fix invalid IR when converting triangle strips to indices
Operand 0 of LLVMBuildSelect() should be i1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12616>
2021-08-31 09:56:27 +02:00
Samuel Pitoiset
72b6b02e09 ac/llvm: fix huge alignment when loading from shared memory
LLVM doesn't support huge alignments, also it can optimize the shared
loads, so it's unecessary to emit better (but broken) LLVM IR.

Fixes a bunch of crashes with RADV_DEBUG=llvm,checkir.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12616>
2021-08-31 09:56:27 +02:00
Samuel Pitoiset
e72c2e36e0 ac/llvm: adjust assertion for nir_intrinsic_terminate
Fixes dEQP-VK.spirv_assembly.instruction.terminate*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12616>
2021-08-31 09:56:27 +02:00
Mike Blumenkrantz
66fa676e8d radv: ignore dynamic line stipple if line stipple isn't enabled
==244108== Conditional jump or move depends on uninitialised value(s)
==244108==    at 0x48498D5: bcmp (vg_replace_strmem.c:1129)
==244108==    by 0x1C37B7DD: radv_bind_dynamic_state (radv_cmd_buffer.c:237)
==244108==    by 0x1C388027: radv_CmdBindPipeline (radv_cmd_buffer.c:4794)
==244108==    by 0x14E9C01E: bool update_gfx_pipeline<true>(zink_context*, zink_batch_state*, pipe_prim_type) (zink_draw.cpp:406)
==244108==    by 0x14E9AAB9: void zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)1, (zink_dynamic_state2)1, (zink_dynamic_vertex_input)1, true>(pipe_cont>
==244108==    by 0x14B017EB: tc_call_draw_single (u_threaded_context.c:3033)
==244108==    by 0x14AF9C0E: tc_batch_execute (u_threaded_context.c:190)
==244108==    by 0x14AFA24F: _tc_sync (u_threaded_context.c:341)
==244108==    by 0x14B006E7: tc_texture_subdata (u_threaded_context.c:2549)
==244108==    by 0x14238F8C: st_TexSubImage (st_cb_texture.c:2134)
==244108==    by 0x14239931: st_TexImage (st_cb_texture.c:2363)
==244108==    by 0x1453698A: teximage (teximage.c:3154)
==244108==    by 0x1453698A: teximage_err (teximage.c:3181)
==244108==    by 0x145388BD: _mesa_TexImage2D (teximage.c:3252)
==244108==    by 0x5E88D4: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x5E9527: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x5E9B72: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x5F1092: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x5F10AC: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x48CC66: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x48DDC7: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x40D525: ??? (in /home/zmike/src/piglit/tesseract/bin_unix/linux_64_client)
==244108==    by 0x4FF7B74: (below main) (in /usr/lib64/libc-2.33.so)
==244108==  Uninitialised value was created by a stack allocation
==244108==    at 0x14ECDF55: zink_create_gfx_pipeline (zink_pipeline.c:53)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12618>
2021-08-30 19:24:29 +00:00
Mike Blumenkrantz
90a0556c27 radv: use pool stride when copying single query results
the specified stride is irrelevant for this case since there's only one
result to write

Cc: mesa-stable

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12619>
2021-08-30 19:02:40 +00:00
Samuel Pitoiset
906f7f4296 radv: advertise VK_EXT_primitive_topology_list_restart
Everything should be already supported, except patch list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12611>
2021-08-30 18:39:20 +00:00
Samuel Pitoiset
2d1b85fe22 radv: add support for clearing multi layers with normal gfx clear path
Allow to clear range of layers with vkCmdClear{Color,DepthStencil}Image().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12557>
2021-08-30 16:19:29 +00:00
Rhys Perry
9df9fe7dfa aco: include utility in isel
For std::exchange().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c1d11bb92c ("aco: Add loop creation helpers.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5301
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12614>
2021-08-30 14:28:00 +00:00
Timur Kristóf
76b9dd6266 aco: Unset 16 and 24-bit flags from operands in apply_extract.
Consider the following sequence in a shader:
b = p_extract a
c = v_mad_u32_u16 b, X, 0

The optimizer applies extract, resulting in:
c = v_mad_u32_u16 a, X, 0 (correct)

Then it mistakenly turns that into:
c = v_mul_u32_u24 a, X, 0 (incorrect)

In this case, the p_extract is applied to v_mad_u32_u16 by
apply_extract. After this, we can no longer be sure that
the operands are still 16 or 24-bit, so we have to remove
this flag.

No Fossil DB changes.

Fixes: 54292e99c7
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
2021-08-30 14:05:33 +00:00
Samuel Pitoiset
23ef0fb277 radv: do not allocate a clear value for images that support comp-to-single
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12565>
2021-08-30 07:18:19 +00:00
Samuel Pitoiset
df688e6941 radv: do not load/store the clear value for comp-to-single images
Images that are fast cleared with the comp-to-single mode clears DCC
to 0x10 which tells the hardware to get the clear value from the
main surface instead of the reg.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12565>
2021-08-30 07:18:19 +00:00
Samuel Pitoiset
0c550a5fe6 radv: disable DCC image stores on Navi12-14 for displayable DCC corruption
DCC image stores require 128B but 64B is used for displayable DCC.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5265
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5106
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12521>
2021-08-30 08:28:37 +02:00
Timur Kristóf
cfb0d931f2 aco: Emit zero for the derivatives of uniforms.
Observed in a shader from Resident Evil Village.
This also helps prevent emitting invalid IR.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12599>
2021-08-27 20:34:22 +00:00
Daniel Schürmann
2eeaaabb8e aco/optimizer: combine v_pk_mul_u16 + v_pk_add_u16 -> v_pk_mad_u16
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
be16ebc5ca aco/optimizer: fuse v_mul_f64 + v_add_f64 -> v_fma_f64
No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
8e27ca9953 aco/optimizer: combine v_mul_lo_u16 + v_add_u16 -> v_mad_u16
Totals from 192 (0.13% of 150170) affected shaders: (GFX10.3)
CodeSize: 1027224 -> 1019872 (-0.72%)
Instrs: 174784 -> 173863 (-0.53%)
Latency: 4235742 -> 4232177 (-0.08%); split: -0.11%, +0.03%
InvThroughput: 1777026 -> 1775945 (-0.06%); split: -0.09%, +0.03%
Copies: 34098 -> 34099 (+0.00%); split: -0.03%, +0.03%
PreVGPRs: 4920 -> 4850 (-1.42%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
23d5865f42 aco: refactor nir_op_imul selection
Previously, the optimization to use v_mul_lo_u16 for
32bit multiplications was done in instruction_selection.
This was moved to the optimizer to ease some case distinctions.

The mixed results are due to increased use of SDWA.

Totals from 2616 (1.74% of 150170) affected shaders: (GFX10.3)
VGPRs: 143888 -> 143872 (-0.01%); split: -0.02%, +0.01%
CodeSize: 5604032 -> 5604080 (+0.00%); split: -0.01%, +0.01%
Instrs: 1086798 -> 1083915 (-0.27%); split: -0.27%, +0.01%
Latency: 8215793 -> 8213023 (-0.03%); split: -0.10%, +0.07%
InvThroughput: 20765157 -> 20773766 (+0.04%); split: -0.02%, +0.06%
VClause: 35256 -> 35260 (+0.01%); split: -0.02%, +0.03%
SClause: 29021 -> 29024 (+0.01%); split: -0.00%, +0.01%
Copies: 74163 -> 74306 (+0.19%); split: -0.05%, +0.24%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
d8eef134d8 aco: only apply extract if not used more than 4 times
Totals from 61 (0.04% of 150170) affected shaders: (GFX10.3)
CodeSize: 1087732 -> 1087380 (-0.03%); split: -0.22%, +0.18%
Instrs: 192343 -> 192205 (-0.07%); split: -0.16%, +0.09%
Latency: 7231670 -> 7148073 (-1.16%); split: -1.19%, +0.04%
InvThroughput: 3436715 -> 3394926 (-1.22%); split: -1.25%, +0.04%
VClause: 4831 -> 4833 (+0.04%)
Copies: 50130 -> 49934 (-0.39%); split: -0.67%, +0.28%
Branches: 5945 -> 5948 (+0.05%)
PreSGPRs: 3486 -> 3472 (-0.40%)
PreVGPRs: 5154 -> 5152 (-0.04%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Timur Kristóf
589ccf3d77 aco: Consider maximum number of workgroups per CU/WGP on Navi.
No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12517>
2021-08-27 16:41:08 +00:00
Timur Kristóf
c8698199a1 aco: Consider LDS usage by PS inputs in MaxWaves calculation.
Before PS waves are launched, PS inputs are moved from PC to LDS
and the corresponding part of the PC is deallocated.
Each PS input occupies 3 * vec4 (3 * 16 = 48 bytes) of LDS space.
See Figure 10.3 in the GCN3 ISA manual.

These limit occupancy the same way as other stages' LDS usage does.
Note that PS can request additional LDS space via EXTRA_LDS_SIZE,
so that also must be taken into account here.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12517>
2021-08-27 16:41:08 +00:00
Samuel Pitoiset
d90a8c79df radv: remove unecessary radv_finishme() for invalid color formats
Something really bad happen (likely driver bug) if this is triggered.
Replace with some assertions to catch an eventual issue in debug build.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12556>
2021-08-27 07:29:17 +00:00
Samuel Pitoiset
df90bb3f88 radv: remove useless check about number of samples in the HW resolve path
Although this can likely hang, this is invalid and should be caught
by the validation layers. There is many ways to hang the GPU with VK,
this check alone is useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12556>
2021-08-27 07:29:17 +00:00
Samuel Pitoiset
b05c2023cc radv: remove outdated radv_finishme() in the HW resolve path
Resolving layered MSAA images is actually implemented by the HW
resolve path but never used because the driver uses the compute path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12556>
2021-08-27 07:29:17 +00:00
Timur Kristóf
626b125857 aco: Use workgroup size from input shader info.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12321>
2021-08-26 09:46:18 +00:00
Timur Kristóf
c4ca08548b radv: Remove superfluous workgroup size calculations.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12321>
2021-08-26 09:46:18 +00:00
Timur Kristóf
9fd36bbacd radv: Calculate workgroup sizes in radv_pipeline.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12321>
2021-08-26 09:46:18 +00:00
Timur Kristóf
395c0c52c7 ac: Calculate workgroup sizes of HW stages that operate in workgroups.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12321>
2021-08-26 09:46:18 +00:00
Samuel Pitoiset
66b5f05727 ci: update the list of skipped tests for Fiji/RADV
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12553>
2021-08-26 09:09:47 +00:00
Timur Kristóf
5b7446d74c radv, ac, aco: Use indices 0-2 of gs_vtx_offset argument array on GFX9+.
Previously, indices 0, 2, 4 were used.
This worked, but it was somewhat unintuitive.
This commit changes it to use indices 0, 1, 2 instead, which
makes the code easier to understand.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12511>
2021-08-26 05:20:15 +00:00
Samuel Pitoiset
0f05c84bba radv: allow storage images with VK_FORMAT_E5B9G9R9_UFLOAT_PACK32 on GFX10.3+
It should be supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12543>
2021-08-25 16:27:46 +00:00
Daniel Schürmann
cd489e5388 aco: remove redundant s_and exec after nir_op_inot
Totals from 22585 (15.04% of 150170) affected shaders: (GFX10.3)
VGPRs: 1474048 -> 1473904 (-0.01%)
CodeSize: 155238876 -> 155187688 (-0.03%); split: -0.06%, +0.03%
MaxWaves: 385086 -> 385122 (+0.01%)
Instrs: 29297735 -> 29284442 (-0.05%); split: -0.08%, +0.04%
Latency: 675841742 -> 675764151 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 174859037 -> 174854796 (-0.00%); split: -0.01%, +0.01%
VClause: 479790 -> 479781 (-0.00%); split: -0.01%, +0.00%
SClause: 1106900 -> 1106615 (-0.03%); split: -0.03%, +0.01%
Copies: 1829037 -> 1828042 (-0.05%); split: -0.09%, +0.03%
Branches: 859971 -> 859967 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1341850 -> 1342356 (+0.04%); split: -0.01%, +0.04%
PreVGPRs: 1327322 -> 1327034 (-0.02%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>
2021-08-25 12:43:50 +00:00
Timur Kristóf
abcc83e713 aco: Fix to_uniform_bool_instr when operands are not suitable.
Don't attempt to transform uniform boolean instructions when
their operands are unsuitable. This can happen eg. due to other
optimizations that combine SALU instructions which clear out
the uniform instruction labels.

Cc: mesa-stable
Fixes: 8a32f57fff
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>
2021-08-25 12:43:50 +00:00
Samuel Pitoiset
e0a703af11 ci: update the list of expected failures/skips for RADV
Against CTS 1.2.7.0.
Tested chips are Pitcairn, Polaris10, Navi14 and Sienna Cichlid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12539>
2021-08-25 13:00:07 +02:00
Daniel Schürmann
a3110c308f radv: call nir_lower_flrp() after the first radv_optimize_nir()
instead of inside the optimization loop

Totals from 2504 (1.67% of 150170) affected shaders: (GFX10.3)
VGPRs: 162592 -> 162416 (-0.11%); split: -0.12%, +0.01%
CodeSize: 18399756 -> 18383552 (-0.09%); split: -0.10%, +0.01%
MaxWaves: 42654 -> 42748 (+0.22%)
Instrs: 3499404 -> 3497075 (-0.07%); split: -0.08%, +0.01%
Latency: 87087238 -> 87064270 (-0.03%); split: -0.06%, +0.03%
InvThroughput: 21159621 -> 21150546 (-0.04%); split: -0.05%, +0.01%
VClause: 56653 -> 56667 (+0.02%); split: -0.00%, +0.03%
Copies: 226332 -> 226423 (+0.04%); split: -0.15%, +0.19%
Branches: 110027 -> 110025 (-0.00%); split: -0.05%, +0.04%
PreSGPRs: 168087 -> 168076 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 160814 -> 160705 (-0.07%)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>
2021-08-24 16:10:30 +00:00
Rhys Perry
94ed2ab3a1 radv: use nir_vector_insert_imm in lower_intrinsics
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
2021-08-24 10:35:19 +00:00
Rhys Perry
b23a9dd1f6 aco/scheduler: allow moving down VMEM stores to below VMEM loads
fossil-db (Vega10):
Totals from 93 (0.06% of 150305) affected shaders:
SGPRs: 4832 -> 4768 (-1.32%)
VGPRs: 4084 -> 4144 (+1.47%)
CodeSize: 316080 -> 317208 (+0.36%); split: -0.11%, +0.47%
MaxWaves: 589 -> 580 (-1.53%)
Instrs: 60229 -> 60511 (+0.47%); split: -0.15%, +0.61%
Latency: 636477 -> 540029 (-15.15%); split: -15.26%, +0.10%
InvThroughput: 293027 -> 283043 (-3.41%); split: -4.21%, +0.80%
VClause: 2557 -> 2716 (+6.22%); split: -0.35%, +6.57%
SClause: 1381 -> 1395 (+1.01%); split: -0.14%, +1.16%
Copies: 9424 -> 9728 (+3.23%); split: -0.74%, +3.97%

fossil-db (Sienna Cichlid):
Totals from 88 (0.06% of 150170) affected shaders:
VGPRs: 3840 -> 3872 (+0.83%)
CodeSize: 300544 -> 300960 (+0.14%); split: -0.09%, +0.23%
Instrs: 53714 -> 53871 (+0.29%); split: -0.05%, +0.35%
Latency: 489854 -> 462001 (-5.69%); split: -6.30%, +0.61%
InvThroughput: 100307 -> 95142 (-5.15%); split: -5.50%, +0.35%
VClause: 2322 -> 2564 (+10.42%); split: -0.39%, +10.81%
SClause: 1345 -> 1358 (+0.97%); split: -0.15%, +1.12%
Copies: 4113 -> 4351 (+5.79%); split: -0.66%, +6.44%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12211>
2021-08-23 16:48:31 +00:00
Rhys Perry
2201f5a58c aco: remove label_extract if the extract is used by a non-VALU
If an extract is used by a non-VALU instruction, it can't be applied to
all instructions, so it's not beneficial to try to apply it.

This check isn't needed because can_apply_extract()/can_use_SDWA() should
already handle non-VALU instructions.

fossil-db (Sienna Cichlid):
Totals from 1020 (0.68% of 150170) affected shaders:
SpillSGPRs: 1577 -> 1571 (-0.38%)
CodeSize: 7863668 -> 7858336 (-0.07%); split: -0.07%, +0.00%
Instrs: 1431583 -> 1431083 (-0.03%); split: -0.04%, +0.01%
Latency: 25891250 -> 25890916 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7248683 -> 7248655 (-0.00%); split: -0.01%, +0.01%
SClause: 49072 -> 49071 (-0.00%)
Copies: 126649 -> 126580 (-0.05%); split: -0.11%, +0.06%
Branches: 39129 -> 39120 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 53071 -> 52943 (-0.24%); split: -0.26%, +0.02%
PreVGPRs: 57437 -> 57435 (-0.00%); split: -0.01%, +0.01%

fossil-db (Polaris10):
Totals from 654 (0.43% of 151696) affected shaders:
CodeSize: 5814552 -> 5811568 (-0.05%); split: -0.05%, +0.00%
Instrs: 1105783 -> 1105049 (-0.07%); split: -0.07%, +0.00%
Latency: 20261458 -> 20259744 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 9011785 -> 9011749 (-0.00%); split: -0.00%, +0.00%
Copies: 104693 -> 103904 (-0.75%); split: -0.76%, +0.00%
PreSGPRs: 36105 -> 36095 (-0.03%); split: -0.03%, +0.01%
PreVGPRs: 43813 -> 43809 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12212>
2021-08-23 14:56:37 +01:00
Samuel Pitoiset
e0353296da radv: allocate shaders to 32-bit address to skip PGM_HI
This reduces the number of emitted registers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12466>
2021-08-23 11:28:21 +00:00
Samuel Pitoiset
2dc90ca8a4 radv: don't use SQ_NON_EVENT before GE_PC_ALLOC for better perf on Navi1x
Seems it make the perf worse.
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12466>
2021-08-23 11:28:21 +00:00
Daniel Schürmann
77ffdf41b1 aco: add more validation rules for SDWA operands
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
077776a866 aco/opcodes: remove definition_size[]
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
f6b281a1c2 aco/validate: simplify get_subdword_bytes_written()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
ec1bbfa608 aco/ra: refactor subdword operand stride
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
c75138ed64 aco/ra: refactor subdword definition info
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
e11b23f7cd aco: add instr_is_16bit() helper function
to indicate whether some instruction writes partial registers, only.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
3d6ca41e44 aco: use VOPC_SDWA on GFX9+
Totals from 5138 (3.42% of 150170) affected shaders: (GFX10.3)
VGPRs: 409520 -> 409416 (-0.03%); split: -0.03%, +0.00%
CodeSize: 43056360 -> 43035696 (-0.05%); split: -0.06%, +0.02%
MaxWaves: 69296 -> 69310 (+0.02%)
Instrs: 8161016 -> 8153365 (-0.09%); split: -0.10%, +0.01%
Latency: 109397002 -> 109756208 (+0.33%); split: -0.05%, +0.38%
InvThroughput: 23238920 -> 23310761 (+0.31%); split: -0.11%, +0.42%
VClause: 135141 -> 135100 (-0.03%); split: -0.05%, +0.02%
SClause: 349511 -> 349489 (-0.01%); split: -0.01%, +0.00%
Copies: 388107 -> 387754 (-0.09%); split: -0.48%, +0.38%
Branches: 184629 -> 184503 (-0.07%); split: -0.08%, +0.01%
PreSGPRs: 258807 -> 258839 (+0.01%)
PreVGPRs: 372561 -> 372184 (-0.10%); split: -0.10%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Daniel Schürmann
60e171af06 aco/print_ir: fix printing of VOPC_SDWA definitions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12364>
2021-08-23 10:31:40 +00:00
Rhys Perry
8852c5448d aco: fix vectorized 16-bit load_input/load_interpolated_input
Seems we haven't encountered this before because
nir_lower_io_to_scalar_early usually scalarizes this.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12486>
2021-08-23 10:11:36 +00:00