Commit graph

620 commits

Author SHA1 Message Date
Rhys Perry
2a7fa132be aco: implement udot_4x8/sdot_4x8/udot_2x16/sdot_2x16 opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:28 +00:00
Rhys Perry
e0d232c2fc aco: implement nir_op_pack_32_4x8
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:28 +00:00
Rhys Perry
4dd420f76d radv,aco: implement iadd_sat
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:28 +00:00
Daniel Schürmann
0988f7b9ba aco: remove explicit dst_preserve flag
Instead, we can rely on the fact that subdword definitions
must preserve the unused bits while dword definitions either
pad or sign-extend.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
2021-09-02 20:39:17 +02:00
Daniel Schürmann
9e3ff06c38 aco: rewrite SDWA selector
This commit introduces a new struct SubdwordSel
in order to ease and clean up the usage of SDWA
selections. This includes removing the distinction
between register-allocated and fixed SDWA selections.
Instead, SDWA selections can now also access the high
bits of subdword variables. Alignment and sizes are
validated accordingly. Size, offset and sign_extend
can be evaluated via helper methods.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
2021-09-02 20:39:17 +02:00
Rhys Perry
9df9fe7dfa aco: include utility in isel
For std::exchange().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c1d11bb92c ("aco: Add loop creation helpers.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5301
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12614>
2021-08-30 14:28:00 +00:00
Timur Kristóf
cfb0d931f2 aco: Emit zero for the derivatives of uniforms.
Observed in a shader from Resident Evil Village.
This also helps prevent emitting invalid IR.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12599>
2021-08-27 20:34:22 +00:00
Daniel Schürmann
23d5865f42 aco: refactor nir_op_imul selection
Previously, the optimization to use v_mul_lo_u16 for
32bit multiplications was done in instruction_selection.
This was moved to the optimizer to ease some case distinctions.

The mixed results are due to increased use of SDWA.

Totals from 2616 (1.74% of 150170) affected shaders: (GFX10.3)
VGPRs: 143888 -> 143872 (-0.01%); split: -0.02%, +0.01%
CodeSize: 5604032 -> 5604080 (+0.00%); split: -0.01%, +0.01%
Instrs: 1086798 -> 1083915 (-0.27%); split: -0.27%, +0.01%
Latency: 8215793 -> 8213023 (-0.03%); split: -0.10%, +0.07%
InvThroughput: 20765157 -> 20773766 (+0.04%); split: -0.02%, +0.06%
VClause: 35256 -> 35260 (+0.01%); split: -0.02%, +0.03%
SClause: 29021 -> 29024 (+0.01%); split: -0.00%, +0.01%
Copies: 74163 -> 74306 (+0.19%); split: -0.05%, +0.24%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Timur Kristóf
5b7446d74c radv, ac, aco: Use indices 0-2 of gs_vtx_offset argument array on GFX9+.
Previously, indices 0, 2, 4 were used.
This worked, but it was somewhat unintuitive.
This commit changes it to use indices 0, 1, 2 instead, which
makes the code easier to understand.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12511>
2021-08-26 05:20:15 +00:00
Daniel Schürmann
cd489e5388 aco: remove redundant s_and exec after nir_op_inot
Totals from 22585 (15.04% of 150170) affected shaders: (GFX10.3)
VGPRs: 1474048 -> 1473904 (-0.01%)
CodeSize: 155238876 -> 155187688 (-0.03%); split: -0.06%, +0.03%
MaxWaves: 385086 -> 385122 (+0.01%)
Instrs: 29297735 -> 29284442 (-0.05%); split: -0.08%, +0.04%
Latency: 675841742 -> 675764151 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 174859037 -> 174854796 (-0.00%); split: -0.01%, +0.01%
VClause: 479790 -> 479781 (-0.00%); split: -0.01%, +0.00%
SClause: 1106900 -> 1106615 (-0.03%); split: -0.03%, +0.01%
Copies: 1829037 -> 1828042 (-0.05%); split: -0.09%, +0.03%
Branches: 859971 -> 859967 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1341850 -> 1342356 (+0.04%); split: -0.01%, +0.04%
PreVGPRs: 1327322 -> 1327034 (-0.02%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>
2021-08-25 12:43:50 +00:00
Rhys Perry
8852c5448d aco: fix vectorized 16-bit load_input/load_interpolated_input
Seems we haven't encountered this before because
nir_lower_io_to_scalar_early usually scalarizes this.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12486>
2021-08-23 10:11:36 +00:00
Timur Kristóf
448592b9ae aco: Use Navi 10 empty NGG output workaround on NGG culling shaders.
Navi 10 can hang when an NGG workgroup has no output,
so we work around that by always exporting a single zero-area
triangle with a single vertex that has all-NaN coordinates.

Thus far, we only employed this for NGG GS, because on all
other stages, the output can't be empty.

However, with NGG culling, the output can be empty, so let's
apply the same workaround there too.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12169>
2021-08-04 12:28:34 +00:00
Rhys Perry
566970f273 aco: use image_dim and image_array intrinsic indices
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12190>
2021-08-04 12:09:07 +00:00
Timur Kristóf
da9f4b2e67 nir, aco: Remove vertex and primitive count overwrite intrinsic.
It's no longer needed.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
1bbea90f50 aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags.
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%

No Fossil DB changed with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Samuel Pitoiset
6694c37ea0 aco: implement VK_EXT_shader_atomic_float2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12060>
2021-07-27 08:44:31 +02:00
Jason Ekstrand
e83fe65cd8 radv,radeonsi: Do cube size divide-by-6 lowering in NIR
No point in carrying all this code around twice each in two back-ends.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>
2021-07-22 14:22:35 -05:00
Timur Kristóf
55d57b828f aco: Fix how p_elect interacts with optimizations.
Since p_elect doesn't have any operands, ACO's value numbering and/or
the pre-RA optimizer could currently recognize two p_elect instructions
in two different blocks as the same.

This patch adds exec as an operand to p_elect in order to achieve
correct behavior.

Fixes: e66f54e5c8
Closes: #5080
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11943>
2021-07-18 00:48:06 +02:00
Timur Kristóf
e66f54e5c8 aco: Allow elect to take advantage of knowing when all lanes are active.
Implement elect using a pseudo-op which is lowered during the
insert_exec_mask pass. This makes it possible to emit a more
optimal sequence when the exec mask is constant.

Fossil DB results on Sienna Cichlid:
Totals from 211 (0.16% of 128647) affected shaders:
CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
Copies: 40058 -> 40162 (+0.26%)
Branches: 31209 -> 31211 (+0.01%)
PreSGPRs: 9927 -> 10125 (+1.99%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
2021-07-16 14:31:54 +00:00
Timur Kristóf
b12318f26c aco: Swap s_and operand order for ballot.
This allows our optimizer to recognize this and eliminate it when
it can prove that the s_and with exec is unneeded.

Fossil DB changes on Sienna Cichlid:
Totals from 1969 (1.53% of 128647) affected shaders:
CodeSize: 9468228 -> 9469348 (+0.01%); split: -0.00%, +0.01%
Instrs: 1773566 -> 1773581 (+0.00%); split: -0.01%, +0.01%
Latency: 19504042 -> 19503385 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3617406 -> 3617333 (-0.00%)
Copies: 108998 -> 110592 (+1.46%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
2021-07-16 14:31:54 +00:00
Daniel Schürmann
114d38e57d aco/isel: avoid unnecessary calls to nir_unsigned_upper_bound()
These were responsible for ~20% of the time
spent in instruction selection.
Reduces overall compile times by ~0.5%.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11879>
2021-07-14 18:10:40 +02:00
Timur Kristóf
182d9b1e60 aco: Implement NGG culling related intrinsics.
These are very straightforward as they just copy data from
the newly added shader arguments.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00
Tony Wasserka
cfd866ed42 aco: Clean up unneeded literal casts
These were only needed to select the appropriate Operand constructor before.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
2021-07-13 17:43:26 +00:00
Tony Wasserka
66e51dc474 aco: Remove use of deprecated Operand constructors
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)

Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
2021-07-13 17:43:26 +00:00
Daniel Schürmann
b97cd93b35 aco: fix extract_vector optimization
If the allocated_vec map contains a different RegType
for the elements, ensure that the size matches exactly.

Otherwise, it could happen that extracting a dword
element matched with a subdword element.

No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11823>
2021-07-13 09:14:43 +02:00
Daniel Schürmann
1e2639026f aco: Format.
Manually adjusted some comments for more intuitive line breaks.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>
2021-07-12 21:27:31 +00:00
Samuel Pitoiset
ee79b87c62 radv: lower primitive shading rate in NIR
This allows more potential compiler optimizations if the value is a
constant or from a scalar load.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11579>
2021-07-12 17:54:07 +00:00
Daniel Schürmann
0eea0e55ad aco: add 'common/' and 'llvm/' prefix to #includes
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Daniel Schürmann
59fdaa1985 aco: reorder and cleanup #includes
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Samuel Pitoiset
543eb42c35 aco: use nir_ssa_def_is_unused() to determine if atomic dest is used
Instead of duplicating this chunk everywhere.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11793>
2021-07-09 12:12:57 +00:00
Samuel Pitoiset
74a221bcfd aco: fix shared_atomic_comp_swap if the second source isn't a VGPR
Only VGPRs are valid with DS instructions.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11777>
2021-07-08 10:41:14 +00:00
Rhys Perry
a9c4a31d8d aco: handle NIR loops without breaks
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11626>
2021-07-01 10:01:52 +00:00
Rhys Perry
c094765a01 aco: remove resource flags
After disabling SMEM stores, nir_opt_access() now does the same analysis
and we don't need this anymore. Doing it in isel is also too late if we
want to lower descriptor loads in NIR.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11652>
2021-06-30 19:07:12 +01:00
Timur Kristóf
e6bf5cfe59 aco/gfx10: Emit barrier at the start of NGG VS and TES.
The Navi 1x NGG hardware can hang in certain conditions when
not every wave launched before s_sendmsg(GS_ALLOC_REQ).

As a workaround, to ensure this never happens, let's emit a
workgroup barrier at the beginning of NGG VS and TES.
Note that NGG GS already has a workgroup barrier so it doesn't
need this.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10837>
2021-06-22 14:32:27 +00:00
Timur Kristóf
f9447abb36 aco/gfx10: NGG zero output workaround for conservative rasterization.
Navi 1x GPUs have an issue: they can hang when the output vertex
and primitive counts are zero. The workaround is exporting a dummy
triangle.

This commit changes the dummy triangle's vertex so its positions
are all NaN. This should make sure the triangle is never rendered.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10837>
2021-06-22 14:32:27 +00:00
Jason Ekstrand
f0f713960b nir,amd: Suffix nir_op_cube_face_coord/index with _amd
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>
2021-06-21 09:03:34 -05:00
Rhys Perry
1d50ef9ca6 aco: adjust the condition for expanding vertex fetch data format
Instead of avoiding out-of-bounds access, avoid creating a load larger
than the original attribute. This should work just as well, since the only
situations expending a load helped was because we shrunk it first.

Also fixes a bug where a 3 component load (4 components with the first
component skipped) would be incorrectly expanded to 4 components because
the stride check would never be performed. Maybe we should avoid skipping
the first component in some situations, but I'm not sure if it's worth
the VGPR cost.

fossil-db (vega10):
Totals from 583 (0.39% of 149974) affected shaders:
CodeSize: 1496848 -> 1500868 (+0.27%); split: -0.03%, +0.30%
Instrs: 286155 -> 286575 (+0.15%); split: -0.07%, +0.22%
Latency: 2947101 -> 2946865 (-0.01%); split: -0.23%, +0.22%
InvThroughput: 797396 -> 797127 (-0.03%); split: -0.08%, +0.04%

fossil-db (polaris10):
Totals from 583 (0.39% of 151365) affected shaders:
SGPRs: 38880 -> 39216 (+0.86%)
VGPRs: 24440 -> 24356 (-0.34%)
CodeSize: 1506808 -> 1510876 (+0.27%); split: -0.01%, +0.28%
Instrs: 288735 -> 289167 (+0.15%); split: -0.06%, +0.21%
Latency: 2963263 -> 2961884 (-0.05%); split: -0.24%, +0.19%
InvThroughput: 802351 -> 801665 (-0.09%); split: -0.12%, +0.04%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9007>
2021-06-14 09:48:32 +00:00
Rhys Perry
91f8f82806 radv,aco: use all attributes in a binding to obtain an alignment for fetch
Instead of assuming scalar alignment for an attribute, we can use the
required alignment of other attributes in a binding to expect a higher
one.

This uses the alignment of all attributes in the pipeline, not just the
ones loaded. This can create slightly better code, but could break
pipelines which relied on unused (and unaligned) attributes no being
loaded. I don't think such pipelines are allowed by the spec.

fossil-db (Sienna Cichlid):
Totals from 44350 (30.32% of 146267) affected shaders:
VGPRs: 1694464 -> 1700616 (+0.36%); split: -0.08%, +0.44%
CodeSize: 60207184 -> 58093836 (-3.51%); split: -3.51%, +0.00%
MaxWaves: 1175998 -> 1174948 (-0.09%); split: +0.02%, -0.11%
Instrs: 11763444 -> 11458952 (-2.59%); split: -2.60%, +0.01%
Latency: 70679612 -> 67062215 (-5.12%); split: -5.27%, +0.15%
InvThroughput: 11482495 -> 11362911 (-1.04%); split: -1.20%, +0.16%
VClause: 359459 -> 343248 (-4.51%); split: -6.36%, +1.85%
SClause: 422404 -> 419229 (-0.75%); split: -1.17%, +0.42%
Copies: 754384 -> 764368 (+1.32%); split: -1.74%, +3.06%
Branches: 197472 -> 197474 (+0.00%); split: -0.03%, +0.03%
PreVGPRs: 1215348 -> 1215503 (+0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9007>
2021-06-14 09:48:32 +00:00
Rhys Perry
9162963f0a aco: fix emit_mbcnt() with a VGPR mask
Found by inspection. Should be possible with nir_intrinsic_mbcnt_amd.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11295>
2021-06-10 11:21:47 +00:00
Timur Kristóf
18337fbcf2 aco: Use as_vgpr for the second source of mbcnt_amd.
Fixes: 1e49018ced
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11292>
2021-06-10 10:13:02 +00:00
Timur Kristóf
1e49018ced amd: Add extra source to the mbcnt_amd NIR intrinsic.
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
ce141e4c5f aco: Implement byte and lane permute intrinsics.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf
fd6605367d aco: Implement nir_op_sad_u8x4.
Fix up the operand size for v_sad instructions, and implement
the new NIR horizontal add. There is no viable way to do this
in SALU, so let's always use a VGPR destination.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Rhys Perry
c129ede523 aco: use ds_read_{u8,u16}_d16
This allows partial writes and writes to the upper half of the destination.

fossil-db (Sienna Cichlid):
Totals from 135 (0.09% of 149839) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Rhys Perry
6334d73fc9 aco: don't ever widen 8/16-bit sgpr load_shared
Doesn't seem to create incorrect code, but it is suboptimal.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Rhys Perry
4870d7d829 aco: use v1b/v2b for ds_read_u8/ds_read_u16
The p_extract_vector isn't necessary.

For ds_read_u8 and ds_read_u16, we used a 32-bit regclass, but did't load
32 bits, and used dst_hint for vector loads when we shouldn't have.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4863
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11113>
2021-06-09 12:06:50 +00:00
Samuel Pitoiset
3761d994f6 aco: fix range checking for SSBO loads/stores with SGPR offset on GFX6-7
GFX6-7 are affected by a hw bug that prevents address clamping to work
correctly when the SGPR offset is used. Use the VGPR offset to fix it.

Fixes various hangs with dEQP-VK.robustness.robustness2.* on Bonaire.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11238>
2021-06-09 06:40:16 +00:00
Rhys Perry
daa329f664 aco: use byte/word extract pseudo-instructions
fossil-db (Sienna Cichild):
Totals from 1890 (1.26% of 149839) affected shaders:
CodeSize: 5104196 -> 5104300 (+0.00%); split: -0.00%, +0.01%
Latency: 11572943 -> 11572880 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 4876941 -> 4876982 (+0.00%); split: -0.00%, +0.00%
SClause: 26774 -> 26775 (+0.00%)
Copies: 125778 -> 125813 (+0.03%)
PreSGPRs: 56452 -> 56451 (-0.00%)

fossil-db (Polaris):
Totals from 1884 (1.24% of 151365) affected shaders:
CodeSize: 3849340 -> 3849312 (-0.00%); split: -0.00%, +0.00%
Instrs: 741391 -> 741382 (-0.00%)
Latency: 13533815 -> 13533439 (-0.00%)
InvThroughput: 12058777 -> 12058500 (-0.00%)
Copies: 120890 -> 120891 (+0.00%)
PreSGPRs: 48940 -> 48939 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:43 +00:00
Rhys Perry
1f2518ef9f aco: implement nir_op_extract/nir_op_insert
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:43 +00:00
Caio Marcelo de Oliveira Filho
c8a7bd0dc8 nir: Rename WORK_GROUP (and similar) to WORKGROUP
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00