aco_instruction_selection_setup.cpp (previously used as a header) has
been split into a header and an implementation file. The latter "only"
implements init_context and setup_isel_context, but since these files
carry a long trail of helper functions, this cleans up the isel header
a lot.
Reduces library size by 3.1% due to more functions being compiled with
static linkage. Makes aco_instruction_selection.cpp compile 3% faster.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
Statically known values were encoded using template parameters previously,
causing specializations for each of the 5 sets of template arguments to be
generated. Since emit_load is not performance critical (the inner loop
never runs more often than twice), it's better for build time to use
runtime arguments everywhere.
Reduces build time of this file by 9% (17.3s -> 15.7s on my machine) and
reduces libaco's size by 2.6%.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
If the offset is large enough, it could affect the width. I'm also not
sure if the hardware masks the offset by 0x1f.
Found by inspection. No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6424>
These consist of the variations nir_op_{i|u|f}{min|max|med}3 which are either
lowered in the backend (LLVM) anyway or can be recombined by the backend (ACO).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6421>
It's way easier to write a trap handler shader using ACO IR
instead of writing disassembly by hand + clrxasm + copy&paste.
This trap handler is quite simple for now, it just loads a
buffer descriptor from the TMA BO, it saves ttmp0-1 which
contain various info about the faulty instruction, and it
stores some hw registers about the wave/trap status.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
We were using the wrong conversion opcode. The high bits are also not
zero'd on GFX10, which can cause v_cvt_pk_u16_u32 to clamp.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: df645fa369 ('aco: implement VK_KHR_shader_float_controls')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6346>
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter. More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic. If this
is ever a problem for anyone, the lowering is pretty trivial.
I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero. This will never happen with GL or
Vulkan so there's no need for panic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
The upcoming change will allow to report all ACO errors (or warnings)
directly to the app via VK_EXT_debug_report. This is similar to what
we already do for reporting various SPIRV->NIR errors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6318>
We only need one s_bfe for a conversion with a swizzled source.
shader-db (parallel-rdp, Navi):
Totals from 487 (71.30% of 683) affected shaders:
SpillSGPRs: 3284 -> 3233 (-1.55%); split: -2.71%, +1.16%
SpillVGPRs: 2174 -> 2150 (-1.10%); split: -1.24%, +0.14%
CodeSize: 2497864 -> 2445544 (-2.09%); split: -2.11%, +0.01%
Instrs: 450613 -> 445104 (-1.22%); split: -1.27%, +0.05%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5259>
NIR doesn't have atomic loads/stores, so we have to workaround that with
this for dEQP-VK.memory_model.* to pass.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4905>
Previously, we've set element_size == 16 which causes loads from
packed vec3 arrays to cross the boundary and return wrong data.
This patch sets element_size = 4 and splits loads into single channel.
Fixes all of dEQP-VK.subgroups.ballot_broadcast.*
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5977>
This (combined with a pass to actually set the corresponding NIR flags)
should help fix a lot of the regressions from the SMEM addition combining
change.
fossil-db (Navi):
Totals from 12 (0.01% of 135946) affected shaders:
CodeSize: 12376 -> 12304 (-0.58%)
Instrs: 2436 -> 2422 (-0.57%)
VMEM: 1105 -> 1096 (-0.81%)
SClause: 133 -> 130 (-2.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
SMEM does the addition with 64-bits, not 32. So if the original code
relied on wrapping around (for example, for subtraction), it would break.
Apparently swizzled MUBUF accesses also have issues with combining
additions that could overflow. Normal MUBUF accesses seem fine.
fossil-db (Navi):
Totals from 27219 (20.02% of 135946) affected shaders:
CodeSize: 128303256 -> 131062756 (+2.15%); split: -0.00%, +2.15%
Instrs: 24818911 -> 25280558 (+1.86%); split: -0.01%, +1.87%
VMEM: 162311926 -> 177226874 (+9.19%); split: +9.36%, -0.17%
SMEM: 18182559 -> 20218734 (+11.20%); split: +11.53%, -0.34%
VClause: 423635 -> 424398 (+0.18%); split: -0.02%, +0.20%
SClause: 865384 -> 1104986 (+27.69%); split: -0.00%, +27.69%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2748
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
This is needed since we will be lowering some 8/16-bit shuffles to
masked_swizzle_amd.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5695>
fossil-db (Pitcairn):
Totals from 2172 (1.58% of 137414) affected shaders:
CodeSize: 7109080 -> 7100100 (-0.13%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5623>
SPI_SHADER_COL_FORMAT allocates export memory and CB_SHADER_MASK
map them to higher MRTs if necessary. The hardware allows to remap
MRTs to avoid holes somehow.
For example, if we have a scenario where MRT0 is unused and only
MRT1 and MRT2 are used, SPI_SHADER_COL_FORMAT is 0x77 and
CB_SHADER_MASK/CB_TARGET_MASK are 0x770 (this assumes
SPI_SHADER_UINT16_ABGR is set).
This allows us to remove one workaround that was added for fixing
GPU hangs with DXVK. I think this is because SPI_SHADER_COL_FORMAT
expects contiguous MRTs to be allocated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5434>
GFX6 doesn't support v_floor_f64 and the precision of v_fract_f64
which is used to implement 64-bit floor is less than what Vulkan
requires.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5609>