Commit graph

196393 commits

Author SHA1 Message Date
Paulo Zanoni
4c366ef67b anv/trtt: set every entry to NULL when we create an L2 table
When we create sparse resources the first thing we do is a NULL bind
on them, as the Vulkan spec mandates certain behavior even for unbound
sparse resources. We do this with the minimal effort possible: if we
can get away with marking an L2 pointer as NULL in the L3 table, we
just do it and return, instead of going all the way to creating L1
tables and marking all the final entries as NULL.

The strategy we were using had a bug that could lead to previously
created NULL entries not being marked as NULL anymore. Let's give an
example:

 (before proceeding, keep in mind that a NULL entry in the L3 and L2
  tables has bit 1 set, it does *not* have the value 0)

 - Create a 64MB buffer that uses an entire L1 table (needs to be
   properly aligned), which triggers a NULL bind.
     - Our algorithm will just set the L3 entry (pointing to the L2
       table) as NULL.
 - Create a 64KB buffer that uses the same L2 table (but a different
   L1 table).
     - The NULL bind triggered won't do anything as the L2 table is
       already NULL.
 - Bind the first buffer to actual memory. This will end up creating
   the L2 table and the L1 table. The only entry we will set in the L2
   table will be the one pointing to the L1 table. All the other
   values will be 0 (so they will have neither the NULL nor the Invalid
   bit set: access to them will lead to page faults).
 - Try to use the second buffer, which is still unbound. It was
   relying on the fact that its L2 table pointer was NULL, but now
   it's not anymore, so the page walker will fetch the L1 entries in
   the L2 table and they will all be zero instead of having the NULL
   bit set.

The fix is pretty simple: whenever we create a new L2 table, set every
entry to NULL (except the one we're about to set to non-NULL). This
preserves behavior for every other NULL resource relying on the L3
entry being set to NULL.

We don't need to do this for the L1 table because its entries are
different and instead of having bits to signal NULL entries we have
a special TR-TT register that we can set that gets compared to check
if an entry is NULL, and we conveniently program it to 0: see
ANV_TRTT_L1_NULL_TILE_VAL.

I am not aware of any real workloads that trigger this behavior. I
found this issue while investigating something else, running a custom
sparse program in our pre-silicon environment, which reported the page
faults.

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
2024-10-15 23:05:30 +00:00
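
The fix described in 4c366ef67b can be sketched as below. This is a minimal illustration and not the anv code: the helper name, the table size passed by the caller, and the Invalid bit position are assumptions; only "a NULL entry has bit 1 set" comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding for this sketch: bit 0 marks an entry Invalid and bit 1
 * marks it NULL. Only the NULL bit position is stated in the commit message. */
#define SKETCH_ENTRY_INVALID (UINT64_C(1) << 0)
#define SKETCH_ENTRY_NULL    (UINT64_C(1) << 1)

/* Hypothetical helper: initialize a freshly created L2 table so that every
 * entry is NULL except the one that is about to point at a real L1 table. */
static void
sketch_init_l2_table(uint64_t *l2, size_t num_entries,
                     size_t bound_idx, uint64_t l1_addr)
{
   assert(bound_idx < num_entries);
   /* A real L1 address must not collide with the flag bits. */
   assert((l1_addr & (SKETCH_ENTRY_INVALID | SKETCH_ENTRY_NULL)) == 0);

   /* Preserve NULL behavior for every other resource that relied on the
    * L3 entry being NULL before this table existed. */
   for (size_t i = 0; i < num_entries; i++)
      l2[i] = SKETCH_ENTRY_NULL;

   l2[bound_idx] = l1_addr;
}
```

With this initialization, a still-unbound buffer that shares the L2 table sees NULL entries rather than zeros once the L3 entry stops being NULL.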
M Henning
537ada2308 nak: Phi coalescing via biased register coloring
Reduces code size by 29.08% on shaderdb + nvk-fossils-foss

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31498>
2024-10-15 22:29:11 +00:00
Dylan Baker
38f7ae5288 release: push 24.3 out two weeks
I've had a couple of requests to push the release out 1-2 weeks. There
have been various reasons for this, but the best one (IMHO) is that this
is the week directly after XDC, and many people will be jetlagged and/or
suffering from the post-XDC flu.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31637>
2024-10-15 14:59:50 -07:00
Karol Herbst
ff2c4e8f11 zink: add CL CTS result
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31614>
2024-10-15 21:07:07 +00:00
Juston Li
0c9ee0f2b9 android: look for debug/vendor prefixed options
Properties from the vendor partition must use a "vendor." prefix from
Android T+. Meanwhile the "debug." prefix can be used for local
overrides.

The order of precedence thus becomes:
1. getenv
2. debug.mesa.*
3. vendor.mesa.*
4. mesa.* (as a fallback for older versions)

Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31584>
2024-10-15 20:22:17 +00:00
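
The order of precedence listed in 0c9ee0f2b9 can be sketched as a lookup loop. This is an illustration under stated assumptions: the Android property store is replaced by a plain key/value array, and all `sketch_` names are hypothetical rather than Mesa functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the Android property store (an assumption of this sketch). */
struct sketch_prop {
   const char *key;
   const char *value;
};

static const char *
sketch_prop_lookup(const struct sketch_prop *props, size_t count,
                   const char *key)
{
   for (size_t i = 0; i < count; i++) {
      if (strcmp(props[i].key, key) == 0)
         return props[i].value;
   }
   return NULL;
}

/* Resolve an option: the environment first, then debug.mesa.*, then
 * vendor.mesa.*, then the unprefixed mesa.* fallback. */
static const char *
sketch_get_option(const char *env_name, const char *prop_name,
                  const struct sketch_prop *props, size_t count)
{
   static const char *const prefixes[] = {
      "debug.mesa.", "vendor.mesa.", "mesa.",
   };

   assert(env_name != NULL && prop_name != NULL);

   const char *value = getenv(env_name);
   if (value != NULL)
      return value;

   for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
      char key[128];
      int len = snprintf(key, sizeof(key), "%s%s", prefixes[i], prop_name);
      if (len < 0 || (size_t)len >= sizeof(key))
         continue; /* key would be truncated, skip this prefix */
      value = sketch_prop_lookup(props, count, key);
      if (value != NULL)
         return value;
   }
   return NULL;
}
```

The first source that has a value wins, so a local debug.* override shadows the vendor.* value.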
Kenneth Graunke
4cb67cb07a intel/brw: Use whole 512-bit registers in constant combining on Xe2
Xe2 increased the register size from 256-bits to 512-bits.  So we can
store 32 16-bit values in a register, rather than 16 values.  Prior to
this patch, we hadn't updated the pass, so the second half of each of
our registers was unused.

Backport-to: 24.2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
d9e5022650 intel/brw: Delete more Gfx8 code from brw_fs_combine_constants
These platforms are supported by elk, not brw.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
dea61b7399 intel/brw: Fix register and builder size in emit_barrier() for Xe2
We were manually allocating 1 REG_SIZE for the barrier payload, which is
only half a register on Xe2.  This should eventually get allocated to a
whole register anyway, but it's awkward in the meantime.  Also, we were
zero-initializing the header using group(8, 0) which only initialized
half the register.  The rest of the fields are Reserved MBZ, so they're
likely unused and unread anyway - but it's better to zero-initialize
them so we don't get random undefined, miserable-to-debug behavior.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
7c9eb8b289 intel/brw: Make a ubld temporary in emit_barrier()
Saves typing .exec_all() in a lot of places.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
a9d9488788 intel/brw: Delete Gfx7-8 code from emit_barrier()
Those are supported by elk, not brw.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
c747c1e1f4 intel/brw: Fix spill/fill count for load/store_scratch in SIMD32
Honestly, I don't know what I was thinking - we are emitting a single
spill/fill message here, but were counting it as 2 spill/fills in SIMD32
shaders.  So our eventual shader stat reporting would subtract the
number of spills and fills from send_count, and get a negative number,
wrapping around to just shy of UINT32_MAX.  That's way too many sends.

This is especially noticeable on Xe2 which often uses SIMD32 shaders.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
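
The wrap-around described in c747c1e1f4 follows from unsigned arithmetic. A minimal sketch, assuming the statistic is a 32-bit unsigned subtraction; the function names are hypothetical and not taken from brw:

```c
#include <assert.h>
#include <stdint.h>

/* Sends that are neither spills nor fills. Unsigned arithmetic wraps modulo
 * 2^32, so over-counted spills/fills give a huge value, not a negative one. */
static uint32_t
sketch_non_spill_sends(uint32_t send_count, uint32_t spills, uint32_t fills)
{
   return send_count - spills - fills;
}

/* Same computation with the invariant checked in debug builds. */
static uint32_t
sketch_non_spill_sends_checked(uint32_t send_count, uint32_t spills,
                               uint32_t fills)
{
   assert((uint64_t)spills + fills <= send_count);
   return send_count - spills - fills;
}
```

For example, 3 sends with spills and fills each counted as 2 gives 3 - 2 - 2, which wraps to UINT32_MAX.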
Pavel Ondračka
58d6906f8c r300/ci: update ci expectations after piglit uprev
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31663>
2024-10-15 17:43:00 +00:00
Faith Ekstrand
03a393d6ca nak: Handle annotations in legalization
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31665>
2024-10-15 17:13:27 +00:00
Faith Ekstrand
36d9d11882 nak: Remove annotations before calc_instr_deps()
Otherwise the annotations might throw off latency information which
needs exact instruction counts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31665>
2024-10-15 17:13:27 +00:00
Aleksi Sapon
9e769a0620 lavapipe: enable alpha-to-coverage dithering
This is a common feature on hardware, both Nvidia
and Apple GPUs have it always enabled.

On OpenGL this can be controlled using
NV_alpha_to_coverage_dither_control, but as far
as I can tell there is no extension on Vulkan.
Metal also has this feature without a control.

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31373>
2024-10-15 16:17:40 +00:00
Aleksi Sapon
ad4635d6ef llvmpipe: implement alpha-to-coverage dithering
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31373>
2024-10-15 16:17:40 +00:00
Danylo Piliaiev
6d6d5b869c freedreno/cffdec: Add option to dump bindless descriptors
cffdump --bindless would dump bindless descriptors. We don't know
what exactly is in the descriptors, so we dump all interpretations
for each of them.

Example:
    set[1]:
    UBO[0]:
        { BASE_LO = 0x23806420 }
        { BASE_HI = 0xc | SIZE = 0x2 }
    STORAGE/TEXEL/IMAGE[0]:
        { TILE_MODE = TILE6_LINEAR | SWIZ_X = A6XX_TEX_Z | SWIZ_Y = A6XX_TEX_X | SWIZ_Z = A6XX_TEX_Y | SWIZ_W = A6XX_TEX_W | MIPLVLS = 0 | SAMPLES = MSAA_ONE | FMT = FMT6_R8_G8B8_2PLANE_420_UNORM | SWAP = WZYX }
        { WIDTH = 12 | HEIGHT = 8 }
        { STRUCTSIZETEXELS = 1024 | STARTOFFSETTEXELS = 0 | PITCHALIGN = 1 | PITCH = 128 | TYPE = A6XX_TEX_2D }
        { ARRAY_PITCH = 4096 | MIN_LAYERSZ = 0 }
        { BASE_LO = 0xa5000 }
        { BASE_HI = 0x1 | DEPTH = 1 }
        { MIN_LOD_CLAMP = 0.000000 | PLANE_PITCH = 128 }
        { FLAG_LO = 0xa6000 }
        { FLAG_HI = 0x1 }
        { FLAG_BUFFER_ARRAY_PITCH = 327680 | 0xa0000 }
        { FLAG_BUFFER_PITCH = 64 | FLAG_BUFFER_LOGW = 0 | FLAG_BUFFER_LOGH = 0 }
        { 11 = 0 }
        { 12 = 0 }
        { 13 = 0 }
        { 14 = 0 }
        { 15 = 0 }
    SAMPLER[0]:
        { XY_MAG = A6XX_TEX_NEAREST | XY_MIN = A6XX_TEX_NEAREST | WRAP_S = A6XX_TEX_CLAMP_TO_EDGE | WRAP_T = A6XX_TEX_MIRROR_CLAMP | WRAP_R = A6XX_TEX_MIRROR_CLAMP | ANISO = A6XX_TEX_ANISO_2 | LOD_BIAS = 4.437500 }
        { COMPARE_FUNC = FUNC_GEQUAL | MAX_LOD = 4.000000 | MIN_LOD = 0.000000 }
        { REDUCTION_MODE = A6XX_REDUCTION_MODE_MIN | BCOLOR = 0x400080 }
        { 3 = 0x1 }

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31632>
2024-10-15 15:35:39 +00:00
Danylo Piliaiev
e2e9dd4f21 freedreno/rnndec: Consider array length when finding by reg name
Otherwise we get a valid reg base for reg array with OOB index.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31632>
2024-10-15 15:35:39 +00:00
Deborah Brouwer
0007077c11 ci: remove xfail program@build@include-directories
Now that build-piglit.sh is no longer removing ‘include_test.h’
this test `program@build@include-directories` is passing which is causing
jobs to fail due to this unexpected improvement. Remove this test from
expected fails so that the jobs can pass.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31379>
2024-10-15 15:50:47 +01:00
Collabora's Gfx CI Team
68aa78a858 Uprev Piglit to 7ce69da1199d12ed0ddaa251ed489750523798fb
e9ab30aeae...7ce69da119

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31379>
2024-10-15 15:50:47 +01:00
Mike Blumenkrantz
4ac4004816 llvmpipe: expose GL multiview extensions
this is a no-op since lavapipe is already doing it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31590>
2024-10-15 14:01:42 +00:00
Mike Blumenkrantz
f5bd39e0e3 gallium: delete duplicated viewmask member in draw info
this was added for lavapipe, but it should have been in the
framebuffer state since it is a framebuffer state

now the GL multiview extensions are supported with viewmask
in the framebuffer struct, which means this is all redundant
and should be corrected/deleted

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31590>
2024-10-15 14:01:42 +00:00
Mike Blumenkrantz
8487ecfa44 iris: assert that viewmask is 0
this is not supported by the driver, so it doesn't need to
be checked at runtime

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31590>
2024-10-15 14:01:42 +00:00
Mike Blumenkrantz
a82d8e638d util/framebuffer: add viewmask compare for fb equal
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31590>
2024-10-15 14:01:42 +00:00
Boris Brezillon
e113ce0d87 panvk/csf: Fix the clear-only RUN_FRAGMENT case
Issuing a RUN_FRAGMENT with no tiler descriptor is a valid use case
when one just needs to clear attachments. Make sure we take that case
into account in issue_fragment_jobs().

Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
e9462e77d8 panvk: Advertise dynamic rendering support
This was already supported, but not yet exposed.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
66543a111c panvk/csf: Fix a buffer/stack-overflow when PANVK_DEBUG=sync
We're not allocating enough qsubmit slots when force_sync=true in
panvk_queue_submit().

Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
195fd67910 panvk/csf: Fix cmd_emit_dcd() in the FB preload logic
We need to mask the bound_attachments value with
MESA_VK_RP_ATTACHMENT_ANY_COLOR_BITS otherwise we're passing
depth/stencil attachments masks too.

Fixes: 0bc3502ca3 ("panvk: Implement a custom FB preload logic")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
4199212ebe panvk/csf: Fix dirty checking in prepare_ds()
If the fragment shader changed, we need to re-emit the depth-stencil
descriptor.

Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
1096adb128 panvk/csf: Fix no-fragment IDVS
Fragment shader program/resource table are only set when the shader or
descriptor table is updated. But if the first RUN_IDVS happening on
the command buffer doesn't require fragment shading, those registers
won't be updated, and we might inherit values set by a previous command
buffer executed on the same queue, leading to GPU faults if these
descriptor buffers have been recycled.

Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
Boris Brezillon
ce1562e9cc panvk: Make panvk_pool_free_mem() error proof
It's pretty easy to pass the wrong pool to panvk_pool_free_mem()
(was the case in panvk_shader_destroy() and
panvk_internal_shader_destroy()), so let's make the existing interface
more robust to this kind of mistake by storing the 'owned-by-pool'
information at the panvk_priv_mem level. We use the lower 3 bits of the
BO pointer for that, since a BO object is guaranteed to be 8-byte
aligned.

Fixes: ce14681ebf ("panvk: Don't leak vertex shader program descriptors")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31625>
2024-10-15 13:16:07 +00:00
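
Storing a flag in the low bits of an aligned pointer, as ce1562e9cc describes, can be sketched as follows. The struct layout, flag value and helper names are assumptions made for illustration and are not the panvk definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a BO object; only its 8-byte alignment matters. */
struct sketch_bo {
   _Alignas(8) uint64_t size;
};

#define SKETCH_FLAG_OWNED_BY_POOL ((uintptr_t)1 << 0)
#define SKETCH_FLAGS_MASK         ((uintptr_t)0x7)

/* The BO pointer and the flags share one word. */
struct sketch_priv_mem {
   uintptr_t bo_and_flags;
};

static struct sketch_priv_mem
sketch_priv_mem_pack(struct sketch_bo *bo, bool owned_by_pool)
{
   uintptr_t ptr = (uintptr_t)bo;
   /* 8-byte alignment guarantees the lower 3 bits are free. */
   assert((ptr & SKETCH_FLAGS_MASK) == 0);

   struct sketch_priv_mem mem = {
      .bo_and_flags = ptr | (owned_by_pool ? SKETCH_FLAG_OWNED_BY_POOL : 0),
   };
   return mem;
}

static struct sketch_bo *
sketch_priv_mem_bo(struct sketch_priv_mem mem)
{
   return (struct sketch_bo *)(mem.bo_and_flags & ~SKETCH_FLAGS_MASK);
}

static bool
sketch_priv_mem_owned_by_pool(struct sketch_priv_mem mem)
{
   return (mem.bo_and_flags & SKETCH_FLAG_OWNED_BY_POOL) != 0;
}
```

Because the ownership bit travels with the memory handle, a free routine can consult it instead of trusting the pool argument passed by the caller.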
Georg Lehmann
40c4ec881d radv: call nir_opt_remove_phis in radv_optimize_nir_algebraic
Foz-DB Navi31:
Totals from 3048 (3.84% of 79395) affected shaders:
Instrs: 603535 -> 599281 (-0.70%); split: -0.74%, +0.03%
CodeSize: 3074416 -> 3056236 (-0.59%); split: -0.60%, +0.01%
Latency: 2851382 -> 2849808 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 294247 -> 294201 (-0.02%); split: -0.02%, +0.01%
SClause: 18077 -> 18083 (+0.03%); split: -0.03%, +0.07%
Copies: 63860 -> 59926 (-6.16%); split: -6.33%, +0.17%
Branches: 15901 -> 15899 (-0.01%)
PreSGPRs: 62441 -> 61353 (-1.74%)
VALU: 291049 -> 291035 (-0.00%); split: -0.01%, +0.00%
SALU: 96786 -> 92606 (-4.32%); split: -4.42%, +0.10%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31360>
2024-10-15 10:01:43 +00:00
Pavel Ondračka
f94087be2c r300/compiler: reformat using default mesa .clang-format rules
Most notably switch from tabs to 3 spaces.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23771>
2024-10-15 09:24:02 +00:00
Pavel Ondračka
4a6abbc9c1 r300: opt in to clang-format CI enforcement for the compiler
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23771>
2024-10-15 09:24:02 +00:00
Pavel Ondračka
4e4b124fa9 r300: add .clang-format file for the compiler
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23771>
2024-10-15 09:24:02 +00:00
Mary Guillemard
b12c294e7b panvk: Define primitive size for RUN_TILER/RUN_IDVS
We were ignoring line width with line topologies.

This also forces a value of 1.0f when a point topology is in use and
the shader does not write the point size, to respect maintenance5
requirements.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31623>
2024-10-15 08:50:19 +00:00
Iago Toral Quiroga
188f1c6cbe v3dv: rewrite device identification
Instead of trying to match device compatible strings like 'brcm,2712-v3d',
which may change with product revisions, match the device name, like 'v3d'.
This simplifies the matching logic a bit and allows us to have fewer
diverging paths for hardware and simulator.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31619>
2024-10-15 07:57:51 +00:00
Iago Toral Quiroga
23432921b3 v3dv: drop device_id field
This was added only to report the DRM device ID of the actual
GPU used in the simulated environment but there is no real
reason we need to do that, so let's just keep it simple and
provide the device ID of the simulated device instead.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31619>
2024-10-15 07:57:51 +00:00
Tapani Pälli
a3c03b6a96 mesa: fix DXT1 support with EXT_texture_compression_dxt1
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11987
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31540>
2024-10-15 07:19:55 +00:00
Utku Iseri
271fdedc5a st/mesa: clamp reported max lod bias
mesa clamps lod bias values to [-32, 31] during quantization,
so the reported max value should also be limited to 31.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11977
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31525>
2024-10-15 06:38:48 +00:00
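
The clamp in 271fdedc5a amounts to taking the minimum of the driver's limit and the largest bias that survives quantization. A minimal sketch; the constant and function names are hypothetical:

```c
#include <assert.h>

/* Upper end of the quantized LOD bias range stated in the commit message. */
#define SKETCH_MAX_QUANTIZED_LOD_BIAS 31.0f

/* Report no more than what quantization can actually represent. */
static float
sketch_reported_max_lod_bias(float driver_max)
{
   assert(driver_max >= 0.0f);
   return driver_max < SKETCH_MAX_QUANTIZED_LOD_BIAS
             ? driver_max
             : SKETCH_MAX_QUANTIZED_LOD_BIAS;
}
```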
Marek Olšák
0727634443 nir/opt_load_store_vectorize: vectorize load_smem_amd
radeonsi+ACO with the new vectorization callback:

TOTALS FROM AFFECTED SHADERS (19508/58918)
  VGPRs: 708672 -> 708864 (0.03 %)
  Code Size: 31458688 -> 31217160 (-0.77 %) bytes
  Max Waves: 305960 -> 305952 (-0.00 %)

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
a44e5cfccf nir/opt_load_store_vectorize: allow a 4-byte hole between 2 loads
If there is a 4-byte hole between 2 loads, drivers can now optionally
vectorize the loads by including the hole between them, e.g.:
    4B load + 4B hole + 8B load --> 16B load

All vectorize callbacks already reject all holes, but AMD will want to
allow it.

radeonsi+ACO with the new vectorization callback:

TOTALS FROM AFFECTED SHADERS (25248/58918)
  VGPRs: 871116 -> 871872 (0.09 %)
  Spilled SGPRs: 397 -> 407 (2.52 %)
  Code Size: 43074536 -> 42496352 (-1.34 %) bytes

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
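
The "4B load + 4B hole + 8B load --> 16B load" example from a44e5cfccf can be worked through with a small size calculation. This sketch assumes a driver policy expressed as a maximum hole size in bytes; the real pass hands the hole size to the vectorize callback, and the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A load described by its byte offset and byte size. */
struct sketch_load {
   uint32_t offset;
   uint32_t size;
};

/* Returns the size in bytes of the merged load, or 0 if the two loads are
 * not merged under the given hole policy. */
static uint32_t
sketch_merge_loads(struct sketch_load lo, struct sketch_load hi,
                   uint32_t max_hole)
{
   assert(lo.offset <= hi.offset);

   uint32_t lo_end = lo.offset + lo.size;
   if (hi.offset < lo_end)
      return 0; /* overlapping loads are out of scope for this sketch */

   uint32_t hole = hi.offset - lo_end;
   if (hole > max_hole)
      return 0; /* the policy rejects a hole this large */

   /* The merged load covers both loads and the hole between them. */
   return hi.offset + hi.size - lo.offset;
}
```

With `max_hole` set to 0 the function only merges adjacent loads, which corresponds to callbacks that reject all holes.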
Marek Olšák
80c156422d nir/opt_load_store_vectorize: allow overfetching, merge overfetched loads
New load merging transformations (first, second), examples:
    (vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7)
    (vec1, vec8(read=0x7f)) ==> vec8(read=0xff)
    - the unused component at the end of vec8 is dropped

Not merged:
    vec8(read=0xfe) + vec1
    - unused components at the beginning are kept

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
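
The read masks in the examples of 80c156422d can be reproduced by shifting the second load's mask past the components of the first. A minimal sketch with a hypothetical helper; `first_count` is the number of components the first load occupies in the merged vector:

```c
#include <assert.h>
#include <stdint.h>

/* Components-read mask of a merged load: the first load's components come
 * first and the second load's components follow them. */
static uint32_t
sketch_merged_read_mask(unsigned first_count, uint32_t first_mask,
                        uint32_t second_mask)
{
   assert(first_count < 16);
   return first_mask | (second_mask << first_count);
}
```

For (vec4, vec3) this yields 0x7f, and for (vec1, vec8(read=0x7f)) it yields 0xff. For vec8(read=0xfe) followed by vec1 the result is 0x1fe, which needs nine components because the unused leading component is kept.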
Marek Olšák
65ace5649b nir: reject unsupported component counts from all vectorize callbacks
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.

This changes all callbacks to reject it. AMD will enable it in a later commit.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
02923e237d nir: add hole_size parameter into the vectorize callback
It will be used to allow merging loads with a hole between them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
8ce43b7765 nir/opt_load_store_vectorize: add entry::num_components
We will represent vec6..vec7, vec9..vec15 loads with 8 and 16
components respectively, so we need to track how many components
we really use.

This is a prerequisite for optimal merging up to vec16. Example:
    Step 1: vec4 + vec3 ==> vec7as8 (last component unused)
    Step 2: vec1 + vec7as8 ==> vec8 (last unused component dropped)

Without using the number of components read, the same example would end up
doing:
    Step 1: vec4 + vec3 ==> vec8
    Step 2: vec1 + vec8 ==> vec9 (fail)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
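
The two-step example in 8ce43b7765 can be traced with a pair of helpers. This is a sketch under the assumption that valid vector sizes are 1 to 5, 8 and 16 components, which matches the vec6..vec7 and vec9..vec15 ranges named in the commit message; the function names are hypothetical.

```c
#include <assert.h>

/* Round a used component count up to the next supported vector size. */
static unsigned
sketch_round_up_components(unsigned used)
{
   assert(used >= 1 && used <= 16);
   if (used <= 5)
      return used;
   return used <= 8 ? 8 : 16;
}

/* Merge two adjacent loads by the number of components they really use.
 * Returns the merged used count, or 0 if it would not fit in a vec16. */
static unsigned
sketch_merge_used(unsigned first_used, unsigned second_used)
{
   unsigned total = first_used + second_used;
   return total <= 16 ? total : 0;
}
```

Tracking the used count gives 4 + 3 = 7 (stored as 8) and then 1 + 7 = 8. Merging against the stored size instead gives 1 + 8 = 9, the failing case from the commit message.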
Alyssa Rosenzweig
e9303c0952 nir: extract round component helper
another nir pass will use this.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Faith Ekstrand
c2684968de nvk: Advertise 64-bit atomics on buffer views
We also add an nvk_format_supports_atomics() helper.  This helper lives
in NVK for now because it's not just about the format and hardware but
also about whether or not we have compiler support in NAK.

Fixes: 1d10de539c ("nvk: Implement VK_EXT_shader_image_atomic_int64")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31633>
2024-10-15 05:21:03 +00:00
Faith Ekstrand
d3d8271620 nvk: Re-sort the features table
There were a couple of KHR extensions that got mixed in with the EXTs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31633>
2024-10-15 05:21:03 +00:00
Faith Ekstrand
681f807747 nvk: Only set texture/sampler tables and SLM for enabled engines
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31633>
2024-10-15 05:21:02 +00:00