Compare commits

...

134 commits

Author SHA1 Message Date
Dylan Baker
944ec88ca5 VERSION: bump for 25.3.0-rc4
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-11-06 09:02:18 -08:00
Georg Lehmann
7d4557bae8 radv: do not report wave32 in gl_SubgroupSize for Doom Dark Ages
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
The shaders in question use:

(memory_load + (gl_SubgroupSize - 1)) & ~(gl_SubgroupSize - 1)

My guess is that this is supposed to be the subgroup size of whatever
produced the value, not the subgroup size in this shader.
And because in the consumer the workgroup size is 32, we use wave32.

Fixes: a2d3cbac2a ("radv: determine subgroup/wave size early")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14187

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 83e9ae2d5c)

Conflicts:
	src/amd/vulkan/radv_instance.c
	src/amd/vulkan/radv_instance.h
	src/util/driconf.h

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-05 10:18:26 -08:00
Yiwei Zhang
4482281292 panvk: fix sample shading of internal blend shader for MSAA
Align with gallium side. When fixed-function blending is not available,
the internal blend shader is used. This is handled by a single ST_TILE
in the blend shader with the current sample ID, which requires sample
shading enablement.

Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 763d2418b8)

CI fails removed from cherry-pick as the file doesn't exist on stable,
and the main branch change has only removals.

Conflicts:
	src/panfrost/ci/panfrost-g925-fails.txt

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-05 10:16:55 -08:00
Connor Abbott
1dadb38d7b tu: Fix attachment stores with subpasses with partial views
Subpasses can have different view masks, although this isn't often used.
So we can't use the view mask of the last subpass when deciding what to
store, instead we have to use the same used_views field that's used by
loads and clears.

Noticed by upcoming tests for VK_QCOM_multiview_per_view_render_areas.

Cc: mesa-stable
(cherry picked from commit c0b5c04b84)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 14:12:21 -08:00
Connor Abbott
ef9457d119 tu: Rename tu_render_pass_attachment::clear_views to used_views
It's not just used for clears, it was already used for loads and it
needs to be used for stores too so clear_views was a confusing name.

Cc: mesa-stable
(cherry picked from commit 6c3ed74ed2)

Conflicts:
	src/freedreno/vulkan/tu_pass.cc
	src/freedreno/vulkan/tu_pass.h

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 14:12:20 -08:00
Lionel Landwerlin
9696921018 anv: avoid null pointer access in utrace copies on CCS
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 6f138fe723)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Lionel Landwerlin
30678337dd u_trace: reserve chunk space before emitting copies
Some implementations can emit tracepoints when copying u_trace
buffers. It's important to reserve the slots we want to copy into
before emitting the copies so that both processes don't clash with one
another.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit df5f92d114)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Danylo Piliaiev
eb2668aad7 vulkan: Always fill DS state for EXT_dynamic_rendering_unused_attachments
If renderpass has D/S attachment, but pipeline has D/S as UNDEFINED,
D/S should be properly disabled for the pipeline. The easiest way is to
ensure that D/S state is valid when pipeline's D/S format is UNDEFINED.
So we always create VkPipelineDepthStencilStateCreateInfo.

CC: mesa-stable

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 2798ef7bfd)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ryan Houdek
e23c722170 freedreno/fdl: Fix typo in tiled_to_linear_2cpp
The non-aarch64 path was copying in the wrong direction.

Fixes: 7a5a33e0e3 ("freedreno/fdl: Add tiling/untiling implementation for a6xx/a7xx")
(cherry picked from commit 455eb2c751)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Mel Henning
2ad12150e8 nak/opt_lop: Don't handle modifiers in dedup_srcs
The handling in dedup_srcs was incorrect because it would apply the
modifier from srcs[i] to the LUT without removing the modifier from the
instruction. We can fix and simplify this code by removing all modifiers
before the dedup_srcs() call, which we were doing immediately after the
call anyway.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13966
Fixes: 66c9c40f68 ("nak: Handle modifiers in dedup_srcs() in opt_lop()")
Reviewed-by: Seán de Búrca <sdeburca@fastmail.net>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
(cherry picked from commit 041216e605)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
3109237d7c llvmpipe: Always recompute 1/w
The value depends on the tgsi_interpolate_loc which is not constant for
the loop. llvm should be able to cse in cases where they are the same.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aa28fcb610)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
775652e08b gallivm/nir/soa: Use the sign of src1 for imod
This is the behavior specified by the nir opcode, the spirv spec and
required by maintenance8.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4d30da6599)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
51285c6715 lavapipe: Bump MAX_DESCRIPTOR_UNIFORM_BLOCK_SIZE
cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 25e678a37d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
c70fd7f766 lavapipe: Zero image null descriptors
The size queries for images do not use function pointers so we need to
be careful that width, height and depth are 0.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d6dd96e1c7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
d527eedb15 lavapipe: Bump maxPrimitiveCount
The vulkan spec requires at least 2^29-1.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14212
cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ff145d2ddc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Marek Olšák
7c18540961 Revert ABI breakage "amd: Add user queue HQD count to hw_ip info"
This reverts commit 56d758d321.

It broke ABI between Mesa and libdrm, causing crashes due to stack smashing.

See: https://gitlab.freedesktop.org/mesa/libdrm/-/issues/121#note_3172362

Fixes: 56d758d321
(cherry picked from commit 5d92c92ce5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Janne Grunau
90b6c3a8ac hk: Report the correct plane count in VkDrmFormatModifierProperties2?EXT
Fixes import of planar formats like NV12 in gtk4. Allows
`gst-launch-1.0 v4l2src ! gtk4paintablesink` to use vulkan instead of
falling back to OpenGL.

Closes: #14217
Cc: mesa-stable
Signed-off-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 83b97379dc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Alyssa Rosenzweig
3dab73159b asahi,ail: fix multi-plane imports
We need to handle plane offsets everywhere. I noticed this broken before but
didn't realize it was a GL driver issue. Fix is easy, wrote this on my sofa
while waking up in the morning.

Fixes gst-launch-1.0 v4l2src ! glimagesink

Note that cheese & snapshot both still hang for some reason due to
libgstpipewire, but the Mesa side should be fine now.

Closes: #14217
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: mesa-stable
(cherry picked from commit aa9f937116)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ian Romanick
55a37838b9 elk: Apply vgrf127 workaround in more cases
No shader-db changes on Broadwell. Older platforms were not tested.

Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 2e8b89ec60)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ian Romanick
9bad1beb98 brw: Apply Gfx9 vgrf127 workaround in more cases
No shader-db changes on any Intel platform.

fossil-db:

Skylake
Intel(R) HD Graphics 530 (SKL GT2)
Totals:
Cycle count: 57669758527 -> 57669757913 (-0.00%); split: -0.00%, +0.00%

Totals from 10 (0.00% of 1736875) affected shaders:
Cycle count: 274949 -> 274335 (-0.22%); split: -0.36%, +0.14%

This change is likely due to subtle differences of different registers
being allocated.

In addition, fossils/google-meet-clvk/BgBlur.1f58fdf742c27594.1.foz and
fossils/google-meet-clvk/Relight.1f58fdf742c27594.1.foz stopped failing
EU validation on Gfx9 platforms.

Closes: #14171
Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 3e6af6c5bb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Dylan Baker
3086692bcd .pick_status.json: Update to 27d9e4ec2a
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
David Rosca
ce6c6a7a57 radv/video: Only use write_memory for encode feedback with full support
write_memory is used after encoding every frame to mark the feedback
buffer as ready. Only use it when write_memory can work without PCIe
atomics support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 874e02003a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-03 13:17:41 +01:00
David Rosca
629a0a4dcc radv/video: Introduce two levels of write_memory support
Print warning when using write_memory with firmwares that require
PCIe atomics support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8e1d74bbb4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-03 13:17:38 +01:00
Icenowy Zheng
12c82aaa82 gallivm: orcjit: remember Context in addition to ThreadSafeContext
The llvm::orc::ThreadSafeContext object wraps an llvm::Context and keeps
its reference.

As we are no longer able to squeeze out Context from ThreadSafeContext
in LLVM 21, do not let ThreadSafeContext create Context implicitly for
LLVM 21, instead explicitly create Context and then remember it.

This also eliminates the code creating a Context that is never disposed.

Fixes: cd129dbf8a ("gallivm: support LLVM 21")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
(cherry picked from commit cc60a7a39d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:14 -07:00
Samuel Pitoiset
1e885e7a88 radv: add a workaround for illegal depth/stencil descriptors with No Man's Sky
Using descriptors with both depth and stencil aspects is illegal in
Vulkan and this hangs the GPU.

Use NULL descriptors to mitigate the issue. Note that AMDVLK also
ignores them.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13325
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit cb4e0c4140)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:13 -07:00
Tapani Pälli
3ddddf78b4 anv: bring back some lost game drirc workarounds for subgroups
Fixes: d39e443ef8 (" anv: add infrastructure for common vk_pipeline")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f48df6f45c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:12 -07:00
Gert Wollny
86313f9571 r600/sfn: AR loads are not dependend on the future and other code blocks
If the AR is loaded from a register changing that register in a loop was
resulting in a scheduling failure because the AR load was made dependend
on a later instruction. Fix the dependencies by only using dependencies on
older instruuctions in the same block.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14114
Fixes: d21054b4bc ("r600/sfn: Add pass to split addess and index register loads")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 43d9765e35)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:12 -07:00
Paul Gofman
a46307a732 driconf: add a workaround for Investigation Stories : gunsound
CC: mesa-stable
(cherry picked from commit 63aec75981)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:11 -07:00
Dylan Baker
0a0d08dfe0 .pick_status.json: Update to e44a776f47
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:09 -07:00
Faith Ekstrand
182877f3c8 nvk: Don't re-initialize the descriptor writer if the set matches
The logic here before was wrong.  In the case where the set is the same,
it would avoid the flush but then re-initialize anyway, loosing the
dirty information and causing us not to actually flush out all the
descriptors.

Fixes: 1f0fda22f7 ("nvk: Flush descriptor set maps")
(cherry picked from commit 2f6b3b6b91)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:05 -07:00
Eric Engestrom
9aeac1e0a7 util/meson: don't build libmesa_util_clflush unless needed
Fixes: efbecd93ba ("util: Build util/cache_ops_x86.c with -msse2")
(cherry picked from commit 0fe0acd4c3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:05 -07:00
Eric Engestrom
46f0422165 util/meson: don't build libmesa_util_clflushopt unless needed
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
(cherry picked from commit ccf33664e8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:04 -07:00
Samuel Pitoiset
f69d1abfcf radv: ignore dual-source blending when blending isn't enabled for MRT0
The Vulkan spec says:
    "VUID-vkCmdDraw-maxFragmentDualSrcAttachments-09239
     If blending is enabled for any attachment where either the source
     or destination blend factors for that attachment use the secondary
     color input, the maximum value of Location for any output attachment
     statically used in the Fragment Execution Model executed by this
     command must be less than maxFragmentDualSrcAttachments"

Which means it must be disabled.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14190
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b2badb2b24)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:00 -07:00
Eric Engestrom
770e095766 asahi/virtio: fix memleak
Fixes: c64a2bbff5 ("asahi: port to stable uAPI")
(cherry picked from commit fdef10916e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:58 -07:00
Dmitry Osipenko
205fe1a245 virtio/vdrm: Fix varying offsets of struct vdrm_device members
Struct virgl_renderer_capset_drm has a varying size depending on whether
AMDGPU driver is enabled or not. This breaks offset of struct vdrm_device
members for non-AMD drivers when Mesa is built with multiple native context
drivers including the AMD driver. Place varying capsets in the end struct
vdrm_device to mitigate the issue.

Fixes: 5736280730 ("virtio/vdrm: add ENABLE_DRM_AMDGPU for c_args")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
(cherry picked from commit bd8377bb04)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:58 -07:00
Mike Blumenkrantz
093c7d9d8e zink: don't destroy old push layout when enabling fbfetch descriptor
this may be in use by programs, and adding tracking/refcounting just to
delete a descriptor layout isn't worth the effort

cc: mesa-stable

(cherry picked from commit 272cf1db8e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:57 -07:00
Gert Wollny
2c67b0fac6 r600/sfn: make sure kill and update_exec don't happen in one group
v2: - Correctly test in multi-slot split whether the group has kill if
      we want to add a multi-slot op.
    - update group_has_predicate if an according vector op was added

Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 317345cc98)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:57 -07:00
Gert Wollny
e082f6b6c0 r600/sfn: Track whether a ALU group has a exec flag update
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 0d065a2421)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:56 -07:00
Gert Wollny
a12369eb3d r600/sfn: move some common code into try_readport
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 51e7c477d6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:55 -07:00
Gert Wollny
6670d0742b r600/sfn: extract function to update group after instr insert
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit a7f477b51f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:55 -07:00
Mike Blumenkrantz
a7a020dde6 zink: collapse mesh pipeline fetching and binding conditionals
this avoids taking the wrong conditional if a pipeline fetch fails

cc: mesa-stable

(cherry picked from commit 343eef990e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:54 -07:00
Mike Blumenkrantz
7e15070ee1 zink: collapse gfx pipeline fetching and binding conditionals
this avoids taking the wrong conditional if a pipeline fetch fails

cc: mesa-stable

(cherry picked from commit 0b24fd174a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:54 -07:00
Sagar Ghuge
0edb1852a7 vulkan/runtime: Fix typo in stack size calculation
Fixes: 69a04151db ("vulkan/runtime: add ray tracing pipeline support")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
(cherry picked from commit a00560f763)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:53 -07:00
Alyssa Rosenzweig
3ce875a2d0 anv: use D3D-compatible texturing for Proton
Intel & AMD Direct3D drivers modify their rounding behaviour for texturing to
match Direct3D expectations. Such behaviour is not conformant in Vulkan, and
Intel hardware lacks a reasonable way to get NVIDIA's behaviour (which uniquely
works for Vulkan & Direct3D). The second best choice is to use
Direct3D-compatible behaviour for Proton (via driconf) and our current
Vulkan-conformant behaviour everywhere else. Given the APIs diverge and there is
no Vulkan extension to control the behaviour explicitly, driconf'ing on the
engineName is the reasonable solution.

anv already has a anv_force_filter_addr_rounding driconf option to force
Direct3D behaviour for certain Direct3D titles. Here we simply apply it to all
D3D10+ titles, aligning us with the Windows driver.

Note that D3D9 does not have this behaviour. We therefore use standard Vulkan
behaviour for D3D9 to avoid breaking D3D9 titles, even though the engineName is
the same as D3D10+.

This is the same solution radv uses, they call it radv_disable_trunc_coord. We
could unify the driconf entries later.

See https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38098#note_3166306
for a more detailed analysis, as well as the linked references:

   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27337
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25911
   https://github.com/HansKristian-Work/vkd3d-proton/pull/1884

This fixes misrendering in piles of Direct3D games run on anv via Proton,
including Assassin's Creed Valhalla.

Cc: mesa-stable
Closes: #13886
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Calder Young <cgiacun@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
(cherry picked from commit 7a71952762)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:52 -07:00
Dylan Baker
fd777ce645 .pick_status.json: Update to 3334284845
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:41 -07:00
Dylan Baker
315b688976 VERSION: bump for rc3
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-30 11:35:06 -07:00
Job Noorman
3a71d94735 spirv: don't set in_bounds for structs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The arr::in_bounds field was set unconditionally for every deref created
for a chain. For struct derefs, which don't have this field, this would
write to an unused memory location, which is probably why this never
caused issues.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: f19cbe98e3 ("nir,spirv: Preserve inbounds access information")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 0ac55b786a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:18 -07:00
Benjamin Cheng
8a2bf930bb radv/video: Override H265 SPS unaligned resolutions
VCN requires 64x16 alignment for HEVC. When the app requests non-aligned
resolutions, make up for it with conformance window cropping.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
(cherry picked from commit cef8eff74d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:18 -07:00
Benjamin Cheng
ac492d42be radv/video: Override H265 SPS block size parameters
VCN only supports this set of parameters.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
(cherry picked from commit 84b6d8e0d7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:17 -07:00
Lionel Landwerlin
2e17fd0cb2 vulkan/render_pass: Add a missing sType
Fixes: 3a204d5cf3 ("vulkan/render_pass: Add a better helper for render pass inheritance")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit c5740c2548)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:16 -07:00
Marek Olšák
9311f170c4 zink: fix mesh and task shader pipeline statistics
Fixes: 9d0e73335a - zink: enable GL_EXT_mesh_shader
(cherry picked from commit 41a8c4d37c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:15 -07:00
Dylan Baker
3e227a04b1 .pick_status.json: Update to 32b646c597
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:10 -07:00
Sagar Ghuge
f63a5df30b brw/rt: fix ray_object_(direction|origin) for closest-hit shaders
We were returning world BVH level for origin/direction, this commit
fixes by retuning correct object BVH level origin/direction.

Fixes: aaff191356 ("brw/rt: fix ray_object_(direction|origin) for closest-hit shaders")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 89fbcc8c34)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Sagar Ghuge
9ba765e3e3 brw/rt: Move nir_build_vec3_mat_mult_col_major helper to header
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 3edeb1e191)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mike Blumenkrantz
8010d0cd39 zink: disable primitiveFragmentShadingRateMeshShader feature
features are auto-enabled, but some of them cause validation errors
which are simple to work around

Fixes: 90f3c57337 ("zink: hook up VK_EXT_mesh_shader")
(cherry picked from commit a2ef369abf)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Job Noorman
f1f32d557e ir3/ra: fix assert during file start reset
While accounting for an input register's merge set when resetting the
file start after the preamble, we implicitly assume that the allocated
register is the preferred one by asserting that the register's merge set
offset is not smaller than its physreg (to prevent an underflow).
However, inputs are not guaranteed to have their preferred register
allocated which causes the assert to get triggered.

Fix this by only taking the whole merge set into account for inputs that
actually got their preferred register allocated.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 9d4ba885bb ("ir3/ra: make main shader reg select independent of preamble")
(cherry picked from commit f84d85790e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Natalie Vock
05e5db1a4d nir/lower_shader_calls: Repair SSA after wrap_instrs
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.

Cc: mesa-stable
(cherry picked from commit 50e65dac79)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Taras Pisetskyi
5ae8474029 drirc/anv: force_vk_vendor=-1 for Wuthering Waves
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12459

Signed-off-by: Taras Pisetskyi <taras.pisetskyi@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit dcd9b90aff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
b3470359bf hk: Allocate the temp tile buffer in copy_image_to_image_cpu
We may require a bigger more than 16KiB to handle the image copy.
We now always allocate a buffer to handle it properly fixing the
remaining failures on VKCTS 1.4.4.0 for HIC.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
(cherry picked from commit d37ba302d0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
5e1a88cea0 hk: Make width and height per block in HIC
We were assuming that every formats used for HIC had a block widgh and
height of 1x1.

This is wrong for compressed formats like BC5, ASTC, ect.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
(cherry picked from commit 887f06a966)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Sagar Ghuge
040453857b anv: Call brw_nir_lower_rt_intrinsics_pre_trace lowering pass
Call this pass before nir_lower_shader_calls().

Fixes: d39e443e ("anv: add infrastructure for common vk_pipeline")
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 006085e676)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
28e172e956 hk: Remove unused allocation in queue_submit
Unused and leaking memory, found with address sanitizer.

Fixes: c64a2bbff5 ("asahi: port to stable uAPI")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 64131475a8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
74880f8954 hk: Disable 1x in sampleLocationsSampleCounts
We don't support it, everyone dropped support for that, let's not expose it.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 7e636d52f1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
f02f5e217f hk: Fix maxVariableDescriptorCount with inline uniform block
Same problem as NVK on VKCTS 1.4.4.0

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 8447b99f61)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Dylan Baker
d9636807f7 intel/compiler/brw: Add assert that we don't have a negative value
Coverity notices that `nir_get_io_index_src_number` could return -1, and
that we use it to index an array. It cannot understand that -1 only
happens for unhandled enum values, but all of these are handled. Add an
assert to help it out.

CID: 1667234
Fixes: 37a9c5411f ("brw: serialize messages on Gfx12.x if required")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit a5b9f428f9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Dylan Baker
b768139858 .pick_status.json: Update to 45a762727c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Olivia Lee
498a25cfb8 hk: fix data race when initializing poly_heap
hk_heap is called during command buffer recording, which may be
concurrent, so writing dev->heap without synchronization is a data race.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
(cherry picked from commit bca29b1c92)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Connor Abbott
9728bbf7b0 tu: Also disable stencil load for attachments not in GMEM
We were accidentally still emitting loads for D32S8 resolve attachments.

Cc: mesa-stable
(cherry picked from commit a3652af380)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:30 -07:00
Connor Abbott
f142fdc273 tu: Fix 3d load path with D24S8 on a7xx
We need to always use the FMT6_Z24S8_AS_R8G8B8A8 format for GMEM even if
UBWC is disabled, as already done for the 2d store path. Because we
use the pre-baked RB_MRT_BUF_INFO register value, this means we have to
override it.

Cc: mesa-stable
(cherry picked from commit 9417ce287c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:28 -07:00
Connor Abbott
1c52a94428 tu: Don't patch GMEM for input attachments never in GMEM
This can happen if we resolve to a resolve attachment and then use that
resolve attachment as an input attachment in a later subpass. We don't
need to put it in GMEM, but it's still considered "written" because
input attachment reads need a dependency after the resolve.

MSRTSS input attachment tests effectively created such a scenario after
lowering to transient multisample attachments and inserting resolves.

Cc: mesa-stable
(cherry picked from commit d491a79027)

Conflicts:
	src/freedreno/vulkan/tu_pass.cc

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:28 -07:00
Faith Ekstrand
2cfd3c52b2 panvk/shader: Use the right copy size for deserializing dynamic UBOs/SSBOs
Fixes: 563823c9ca ("panvk: Implement vk_shader")
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 64ad337036)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:08 -07:00
Faith Ekstrand
606ebb042e panvk/shader: [de]serialize desc_info.max_varying_loads
Fixes: de86641d3f ("panvk: Limit AD allocation to max var loads in v9+")
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit a546484ed9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:07 -07:00
Samuel Pitoiset
424f37b348 radv: dirty dynamic descriptors when required
The user SGPRS might be different and dynamic descriptors need to be
re-emitted again

This fixes a regression with ANGLE, and VCKTS is currently missing
coverage.

Fixes: a47952d495 ("radv: upload and emit dynamic descriptors separately from push constants")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14146
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 54a6c81d3a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:06 -07:00
Faith Ekstrand
7f75931019 nvk: Capture/replay buffer addresses for EDB capture/replay
Fixes: 3f1c3f04be ("nvk: Advertise VK_EXT_descriptor_buffer")
(cherry picked from commit 998dbd43d3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:05 -07:00
Faith Ekstrand
ba107091c2 nvk: Look at the right pointer in GetDescriptorInfo for SSBOs
It doesn't actually matter but we shouldn't poke at the wrong union
field.

Fixes: 77db71db7d ("nvk: Implement GetDescriptorEXT")
(cherry picked from commit a13474939d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:05 -07:00
Faith Ekstrand
b74000dbce nvk: Emit inactive vertex attributes
VK_KHR_maintenance9 requires that vertex attributes in shaders which map
to vertex attributes that aren't bound at the API return a consistent
value.  In order to do this, we need toemit SET_VERTEX_ATTRIBUTE_A, even
for unused attributes.  The RGBA32F format was chosen to ensure we
return (0, 0, 0, 0) from unbound attributes.

Fixes: 7692d3c0e1 ("nvk: Advertise VK_KHR_maintenance9")
(cherry picked from commit d39221cef3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:04 -07:00
Mauro Rossi
fb2273df78 util: Fix gnu-empty-initializer error
Fixes the following building error happening with clang:

../src/util/os_file.c:291:29: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
   struct epoll_event evt = {};
                            ^
1 error generated.

Fixes: 17e28652 ("util: mimic KCMP_FILE via epoll when KCMP is missing")
Cc: "25.3"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 7bbbfa6670)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:03 -07:00
Connor Abbott
65eb3aed4b tu: Fix RT count with remapped color attachments
The index of each RT is the remapped color attachment index, so we have
to use the remapped indices when telling the HW the number of RTs.

This fixes KHR-GLES3.framebuffer_blit.scissor_blit on ANGLE once we
enabled VK_EXT_multisampled_render_to_single_sampled, which switched
ANGLE to using dynamic rendering with
VK_KHR_dynamic_rendering_local_read.

Fixes: d50eef5b06 ("tu: Support color attachment remapping")
(cherry picked from commit 8d276e0d70)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:02 -07:00
Lionel Landwerlin
a9653fa019 anv: destroy sets when destroying pool
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14169
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 2689056c82)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:02 -07:00
Lionel Landwerlin
159d397437 anv/brw: fix output tcs vertices
brw_prog_tcs_data::instances can be divided by vertices per threads on
earlier generations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit e450297ea9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:01 -07:00
Xaver Hugl
6a7effe059 vulkan/wsi: remove support for VK_COLOR_SPACE_EXTENDED_SRGB_NONLINEAR_EXT
It's not really clear whether or not it should use gamma 2.2 or the piece-wise
transfer function, or how clients would use it for wider gamut in general.
Currently no compositors I know of support ext_srgb, so this shouldn't affect
applications in practice.

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Fixes: 4b663d56 ("vulkan/wsi: implement support for VK_EXT_hdr_metadata on Wayland")
(cherry picked from commit 14fcf145e3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:00 -07:00
Konstantin Seurer
2a0a2cc5b0 aco: Fixup out_launch_size_y in the RT prolog for 1D dispatch
launch_size_y is set to ACO_RT_CONVERTED_2D_LAUNCH_SIZE for 1D
dispatches. The prolog needs to set it to 1 so that the app shader
loads the correct value.

cc: mesa-stable

(cherry picked from commit 47ffe2ecd4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:59 -07:00
Faith Ekstrand
3f9f4d79d3 nvk: Disable sampleLocationsSampleCounts for 1x MSAA
Suggested-by: Mel Henning <mhenning@darkrefraction.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14108
Fixes: a34edc7500 ("nvk: Fill out sample locations on Maxwell B+")
(cherry picked from commit aa0f404f7b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:59 -07:00
Faith Ekstrand
cd253df92a nvk: Include the chipset in the pipeline/binary cache UUID
Cc: mesa-stable
(cherry picked from commit d1793c7a59)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:58 -07:00
Lionel Landwerlin
bfd09d9891 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit aa929ea706)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:57 -07:00
Lionel Landwerlin
dcecd8fd1e brw: handle GLSL/GLSL tessellation parameters
Apparently various tessellation parameters come specified from
TESS_EVAL stage in GLSL while they come from the TESS_CTRL stage in
HLSL.

We switch to store the tesselation params more like shader_info with 0
values for unspecified fields. That let's us merge it with a simple OR
with values from from tcs/tes and the resulting merge can be used for
state programming.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Fixes: 50fd669294 ("anv: prep work for separate tessellation shaders")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit f3df267735)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:56 -07:00
Lionel Landwerlin
1648f759c1 anv: rename structure holding 3DSTATE_WM_DEPTH_STENCIL state
Cc stable for the next commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 8d05b7b72e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:55 -07:00
Valentine Burley
d5f7261ce5 tu: Fix maxVariableDescriptorCount with inline uniform blocks
It must not be larger than maxInlineUniformBlockSize.

Fixes VKCTS 1.4.4.0's
dEQP-VK.api.maintenance3_check.support_count_inline_uniform_block*.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
(cherry picked from commit fd2fa0fbc9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:54 -07:00
Valentine Burley
2c1c52a8c8 tu: Fix indexing with variable descriptor count
Based on RADV.
The Vulkan spec says:
    "If bindingCount is zero or if this structure is not included in
     the pNext chain, the VkDescriptorBindingFlags for each descriptor
     set layout binding is considered to be zero. Otherwise, the
     descriptor set layout binding at
     VkDescriptorSetLayoutCreateInfo::pBindings[i] uses the flags in
     pBindingFlags[i]."

Fixes dEQP-VK.api.maintenance3_check.* in VKCTS 1.4.4.0.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
(cherry picked from commit 17e25b4983)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:52 -07:00
Dylan Baker
fe3a3b08c9 .pick_status.json: Update to fd55e874ed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:46 -07:00
Dylan Baker
d9812eaea8 VERSION: bump for rc2
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-22 16:13:33 -07:00
Benjamin Cheng
be191ceff7 radv/video_enc: Cleanup slice count assert
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was left over when first enabling multiple slice encoding.

Fixes: 63e952ff2c ("radv/video: Support encoding multiple slices")
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit b6d6c1af73)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:39 -07:00
Pierre-Eric Pelloux-Prayer
49bfddbd11 radeonsi: propagate shader updates for merged shaders
In case of merged shaders (eg: VS+GS), a change to VS should trigger
a GS update.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13935
Fixes: b1a34ac95d ("radeonsi: change do_update_shaders boolean to a bitmask")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 90103fe618)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:39 -07:00
Faith Ekstrand
0182cde848 util: Build util/cache_ops_x86.c with -msse2
__builtin_ia32_clflush() requires -msse2 so we need to set -msse2 at
least for building that file.  Fortunately, there are no GPUs that
actually need userspace cache flushing that can ever be bolted onto a
pre-SSE2 x86 CPUs.

Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14134
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit efbecd93ba)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:38 -07:00
Faith Ekstrand
94ec7c686d util: Don't advertise cache ops on x86 without SSE2
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit 3739d7a90c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:37 -07:00
Olivia Lee
4202ea6c7f panfrost: fix cl_local_size for precompiled shaders
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit a410d90fd2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:37 -07:00
Rhys Perry
10475e8ac1 amd/lower_mem_access_bit_sizes: fix shared access when bytes<bit_size/8
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.

This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit b18421ae3d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:36 -07:00
Rhys Perry
c1cf6e75ae amd/lower_mem_access_bit_sizes: be more careful with 8/16-bit scratch load
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.3
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit e89b22280f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:34 -07:00
Rhys Perry
2b8675fd86 amd/lower_mem_access_bit_sizes: improve subdword/unaligned SMEM lowering
Summary of changes:
- handle unaligned 16-bit scalar loads when supported_dword=true
- increases the size of 8/16/32/64-bit buffer loads which are not dword
  aligned, which can create less SMEM loads.
- handles when "bytes" is less than "bit_size / 8"

fossil-db (gfx1201):
Totals from 26 (0.03% of 79839) affected shaders:
Instrs: 12676 -> 12710 (+0.27%); split: -0.30%, +0.57%
CodeSize: 67272 -> 67384 (+0.17%); split: -0.24%, +0.40%
Latency: 44399 -> 44375 (-0.05%); split: -0.09%, +0.04%
SClause: 352 -> 344 (-2.27%)
SALU: 3972 -> 3992 (+0.50%)
SMEM: 554 -> 528 (-4.69%)

fossil-db (navi21):
Totals from 6 (0.01% of 79825) affected shaders:
Instrs: 2192 -> 2186 (-0.27%)
CodeSize: 12188 -> 12140 (-0.39%)
Latency: 10037 -> 10033 (-0.04%); split: -0.12%, +0.08%
SMEM: 124 -> 118 (-4.84%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 8829fc3bd6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:32 -07:00
Rhys Perry
e967da84a8 amd/lower_mem_access_bit_sizes: don't create subdword UBO loads with LLVM
These are unsupported.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14127
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 79b2fa785d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:31 -07:00
Dylan Baker
2a8f2ff397 .pick_status.json: Update to e38491eb18
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:27 -07:00
Mel Henning
7a30a71c45 nvk: VK_DEPENDENCY_ASYMMETRIC_EVENT_BIT_KHR
This was missed in the original maintenance9 MR.

Fixes the flakes in test
dEQP-VK.synchronization2.op.single_queue.event.write_ssbo_compute_read_ssbo_compute.buffer_16384_maintenance9

Fixes: 7692d3c0 ("nvk: Advertise VK_KHR_maintenance9")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 28fbc6addb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:55 -07:00
Karol Herbst
9c57c0a194 nak: fix MMA latencies on Ampere
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 7a01953a39 ("nak: Add Ampere and Ada latency information")
(cherry picked from commit e7dca5a6ca)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:54 -07:00
Karol Herbst
425c49ebf2 nak: ensure deref has a ptr_stride in cmat load/store lowering
With untyped pointer we might get a deref_cast with a 0 ptr_stride. But we
were supposed to ignore the stride information on the pointer anyway, so
let's do that properly now.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
(cherry picked from commit 3bbf3f7826)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:54 -07:00
Karol Herbst
7b7cb63a14 nak: extract cmat load/store element offset calculation
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
(cherry picked from commit f632bfc715)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:53 -07:00
Faith Ekstrand
1941ada4a6 panvk: Fix integer dot product properties
We already set has_[su]dot_4x8[_sat] in nir_shader_compiler_options so
we're already getting the opcodes.  We just need to advertise the
features properly.  If bifrost_compile.h is to be believed, those are
all available starting at gen 9.

Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/218
Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/219
Fixes: f7f9b3d170 ("panvk: Move to vk_properties")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 38950083ae)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:52 -07:00
Lionel Landwerlin
e982234bb6 nir/divergence: fix handling of intel uniform block load
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 255d1e883d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:52 -07:00
Lionel Landwerlin
dbbadebe13 brw: fix ballot() type operations in shaders with HALT instructions
Fixes dEQP-VK.reconvergence.terminate_invocation.bit_count

LNL fossildb stats:

 Totals from 16489 (3.36% of 490184) affected shaders:
 Instrs: 3710499 -> 3710500 (+0.00%)
 Cycle count: 91601018 -> 90305642 (-1.41%); split: -1.81%, +0.40%
 Max dispatch width: 523936 -> 523952 (+0.00%); split: +0.02%, -0.01%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 757c042e39)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:51 -07:00
Lionel Landwerlin
0d100cc078 brw: only consider cross lane access on non scalar VGRFs
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 70aa028f27)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:50 -07:00
Lionel Landwerlin
f656d062e3 brw: constant fold u2u16 conversion on MCS messages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bddfbe7fb1 ("brw/blorp: lower MCS fetching in NIR")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit f48c9c3a37)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:50 -07:00
Mel Henning
847ad886d6 nvk: Really fix maxVariableDescriptorCount w/ iub
I didn't test "nvk: Fix maxVariableDescriptorCount with iub" as
thoroughly as I should have and it regressed
dEQP-VK.api.maintenance3_check.descriptor_set because we were then
violating the requirement that maxPerSetDescriptors describes a limit
that's guaranteed to be supported (and reported as supported in
GetDescriptorSetLayoutSupport).

That commit was also based on a misreading of nvk_nir_lower_descriptors.c
where I thought that the end offset of an inline uniform block needed to
be less than the size of a UBO. That is not the case - on closer
inspection that code gracefully falls back to placing IUBs in globablmem
if necessary. So, we can afford to be less strict about our IUB sizing
and only require that IUBs follow the existing limit imposed by
maxInlineUniformBlockSize.

Fixes: ff7f785f09 ("nvk: Fix maxVariableDescriptorCount with iub")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 77cd629b34)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:48 -07:00
Emma Anholt
5dcc65643c nir/shrink_stores: Don't shrink stores to an invalid num_components.
Avoids a regression in the CL CTS on the next commit.

Fixes: 2dba7e6056 ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors")
(cherry picked from commit 537cc4e0ff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:47 -07:00
Yiwei Zhang
ab7bda0a1b panvk: fix to advance vs res_table properly
Fix a regression from an unfortunate typo.

Fixes: 48e8d6d207 ("panfrost, panvk: The size of resource tables needs to be a multiple of 4.")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 387f75f43d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Yiwei Zhang
a02d8d5767 panvk: fix to advance vs driver_set properly
Should only set once outside the multidraw loop so that per draw can
patch its own own desc attribs when needed.

Fixes: a5a0dd3ccc ("panvk: Implement multiDrawIndirect for v10+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 800c4d3430)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Timur Kristóf
13fa1460dd ac/nir/ngg_mesh: Lower num_subgroups to constant
Mesh shader workgroups always have the same amount of subgroups.

When the API workgroup size is the same as the real workgroup
size, this is a small optimization (using a constant instead of
a shader arg).

When the API workgroup size is smaller than the real workgroup
size (eg. when the number of output vertices or primitves is
greater than the API workgroup size on RDNA 2), this fixes a
potential bug because num_subgroups would return the "real"
workgroup size instead of the API one.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
(cherry picked from commit d20049b430)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Patrick Lerda
14544ef278 r600: update nplanes support
This change fixes "piglit/bin/ext_image_dma_buf_import-export -auto".

Fixes: 02aaf360ae ("r600: Implement resource_get_param")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 84dc9af3d4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:28 -07:00
Patrick Lerda
602b4a2924 r600: fix r600_draw_rectangle refcnt imbalance
The object buf is referenced at the beginning of the
r600_draw_rectangle() function and should be freed
at the end. This issue was introduced with cbb6e0277f.

Fixes: cbb6e0277f ("r600: stop using util_set_vertex_buffers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 3b1e3a40a8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:28 -07:00
Jose Maria Casanova Crespo
717e8a8caf v3d: mark FRAG_RESULT_COLOR as output_written on SAND blits FS
With the introduction of "v3d: Add support for 16bit normalised
formats" https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35820
nir_lower_fragcolor is always called if shaders outputs_written shows
that FRAG_RESULT_COLOR is used.

But on SAND8/30 blit fragment shaders although the FRAG_RESULT_COLOR
is used, it was not marked as output_written so the lowering was not
applied.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14141
Fixes: ee48e81b26 ("v3d: Always lower frag color")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit a131530dd1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:27 -07:00
Emma Anholt
40ff53c5b8 wsi: Fix the flagging of dma_buf_sync_file for the amdgpu workaround.
In my regression fix, I covered one of the two paths that had stopped
setting the implicit_sync flag and thus triggered the amdgpu behavior we
don't want, but probably the less common one.

Fixes: f7cbc7b1c5 ("radv: Allocate BOs as implicit sync even if the WSI is doing implicit sync.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13942
(cherry picked from commit aa96444149)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:27 -07:00
Marek Olšák
bf9e1f2e37 winsys/radeon: fix completely broken tessellation for gfx6-7
The info was moved to radeon_info, but it was only set for the amdgpu
kernel driver. It was uninitialized for radeon.

Fixes: d82eda72a1 - ac/gpu_info: move HS info into radeon_info

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit f5b648f6d3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:26 -07:00
Benjamin Cheng
c3cf272a04 radv/video: Fill maxCodedExtent caps first
Later code (i.e. max qp map extent filling) depends on this.

Fixes: ae6ea69c85 ("radv: Implement VK_KHR_video_encode_quantization_map")
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit b1370e1935)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:25 -07:00
Dylan Baker
30ba8880b4 .pick_status.json: Update to 28fbc6addb
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:24 -07:00
Job Noorman
42ab1c6f3c nir: mark fneg distribution through fadd/ffma as nsz
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit ad421cdf2e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:30 -07:00
Josh Simmons
674e2a702a radv: Fix crash in sqtt due to uninitalized value
Fixes: 772b9ce411 ("radv: Remove qf from radv_spm/sqtt/perfcounter where applicable")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b10c1a1952)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:29 -07:00
Mike Blumenkrantz
756618ee3b zink: consistently set/unset msrtss in begin_rendering
this has to always be set or unset, never persistent from previous renderpass

Fixes: 5080f2b6f5 ("zink: disable msrtss handling when blitting")
(cherry picked from commit f74cf45078)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:28 -07:00
Marek Olšák
ca7d2daf5f r300: fix DXTC blits
Fixes: 9d359c6d10 - gallium: delete pipe_surface::width and pipe_surface::height
(cherry picked from commit 733ba77bfe)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:27 -07:00
Xaver Hugl
45aafef631 vulkan/wsi: require extended target volume support for scRGB
It's hardly going to be useful without that

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Fixes: 4b663d56 ("vulkan/wsi: implement support for VK_EXT_hdr_metadata on Wayland")
(cherry picked from commit 892cf427a0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:25 -07:00
Dylan Baker
8711394383 .pick_status.json: Mark c20e2733bf as denominated
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:23 -07:00
Dylan Baker
289c768e88 .pick_status.json: Update to ad421cdf2e
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:40:05 -07:00
Lionel Landwerlin
84655b4b5d anv: fix image-to-image copies of TileW images
The intermediate buffer between the 2 images is linear, its stride
should be a function of the tile's logical width.

Normally this should map to the values reported by ISL except for
TileW where for some reason it was decided to report 128 for TileW
instead of the actual 64 size (see isl_tiling_get_info() ISL_TILING_W
case)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 77fb8fb062)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:38 -07:00
Valentine Burley
fd6b9c70b6 docs: Update LAVA caching setup
After a recent change, `piglit-traces.sh` automatically sets the caching
proxy, so update the docs to reflect this.

Also update the name of the variable from `FDO_HTTP_CACHE_URI` to
`LAVA_HTTP_CACHE_URI`.

Fixes: fa74e939bf ("ci/piglit: automatically use LAVA proxy")

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 28e73a6239)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:37 -07:00
Lionel Landwerlin
9bb7bf9c66 Revert "wsi: Implements scaling controls for DRI3 presentation."
This reverts commit a219308867.

It's failing most of the tests on Anv :

$ ./deqp-vk -n dEQP-VK.wsi.xlib.maintenance1.scaling.*

Test run totals:
  Passed:        88/2422 (3.6%)
  Failed:        576/2422 (23.8%)
  Not supported: 1758/2422 (72.6%)
  Warnings:      0/2422 (0.0%)
  Waived:        0/2422 (0.0%)

The only passing tests seem to be with this pattern :

 dEQP-VK.wsi.xlib.maintenance1.scaling.*.same_size_and_aspect

(cherry picked from commit 2baa3b8c06)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:36 -07:00
Dylan Baker
f510e6a1bd .pick_status.json: Update to 3b2f7ed918
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:29 -07:00
Dylan Baker
40f7bef16c VERSION: bump for 25.3.0-rc1
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-15 20:56:25 -07:00
147 changed files with 9140 additions and 1853 deletions

7682
.pick_status.json Normal file

File diff suppressed because it is too large Load diff

View file

@ -1 +1 @@
25.3.0-devel
25.3.0-rc4

View file

@ -122,9 +122,8 @@ Enable the site and restart nginx:
# Second download should be cached.
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace
Now, set ``download-url`` in your ``traces-*.yml`` entry to something like
``http://caching-proxy/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public``
and you should have cached downloads for traces. Add it to
``FDO_HTTP_CACHE_URI=`` in your ``config.toml`` runner environment lines and you
can use it for cached artifact downloads instead of going all the way to
freedesktop.org on each job.
The trace runner script automatically sets the caching proxy, so there's no
need to modify anything in the Mesa CI YAML files.
Add ``LAVA_HTTP_CACHE_URI=http://localhost/cache/?uri=`` to your ``config.toml``
runner environment lines and you can use it for cached artifact downloads
instead of going all the way to freedesktop.org on each job.

View file

@ -1489,8 +1489,6 @@ struct drm_amdgpu_info_hw_ip {
__u32 available_rings;
/** version info: bits 23:16 major, 15:8 minor, 7:0 revision */
__u32 ip_discovery_version;
/* Userq available slots */
__u32 userq_num_slots;
};
/* GFX metadata BO sizes and alignment info (in bytes) */

View file

@ -315,8 +315,6 @@ ac_query_gpu_info(int fd, void *dev_p, struct radeon_info *info,
info->ip[ip_type].num_queues = 1;
} else if (ip_info.available_rings) {
info->ip[ip_type].num_queues = util_bitcount(ip_info.available_rings);
} else if (ip_info.userq_num_slots) {
info->ip[ip_type].num_queue_slots = ip_info.userq_num_slots;
} else {
continue;
}
@ -1696,11 +1694,11 @@ void ac_print_gpu_info(const struct radeon_info *info, FILE *f)
fprintf(f, " clock_crystal_freq = %i KHz\n", info->clock_crystal_freq);
for (unsigned i = 0; i < AMD_NUM_IP_TYPES; i++) {
if (info->ip[i].num_queues || info->ip[i].num_queue_slots) {
fprintf(f, " IP %-7s %2u.%u \tqueues:%u \tqueue_slots:%u \talign:%u \tpad_dw:0x%x\n",
if (info->ip[i].num_queues) {
fprintf(f, " IP %-7s %2u.%u \tqueues:%u \talign:%u \tpad_dw:0x%x\n",
ac_get_ip_type_string(info, i),
info->ip[i].ver_major, info->ip[i].ver_minor, info->ip[i].num_queues,
info->ip[i].num_queue_slots,info->ip[i].ib_alignment, info->ip[i].ib_pad_dw_mask);
info->ip[i].ib_alignment, info->ip[i].ib_pad_dw_mask);
}
}

View file

@ -26,7 +26,6 @@ struct amd_ip_info {
uint8_t ver_minor;
uint8_t ver_rev;
uint8_t num_queues;
uint8_t num_queue_slots;
uint8_t num_instances;
uint32_t ib_alignment;
uint32_t ib_pad_dw_mask;

View file

@ -194,7 +194,6 @@ struct drm_amdgpu_info_hw_ip {
uint32_t ib_size_alignment;
uint32_t available_rings;
uint32_t ip_discovery_version;
uint32_t userq_num_slots;
};
struct drm_amdgpu_info_uq_fw_areas_gfx {

View file

@ -109,23 +109,37 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
nir_mem_access_size_align res;
if (intrin == nir_intrinsic_load_shared || intrin == nir_intrinsic_store_shared) {
/* Split unsupported shared access. */
res.bit_size = MIN2(bit_size, combined_align * 8ull);
res.align = res.bit_size / 8;
/* Don't use >64-bit LDS loads for performance reasons. */
unsigned max_bytes = intrin == nir_intrinsic_store_shared && cb_data->gfx_level >= GFX7 ? 16 : 8;
bytes = MIN3(bytes, combined_align, max_bytes);
bytes = bytes == 12 ? bytes : round_down_to_power_of_2(bytes);
/* Split unsupported shared access. */
res.bit_size = MIN2(bit_size, bytes * 8ull);
res.align = res.bit_size / 8;
res.num_components = bytes / res.align;
res.shift = nir_mem_access_shift_method_bytealign_amd;
return res;
}
const bool is_buffer_load = intrin == nir_intrinsic_load_ubo ||
intrin == nir_intrinsic_load_ssbo ||
intrin == nir_intrinsic_load_constant;
if (is_smem) {
const bool supported_subdword = cb_data->gfx_level >= GFX12 &&
intrin != nir_intrinsic_load_push_constant &&
(!cb_data->use_llvm || intrin != nir_intrinsic_load_ubo);
/* Round up subdword loads if unsupported. */
const bool supported_subdword = cb_data->gfx_level >= GFX12 && intrin != nir_intrinsic_load_push_constant;
if (bit_size < 32 && (bytes >= 3 || !supported_subdword))
if (bytes <= 2 && combined_align % bytes == 0 && supported_subdword) {
bit_size = bytes * 8;
} else if (bytes % 4 || combined_align % 4) {
if (is_buffer_load)
bytes += 4 - MIN2(combined_align, 4);
bytes = align(bytes, 4);
bit_size = 32;
}
/* Generally, require an alignment of 4. */
res.align = MIN2(4, bytes);
@ -138,9 +152,6 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
if (!util_is_power_of_two_nonzero(bytes) && (cb_data->gfx_level < GFX12 || bytes != 12)) {
const uint8_t larger = util_next_power_of_two(bytes);
const uint8_t smaller = larger / 2;
const bool is_buffer_load = intrin == nir_intrinsic_load_ubo ||
intrin == nir_intrinsic_load_ssbo ||
intrin == nir_intrinsic_load_constant;
const bool is_aligned = align_mul % smaller == 0;
/* Overfetch up to 1 dword if this is a bounds-checked buffer load or the access is aligned. */
@ -185,8 +196,8 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
const uint32_t max_pad = 4 - MIN2(combined_align, 4);
/* Global loads don't have bounds checking, so increasing the size might not be safe. */
if (intrin == nir_intrinsic_load_global || intrin == nir_intrinsic_load_global_constant) {
/* Global/scratch loads don't have bounds checking, so increasing the size might not be safe. */
if (!is_buffer_load) {
if (align_mul < 4) {
/* If we split the load, only lower it to 32-bit if this is a SMEM load. */
const unsigned chunk_bytes = align(bytes, 4) - max_pad;

View file

@ -508,6 +508,8 @@ lower_ms_intrinsic(nir_builder *b, nir_instr *instr, void *state)
return update_ms_barrier(b, intrin, s);
case nir_intrinsic_load_workgroup_index:
return lower_ms_load_workgroup_index(b, intrin, s);
case nir_intrinsic_load_num_subgroups:
return nir_imm_int(b, DIV_ROUND_UP(s->api_workgroup_size, s->wave_size));
case nir_intrinsic_set_vertex_and_primitive_count:
return lower_ms_set_vertex_and_primitive_count(b, intrin, s);
default:
@ -529,6 +531,7 @@ filter_ms_intrinsic(const nir_instr *instr,
intrin->intrinsic == nir_intrinsic_store_per_primitive_output ||
intrin->intrinsic == nir_intrinsic_barrier ||
intrin->intrinsic == nir_intrinsic_load_workgroup_index ||
intrin->intrinsic == nir_intrinsic_load_num_subgroups ||
intrin->intrinsic == nir_intrinsic_set_vertex_and_primitive_count;
}

View file

@ -214,6 +214,8 @@ select_rt_prolog(Program* program, ac_shader_config* config,
bld.sop2(Builder::s_cselect, Definition(vcc, bld.lm),
Operand::c32_or_c64(-1u, program->wave_size == 64),
Operand::c32_or_c64(0, program->wave_size == 64), Operand(scc, s1));
bld.sop2(aco_opcode::s_cselect_b32, Definition(out_launch_size_y, s1),
Operand(out_launch_size_y, s1), Operand::c32(1), Operand(scc, s1));
bld.vop2(aco_opcode::v_cndmask_b32, Definition(out_launch_ids[0], v1),
Operand(tmp_invocation_idx, v1), Operand(out_launch_ids[0], v1), Operand(vcc, bld.lm));
bld.vop2(aco_opcode::v_cndmask_b32, Definition(out_launch_ids[1], v1), Operand::zero(),

View file

@ -1329,7 +1329,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0000,
.userq_num_slots = 2,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -1339,7 +1338,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0000,
.userq_num_slots = 16,
},
.fw_gfx_me = {
.ver = 1486,
@ -1460,7 +1458,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0002,
.userq_num_slots = 0x0,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -1470,7 +1467,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0002,
.userq_num_slots = 0x0,
},
.fw_gfx_me = {
.ver = 2390,
@ -2070,7 +2066,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0500,
.userq_num_slots = 2,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -2080,7 +2075,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0500,
.userq_num_slots = 16,
},
.fw_gfx_me = {
.ver = 29,
@ -2201,7 +2195,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xc0001,
.userq_num_slots = 8,
},
.hw_ip_compute = {
.hw_ip_version_major = 12,
@ -2211,7 +2204,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xc0001,
.userq_num_slots = 8,
},
.fw_gfx_me = {
.ver = 2590,

View file

@ -379,7 +379,6 @@ amdgpu_dump_hw_ips(int fd)
printf(" .ib_size_alignment = %u,\n", info.ib_size_alignment);
printf(" .available_rings = 0x%x,\n", info.available_rings);
printf(" .ip_discovery_version = 0x%04x,\n", info.ip_discovery_version);
printf(" .userq_num_slots = 0x%x,\n", info.userq_num_slots);
printf("},\n");
}
}

View file

@ -0,0 +1,35 @@
/*
* Copyright © 2025 Valve Corporation
*
* SPDX-License-Identifier: MIT
*/
#include "radv_device.h"
#include "radv_entrypoints.h"
#include "radv_image_view.h"
VKAPI_ATTR VkResult VKAPI_CALL
no_mans_sky_CreateImageView(VkDevice _device, const VkImageViewCreateInfo *pCreateInfo,
const VkAllocationCallbacks *pAllocator, VkImageView *pView)
{
VK_FROM_HANDLE(radv_device, device, _device);
VkResult result;
result = device->layer_dispatch.app.CreateImageView(_device, pCreateInfo, pAllocator, pView);
if (result != VK_SUCCESS)
return result;
VK_FROM_HANDLE(radv_image_view, iview, *pView);
if ((iview->vk.aspects == (VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT)) &&
(iview->vk.usage &
(VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT))) {
/* No Man's Sky creates descriptors with depth/stencil aspects (only when Intel XESS is
* enabled apparently). and this is illegal in Vulkan. Ignore them by using NULL descriptors
* to workaroud GPU hangs.
*/
memset(&iview->descriptor, 0, sizeof(iview->descriptor));
}
return result;
}

View file

@ -21,6 +21,7 @@ radv_entrypoints_gen_command += [
'--device-prefix', 'metro_exodus',
'--device-prefix', 'rage2',
'--device-prefix', 'quantic_dream',
'--device-prefix', 'no_mans_sky',
# Command buffer annotation layer entrypoints
'--device-prefix', 'annotate',
@ -40,6 +41,7 @@ libradv_files = files(
'layers/radv_metro_exodus.c',
'layers/radv_rage2.c',
'layers/radv_quantic_dream.c',
'layers/radv_no_mans_sky.c',
'layers/radv_rmv_layer.c',
'layers/radv_rra_layer.c',
'layers/radv_sqtt_layer.c',

View file

@ -6111,6 +6111,13 @@ radv_emit_tess_domain_origin_state(struct radv_cmd_buffer *cmd_buffer)
radeon_end();
}
static bool
radv_is_dual_src_enabled(const struct radv_dynamic_state *dynamic_state)
{
/* Dual-source blending must be ignored if blending isn't enabled for MRT0. */
return dynamic_state->blend_eq.mrt0_is_dual_src && !!(dynamic_state->color_blend_enable & 1u);
}
static struct radv_shader_part *
lookup_ps_epilog(struct radv_cmd_buffer *cmd_buffer)
{
@ -6144,7 +6151,7 @@ lookup_ps_epilog(struct radv_cmd_buffer *cmd_buffer)
state.color_write_mask = d->color_write_mask;
state.color_blend_enable = d->color_blend_enable;
state.mrt0_is_dual_src = d->blend_eq.mrt0_is_dual_src;
state.mrt0_is_dual_src = radv_is_dual_src_enabled(&cmd_buffer->state.dynamic);
if (d->vk.ms.alpha_to_coverage_enable) {
/* Select a color export format with alpha when alpha to coverage is enabled. */
@ -8114,6 +8121,8 @@ radv_mark_descriptors_dirty(struct radv_cmd_buffer *cmd_buffer, VkPipelineBindPo
struct radv_descriptor_state *descriptors_state = radv_get_descriptors_state(cmd_buffer, bind_point);
descriptors_state->dirty |= descriptors_state->valid;
if (descriptors_state->dynamic_offset_count)
descriptors_state->dirty_dynamic = true;
}
static void
@ -8642,7 +8651,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.compute_pipeline == compute_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_bind_shader(cmd_buffer, compute_pipeline->base.shaders[MESA_SHADER_COMPUTE], MESA_SHADER_COMPUTE);
@ -8656,7 +8664,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.rt_pipeline == rt_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_bind_shader(cmd_buffer, rt_pipeline->base.base.shaders[MESA_SHADER_INTERSECTION], MESA_SHADER_INTERSECTION);
radv_bind_rt_prolog(cmd_buffer, rt_pipeline->prolog);
@ -8690,7 +8697,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.graphics_pipeline == graphics_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_foreach_stage (
stage, (cmd_buffer->state.active_stages | graphics_pipeline->active_stages) & RADV_GRAPHICS_STAGE_BITS) {
@ -8744,6 +8750,8 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
cmd_buffer->descriptors[vk_to_bind_point(pipelineBindPoint)].dynamic_offset_count = pipeline->dynamic_offset_count;
cmd_buffer->descriptors[vk_to_bind_point(pipelineBindPoint)].need_indirect_descriptors =
pipeline->need_indirect_descriptors;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
}
VKAPI_ATTR void VKAPI_CALL
@ -11688,7 +11696,7 @@ radv_emit_cb_render_state(struct radv_cmd_buffer *cmd_buffer)
const struct radv_rendering_state *render = &cmd_buffer->state.render;
const struct radv_dynamic_state *d = &cmd_buffer->state.dynamic;
unsigned cb_blend_control[MAX_RTS], sx_mrt_blend_opt[MAX_RTS];
const bool mrt0_is_dual_src = d->blend_eq.mrt0_is_dual_src;
const bool mrt0_is_dual_src = radv_is_dual_src_enabled(&cmd_buffer->state.dynamic);
uint32_t cb_color_control = 0;
const uint32_t cb_target_mask = d->color_write_enable & d->color_write_mask;

View file

@ -792,6 +792,8 @@ init_dispatch_tables(struct radv_device *device, struct radv_physical_device *pd
add_entrypoints(&b, &rage2_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "quanticdream")) {
add_entrypoints(&b, &quantic_dream_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "no_mans_sky")) {
add_entrypoints(&b, &no_mans_sky_device_entrypoints, RADV_APP_DISPATCH_TABLE);
}
if (instance->vk.trace_mode & RADV_TRACE_MODE_RGP)

View file

@ -200,6 +200,7 @@ static const driOptionDescription radv_dri_options[] = {
DRI_CONF_RADV_EMULATE_RT(false)
DRI_CONF_RADV_ENABLE_FLOAT16_GFX8(false)
DRI_CONF_RADV_COOPERATIVE_MATRIX2_NV(false)
DRI_CONF_RADV_NO_IMPLICIT_VARYING_SUBGROUP_SIZE(false)
DRI_CONF_SECTION_END
};
// clang-format on
@ -236,6 +237,8 @@ radv_init_dri_debug_options(struct radv_instance *instance)
drirc->debug.ssbo_non_uniform = driQueryOptionb(&drirc->options, "radv_ssbo_non_uniform");
drirc->debug.tex_non_uniform = driQueryOptionb(&drirc->options, "radv_tex_non_uniform");
drirc->debug.zero_vram = driQueryOptionb(&drirc->options, "radv_zero_vram");
drirc->debug.no_implicit_varying_subgroup_size =
driQueryOptionb(&drirc->options, "radv_no_implicit_varying_subgroup_size");
drirc->debug.app_layer = driQueryOptionstr(&drirc->options, "radv_app_layer");
drirc->debug.override_uniform_offset_alignment =

View file

@ -57,6 +57,7 @@ struct radv_drirc {
bool ssbo_non_uniform;
bool tex_non_uniform;
bool zero_vram;
bool no_implicit_varying_subgroup_size;
char *app_layer;
int override_uniform_offset_alignment;
} debug;

View file

@ -252,6 +252,7 @@ radv_physical_device_init_cache_key(struct radv_physical_device *pdev)
key->use_llvm = pdev->use_llvm;
key->use_ngg = pdev->use_ngg;
key->use_ngg_culling = pdev->use_ngg_culling;
key->no_implicit_varying_subgroup_size = instance->drirc.debug.no_implicit_varying_subgroup_size;
}
static int

View file

@ -64,8 +64,9 @@ struct radv_physical_device_cache_key {
uint32_t use_llvm : 1;
uint32_t use_ngg : 1;
uint32_t use_ngg_culling : 1;
uint32_t no_implicit_varying_subgroup_size : 1;
uint32_t reserved : 10;
uint32_t reserved : 9;
};
enum radv_video_enc_hw_ver {

View file

@ -2383,8 +2383,9 @@ radv_GetQueryPoolResults(VkDevice _device, VkQueryPool queryPool, uint32_t first
break;
}
case VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR: {
const bool write_memory = radv_video_write_memory_supported(pdev) == RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL;
uint32_t *src32 = (uint32_t *)src;
uint32_t ready_idx = radv_video_write_memory_supported(pdev) ? RADV_ENC_FEEDBACK_STATUS_IDX : 1;
uint32_t ready_idx = write_memory ? RADV_ENC_FEEDBACK_STATUS_IDX : 1;
uint32_t value;
do {
value = p_atomic_read(&src32[ready_idx]);

View file

@ -367,6 +367,10 @@ radv_shader_choose_subgroup_size(struct radv_device *device, nir_shader *nir,
.requiredSubgroupSize = stage_key->subgroup_required_size * 32,
};
/* Do not allow for the SPIR-V 1.6 varying subgroup size rules. */
if (pdev->cache_key.no_implicit_varying_subgroup_size)
spirv_version = 0x10000;
vk_set_subgroup_size(&device->vk, nir, spirv_version, rss_info.requiredSubgroupSize ? &rss_info : NULL,
stage_key->subgroup_allow_varying, stage_key->subgroup_require_full);

View file

@ -508,7 +508,9 @@ radv_begin_sqtt(struct radv_queue *queue)
device->sqtt.start_cs[family] = NULL;
}
cs.b = ws->cs_create(ws, radv_queue_ring(queue), false);
radv_init_cmd_stream(&cs, radv_queue_ring(queue));
cs.b = ws->cs_create(ws, cs.hw_ip, false);
if (!cs.b)
return false;
@ -585,7 +587,9 @@ radv_end_sqtt(struct radv_queue *queue)
device->sqtt.stop_cs[family] = NULL;
}
cs.b = ws->cs_create(ws, radv_queue_ring(queue), false);
radv_init_cmd_stream(&cs, radv_queue_ring(queue));
cs.b = ws->cs_create(ws, cs.hw_ip, false);
if (!cs.b)
return false;

View file

@ -149,10 +149,16 @@ radv_vcn_write_memory(struct radv_cmd_buffer *cmd_buffer, uint64_t va, unsigned
struct radv_physical_device *pdev = radv_device_physical(device);
struct rvcn_sq_var sq;
struct radv_cmd_stream *cs = cmd_buffer->cs;
enum radv_video_write_memory_support support = radv_video_write_memory_supported(pdev);
if (!radv_video_write_memory_supported(pdev))
if (support == RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE)
return;
if (support == RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS) {
fprintf(stderr, "radv: VCN WRITE_MEMORY requires PCIe atomics support. Expect issues "
"if PCIe atomics are not enabled on current device.\n");
}
bool separate_queue = pdev->vid_decode_ip != AMD_IP_VCN_UNIFIED;
if (cmd_buffer->qf == RADV_QUEUE_VIDEO_DEC && separate_queue && pdev->vid_dec_reg.data2) {
radeon_check_space(device->ws, cs->b, 8);
@ -819,6 +825,32 @@ radv_GetPhysicalDeviceVideoCapabilitiesKHR(VkPhysicalDevice physicalDevice, cons
if (cap && !cap->valid)
cap = NULL;
if (cap) {
pCapabilities->maxCodedExtent.width = cap->max_width;
pCapabilities->maxCodedExtent.height = cap->max_height;
} else {
switch (pVideoProfile->videoCodecOperation) {
case VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR:
pCapabilities->maxCodedExtent.width = (pdev->info.family < CHIP_TONGA) ? 2048 : 4096;
pCapabilities->maxCodedExtent.height = (pdev->info.family < CHIP_TONGA) ? 1152 : 4096;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
default:
break;
}
}
pCapabilities->flags = 0;
pCapabilities->pictureAccessGranularity.width = VK_VIDEO_H264_MACROBLOCK_WIDTH;
pCapabilities->pictureAccessGranularity.height = VK_VIDEO_H264_MACROBLOCK_HEIGHT;
@ -1126,32 +1158,6 @@ radv_GetPhysicalDeviceVideoCapabilitiesKHR(VkPhysicalDevice physicalDevice, cons
break;
}
if (cap) {
pCapabilities->maxCodedExtent.width = cap->max_width;
pCapabilities->maxCodedExtent.height = cap->max_height;
} else {
switch (pVideoProfile->videoCodecOperation) {
case VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR:
pCapabilities->maxCodedExtent.width = (pdev->info.family < CHIP_TONGA) ? 2048 : 4096;
pCapabilities->maxCodedExtent.height = (pdev->info.family < CHIP_TONGA) ? 1152 : 4096;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
default:
break;
}
}
return VK_SUCCESS;
}

View file

@ -73,6 +73,19 @@ struct radv_video_session {
bool session_initialized;
};
/**
* WRITE_MEMORY support in FW.
*
* none: Not supported at all. Old VCN FW and all UVD.
* pcie_atomics: Supported, relies on PCIe atomics.
* full: Supported, works also without PCIe atomics.
*/
enum radv_video_write_memory_support {
RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE = 0,
RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS,
RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL,
};
VK_DEFINE_NONDISP_HANDLE_CASTS(radv_video_session, vk.base, VkVideoSessionKHR, VK_OBJECT_TYPE_VIDEO_SESSION_KHR)
void radv_init_physical_device_decoder(struct radv_physical_device *pdev);
@ -98,7 +111,7 @@ void radv_video_get_enc_dpb_image(struct radv_device *device, const struct VkVid
bool radv_video_decode_vp9_supported(const struct radv_physical_device *pdev);
bool radv_video_encode_av1_supported(const struct radv_physical_device *pdev);
bool radv_video_encode_qp_map_supported(const struct radv_physical_device *pdev);
bool radv_video_write_memory_supported(const struct radv_physical_device *pdev);
enum radv_video_write_memory_support radv_video_write_memory_supported(const struct radv_physical_device *pdev);
uint32_t radv_video_get_qp_map_texel_size(VkVideoCodecOperationFlagBitsKHR codec);
bool radv_check_vcn_fw_version(const struct radv_physical_device *pdev, uint32_t dec, uint32_t enc, uint32_t rev);

View file

@ -890,7 +890,6 @@ radv_enc_slice_header(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
uint32_t num_bits[RENCODE_SLICE_HEADER_TEMPLATE_MAX_NUM_INSTRUCTIONS] = {0};
const struct VkVideoEncodeH264PictureInfoKHR *h264_picture_info =
vk_find_struct_const(enc_info->pNext, VIDEO_ENCODE_H264_PICTURE_INFO_KHR);
int slice_count = h264_picture_info->naluSliceEntryCount;
const StdVideoEncodeH264PictureInfo *pic = h264_picture_info->pStdPictureInfo;
const StdVideoH264SequenceParameterSet *sps =
vk_video_find_h264_enc_std_sps(cmd_buffer->video.params, pic->seq_parameter_set_id);
@ -903,8 +902,6 @@ radv_enc_slice_header(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
unsigned int cdw_filled = 0;
unsigned int bits_copied = 0;
assert(slice_count <= 1);
struct radv_device *device = radv_cmd_buffer_device(cmd_buffer);
const struct radv_physical_device *pdev = radv_device_physical(device);
struct radv_cmd_stream *cs = cmd_buffer->cs;
@ -2861,7 +2858,8 @@ radv_vcn_encode_video(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
if (pdev->enc_hw_ver >= RADV_VIDEO_ENC_HW_2) {
radv_vcn_sq_tail(cs, &cmd_buffer->video.sq);
radv_vcn_write_memory(cmd_buffer, feedback_query_va + RADV_ENC_FEEDBACK_STATUS_IDX * sizeof(uint32_t), 1);
if (radv_video_write_memory_supported(pdev) == RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL)
radv_vcn_write_memory(cmd_buffer, feedback_query_va + RADV_ENC_FEEDBACK_STATUS_IDX * sizeof(uint32_t), 1);
}
}
@ -3166,6 +3164,36 @@ radv_video_patch_encode_session_parameters(struct radv_device *device, struct vk
}
break;
case VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR: {
for (unsigned i = 0; i < params->h265_enc.h265_sps_count; i++) {
uint32_t pic_width_in_luma_samples =
params->h265_enc.h265_sps[i].base.pic_width_in_luma_samples;
uint32_t pic_height_in_luma_samples =
params->h265_enc.h265_sps[i].base.pic_height_in_luma_samples;
uint32_t aligned_pic_width = align(pic_width_in_luma_samples, 64);
uint32_t aligned_pic_height = align(pic_height_in_luma_samples, 16);
/* Override the unaligned pic_{width,height} and make up for it with conformance window
* cropping */
params->h265_enc.h265_sps[i].base.pic_width_in_luma_samples = aligned_pic_width;
params->h265_enc.h265_sps[i].base.pic_height_in_luma_samples = aligned_pic_height;
if (aligned_pic_width != pic_width_in_luma_samples ||
aligned_pic_height != pic_height_in_luma_samples) {
params->h265_enc.h265_sps[i].base.flags.conformance_window_flag = 1;
params->h265_enc.h265_sps[i].base.conf_win_right_offset +=
(aligned_pic_width - pic_width_in_luma_samples) / 2;
params->h265_enc.h265_sps[i].base.conf_win_bottom_offset +=
(aligned_pic_height - pic_height_in_luma_samples) / 2;
}
/* VCN supports only the following block sizes (resulting in 64x64 CTBs with any coding
* block size) */
params->h265_enc.h265_sps[i].base.log2_min_luma_coding_block_size_minus3 = 0;
params->h265_enc.h265_sps[i].base.log2_diff_max_min_luma_coding_block_size = 3;
params->h265_enc.h265_sps[i].base.log2_min_luma_transform_block_size_minus2 = 0;
params->h265_enc.h265_sps[i].base.log2_diff_max_min_luma_transform_block_size = 3;
}
for (unsigned i = 0; i < params->h265_enc.h265_pps_count; i++) {
/* cu_qp_delta needs to be enabled if rate control is enabled. VCN2 and newer can also enable
* it with rate control disabled. Since we don't know what rate control will be used, we
@ -3268,6 +3296,14 @@ radv_GetEncodedVideoSessionParametersKHR(VkDevice device,
assert(sps);
char *data_ptr = pData ? (char *)pData + vps_size : NULL;
vk_video_encode_h265_sps(sps, size_limit, &sps_size, data_ptr);
if (pFeedbackInfo) {
struct VkVideoEncodeH265SessionParametersFeedbackInfoKHR *h265_feedback_info =
vk_find_struct(pFeedbackInfo->pNext, VIDEO_ENCODE_H265_SESSION_PARAMETERS_FEEDBACK_INFO_KHR);
pFeedbackInfo->hasOverrides = VK_TRUE;
if (h265_feedback_info)
h265_feedback_info->hasStdSPSOverrides = VK_TRUE;
}
}
if (h265_get_info->writeStdPPS) {
const StdVideoH265PictureParameterSet *pps = vk_video_find_h265_enc_std_pps(templ, h265_get_info->stdPPSId);
@ -3421,17 +3457,20 @@ radv_video_encode_qp_map_supported(const struct radv_physical_device *pdev)
return true;
}
bool
enum radv_video_write_memory_support
radv_video_write_memory_supported(const struct radv_physical_device *pdev)
{
if (pdev->info.vcn_ip_version >= VCN_5_0_0)
return true;
else if (pdev->info.vcn_ip_version >= VCN_4_0_0)
return pdev->info.vcn_enc_minor_version >= 22;
else if (pdev->info.vcn_ip_version >= VCN_3_0_0)
return pdev->info.vcn_enc_minor_version >= 33;
else if (pdev->info.vcn_ip_version >= VCN_2_0_0)
return pdev->info.vcn_enc_minor_version >= 24;
else /* VCN 1 and UVD */
return false;
if (pdev->info.vcn_ip_version >= VCN_5_0_0) {
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_4_0_0) {
if (pdev->info.vcn_enc_minor_version >= 22)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_3_0_0) {
if (pdev->info.vcn_enc_minor_version >= 33)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_2_0_0) {
if (pdev->info.vcn_enc_minor_version >= 24)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
}
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE;
}

File diff suppressed because it is too large Load diff

View file

@ -24,7 +24,8 @@ ail_initialize_linear(struct ail_layout *layout)
layout->layer_stride_B = align64(
(uint64_t)layout->linear_stride_B * layout->height_px, AIL_CACHELINE);
layout->size_B = layout->layer_stride_B * layout->depth_px;
layout->size_B =
layout->level_offsets_B[0] + (layout->layer_stride_B * layout->depth_px);
}
/*
@ -341,6 +342,7 @@ ail_make_miptree(struct ail_layout *layout)
assert(layout->linear_stride_B == 0 && "Invalid nonlinear layout");
assert(layout->levels >= 1 && "Invalid dimensions");
assert(layout->sample_count_sa >= 1 && "Invalid sample count");
assert(layout->level_offsets_B[0] == 0 && "Invalid offset");
}
assert(!(layout->writeable_image && layout->compressed) &&

View file

@ -133,6 +133,7 @@ agx_virtio_bo_bind(struct agx_device *dev, struct drm_asahi_gem_bind_op *ops,
memcpy(req->payload, ops, payload_size);
int ret = vdrm_send_req(dev->vdrm, &req->hdr, false);
free(req);
if (ret) {
fprintf(stderr, "ASAHI_CCMD_GEM_BIND failed: %d\n", ret);
}

View file

@ -992,28 +992,34 @@ hk_CmdEndRendering(VkCommandBuffer commandBuffer)
}
}
static void
hk_init_heap(const void *data) {
struct hk_cmd_buffer *cmd = (struct hk_cmd_buffer *) data;
struct hk_device *dev = hk_cmd_buffer_device(cmd);
perf_debug(cmd, "Allocating heap");
size_t size = 128 * 1024 * 1024;
dev->heap = agx_bo_create(&dev->dev, size, 0, 0, "Geometry heap");
/* The geometry state buffer is initialized here and then is treated by
* the CPU as rodata, even though the GPU uses it for scratch internally.
*/
off_t off = dev->rodata.heap - dev->rodata.bo->va->addr;
struct agx_heap *map = agx_bo_map(dev->rodata.bo) + off;
*map = (struct agx_heap){
.base = dev->heap->va->addr,
.size = size,
};
}
static uint64_t
hk_heap(struct hk_cmd_buffer *cmd)
{
struct hk_device *dev = hk_cmd_buffer_device(cmd);
if (unlikely(!dev->heap)) {
perf_debug(cmd, "Allocating heap");
size_t size = 128 * 1024 * 1024;
dev->heap = agx_bo_create(&dev->dev, size, 0, 0, "Geometry heap");
/* The geometry state buffer is initialized here and then is treated by
* the CPU as rodata, even though the GPU uses it for scratch internally.
*/
off_t off = dev->rodata.heap - dev->rodata.bo->va->addr;
struct agx_heap *map = agx_bo_map(dev->rodata.bo) + off;
*map = (struct agx_heap){
.base = dev->heap->va->addr,
.size = size,
};
}
util_call_once_data(&dev->heap_init_once, hk_init_heap, cmd);
/* We need to free all allocations after each command buffer execution */
if (!cmd->uses_heap) {

View file

@ -330,6 +330,7 @@ hk_GetDescriptorSetLayoutSupport(
uint64_t non_variable_size = 0;
uint32_t variable_stride = 0;
uint32_t variable_count = 0;
bool variable_is_inline_uniform_block = false;
uint8_t dynamic_buffer_count = 0;
for (uint32_t i = 0; i < pCreateInfo->bindingCount; i++) {
@ -362,6 +363,10 @@ hk_GetDescriptorSetLayoutSupport(
*/
variable_count = MAX2(1, binding->descriptorCount);
variable_stride = stride;
variable_is_inline_uniform_block =
binding->descriptorType ==
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK;
} else {
/* Since we're aligning to the maximum and since this is just a
* check for whether or not the max buffer size is big enough, we
@ -393,12 +398,21 @@ hk_GetDescriptorSetLayoutSupport(
switch (ext->sType) {
case VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_LAYOUT_SUPPORT: {
VkDescriptorSetVariableDescriptorCountLayoutSupport *vs = (void *)ext;
uint32_t max_var_count;
if (variable_stride > 0) {
vs->maxVariableDescriptorCount =
max_var_count =
(max_buffer_size - non_variable_size) / variable_stride;
} else {
vs->maxVariableDescriptorCount = 0;
max_var_count = 0;
}
if (variable_is_inline_uniform_block) {
max_var_count =
MIN2(max_var_count, HK_MAX_INLINE_UNIFORM_BLOCK_SIZE);
}
vs->maxVariableDescriptorCount = max_var_count;
break;
}

View file

@ -92,6 +92,7 @@ struct hk_device {
* expected to be a legitimate problem. If it is, we can rework later.
*/
struct agx_bo *heap;
util_once_flag heap_init_once;
struct {
struct agx_scratch vs, fs, cs;

View file

@ -67,7 +67,7 @@ get_drm_format_modifier_properties_list(
{
*out_props = (VkDrmFormatModifierPropertiesEXT){
.drmFormatModifier = mod,
.drmFormatModifierPlaneCount = 1 /* no planar mods */,
.drmFormatModifierPlaneCount = vk_format_get_plane_count(vk_format),
.drmFormatModifierTilingFeatures = flags,
};
};
@ -96,7 +96,7 @@ get_drm_format_modifier_properties_list_2(
{
*out_props = (VkDrmFormatModifierProperties2EXT){
.drmFormatModifier = mod,
.drmFormatModifierPlaneCount = 1, /* no planar mods */
.drmFormatModifierPlaneCount = vk_format_get_plane_count(vk_format),
.drmFormatModifierTilingFeatures = flags,
};
};

View file

@ -1424,6 +1424,13 @@ hk_copy_memory_to_image(struct hk_device *device, struct hk_image *dst_image,
uint32_t src_height = info->memoryImageHeight ?: extent.height;
uint32_t blocksize_B = util_format_get_blocksize(layout->format);
/* Align width and height to block */
src_width =
DIV_ROUND_UP(src_width, util_format_get_blockwidth(layout->format));
src_height =
DIV_ROUND_UP(src_height, util_format_get_blockheight(layout->format));
uint32_t src_pitch = src_width * blocksize_B;
unsigned start_layer = (dst_image->vk.image_type == VK_IMAGE_TYPE_3D)
@ -1496,6 +1503,13 @@ hk_copy_image_to_memory(struct hk_device *device, struct hk_image *src_image,
#endif
uint32_t blocksize_B = util_format_get_blocksize(layout->format);
/* Align width and height to block */
dst_width =
DIV_ROUND_UP(dst_width, util_format_get_blockwidth(layout->format));
dst_height =
DIV_ROUND_UP(dst_height, util_format_get_blockheight(layout->format));
uint32_t dst_pitch = dst_width * blocksize_B;
unsigned start_layer = (src_image->vk.image_type == VK_IMAGE_TYPE_3D)
@ -1649,11 +1663,6 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
&device->physical_device->ubwc_config);
#endif
} else {
/* Work tile-by-tile, holding the unswizzled tile in a temporary
* buffer.
*/
char temp_tile[16384];
unsigned src_level = info->srcSubresource.mipLevel;
unsigned dst_level = info->dstSubresource.mipLevel;
uint32_t block_width = src_layout->tilesize_el[src_level].width_el;
@ -1667,6 +1676,12 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
}
uint32_t temp_pitch = block_width * src_block_B;
size_t temp_tile_size = temp_pitch * (src_offset.y + extent.height);
/* Work tile-by-tile, holding the unswizzled tile in a temporary
* buffer.
*/
char *temp_tile = malloc(temp_tile_size);
for (unsigned by = src_offset.y / block_height;
by * block_height < src_offset.y + extent.height; by++) {
@ -1683,14 +1698,14 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
MIN2((bx + 1) * block_width, src_offset.x + extent.width) -
src_x_start;
assert(height * temp_pitch <= ARRAY_SIZE(temp_tile));
ail_detile((void *)src, temp_tile, src_layout, src_level,
temp_pitch, src_x_start, src_y_start, width, height);
ail_tile(dst, temp_tile, dst_layout, dst_level, temp_pitch,
dst_x_start, dst_y_start, width, height);
}
}
free(temp_tile);
}
}
}

View file

@ -859,7 +859,7 @@ hk_get_device_properties(const struct agx_device *dev,
.maxSubgroupSize = 32,
.maxComputeWorkgroupSubgroups = 1024 / 32,
.requiredSubgroupSizeStages = 0,
.maxInlineUniformBlockSize = 1 << 16,
.maxInlineUniformBlockSize = HK_MAX_INLINE_UNIFORM_BLOCK_SIZE,
.maxPerStageDescriptorInlineUniformBlocks = 32,
.maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks = 32,
.maxDescriptorSetInlineUniformBlocks = 6 * 32,
@ -953,7 +953,7 @@ hk_get_device_properties(const struct agx_device *dev,
.robustUniformBufferAccessSizeAlignment = HK_MIN_UBO_ALIGNMENT,
/* VK_EXT_sample_locations */
.sampleLocationSampleCounts = sample_counts,
.sampleLocationSampleCounts = sample_counts & ~VK_SAMPLE_COUNT_1_BIT,
.maxSampleLocationGridSize = (VkExtent2D){1, 1},
.sampleLocationCoordinateRange[0] = 0.0f,
.sampleLocationCoordinateRange[1] = 0.9375f,

View file

@ -12,18 +12,19 @@
#include "vk_log.h"
#include "vk_util.h"
#define HK_MAX_SETS 8
#define HK_MAX_PUSH_SIZE 256
#define HK_MAX_DYNAMIC_BUFFERS 64
#define HK_MAX_RTS 8
#define HK_MIN_SSBO_ALIGNMENT 16
#define HK_MIN_TEXEL_BUFFER_ALIGNMENT 16
#define HK_MIN_UBO_ALIGNMENT 64
#define HK_MAX_VIEWPORTS 16
#define HK_MAX_DESCRIPTOR_SIZE 64
#define HK_MAX_PUSH_DESCRIPTORS 32
#define HK_MAX_DESCRIPTOR_SET_SIZE (1u << 30)
#define HK_MAX_DESCRIPTORS (1 << 20)
#define HK_MAX_SETS 8
#define HK_MAX_PUSH_SIZE 256
#define HK_MAX_DYNAMIC_BUFFERS 64
#define HK_MAX_RTS 8
#define HK_MIN_SSBO_ALIGNMENT 16
#define HK_MIN_TEXEL_BUFFER_ALIGNMENT 16
#define HK_MIN_UBO_ALIGNMENT 64
#define HK_MAX_VIEWPORTS 16
#define HK_MAX_DESCRIPTOR_SIZE 64
#define HK_MAX_PUSH_DESCRIPTORS 32
#define HK_MAX_DESCRIPTOR_SET_SIZE (1u << 30)
#define HK_MAX_INLINE_UNIFORM_BLOCK_SIZE (1u << 16)
#define HK_MAX_DESCRIPTORS (1 << 20)
#define HK_PUSH_DESCRIPTOR_SET_SIZE \
(HK_MAX_PUSH_DESCRIPTORS * HK_MAX_DESCRIPTOR_SIZE)
#define HK_SSBO_BOUNDS_CHECK_ALIGNMENT 4

View file

@ -812,11 +812,6 @@ queue_submit(struct hk_device *dev, struct hk_queue *queue,
/* Now setup the command structs */
struct util_dynarray payload;
util_dynarray_init(&payload, NULL);
union drm_asahi_cmd *cmds = malloc(sizeof(*cmds) * command_count);
if (cmds == NULL) {
free(cmds);
return vk_error(dev, VK_ERROR_OUT_OF_HOST_MEMORY);
}
unsigned nr_vdm = 0, nr_cdm = 0;

View file

@ -319,14 +319,10 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
case nir_intrinsic_load_base_global_invocation_id:
case nir_intrinsic_load_base_workgroup_id:
case nir_intrinsic_load_alpha_reference_amd:
case nir_intrinsic_load_ubo_uniform_block_intel:
case nir_intrinsic_load_ssbo_uniform_block_intel:
case nir_intrinsic_load_shared_uniform_block_intel:
case nir_intrinsic_load_barycentric_optimize_amd:
case nir_intrinsic_load_poly_line_smooth_enabled:
case nir_intrinsic_load_rasterization_primitive_amd:
case nir_intrinsic_unit_test_uniform_amd:
case nir_intrinsic_load_global_constant_uniform_block_intel:
case nir_intrinsic_load_debug_log_desc_amd:
case nir_intrinsic_load_xfb_state_address_gfx12_amd:
case nir_intrinsic_cmat_length:
@ -364,6 +360,24 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
is_divergent = false;
break;
case nir_intrinsic_load_ubo_uniform_block_intel:
case nir_intrinsic_load_ssbo_uniform_block_intel:
case nir_intrinsic_load_shared_uniform_block_intel:
case nir_intrinsic_load_global_constant_uniform_block_intel:
if (options & (nir_divergence_across_subgroups |
nir_divergence_multiple_workgroup_per_compute_subgroup)) {
unsigned num_srcs = nir_intrinsic_infos[instr->intrinsic].num_srcs;
for (unsigned i = 0; i < num_srcs; i++) {
if (src_divergent(instr->src[i], state)) {
is_divergent = true;
break;
}
}
} else {
is_divergent = false;
}
break;
/* This is divergent because it specifically loads sequential values into
* successive SIMD lanes.
*/

View file

@ -1069,6 +1069,7 @@ nir_get_io_index_src_number(const nir_intrinsic_instr *instr)
IMG_CASE(atomic):
IMG_CASE(atomic_swap):
IMG_CASE(size):
IMG_CASE(levels):
IMG_CASE(samples):
IMG_CASE(texel_address):
IMG_CASE(samples_identical):

View file

@ -1228,8 +1228,16 @@ wrap_instr(nir_builder *b, nir_instr *instr, void *data)
static bool
wrap_instrs(nir_shader *shader, wrap_instr_callback callback)
{
return nir_shader_instructions_pass(shader, wrap_instr,
nir_metadata_none, callback);
bool progress = nir_shader_instructions_pass(shader, wrap_instr,
nir_metadata_none, callback);
/* Wrapping jump instructions that are located inside ifs can break SSA
* invariants because the else block no longer dominates the merge block.
* Repair the SSA to make the validator happy again.
*/
if (progress)
nir_repair_ssa(shader);
return progress;
}
static bool

View file

@ -4096,9 +4096,9 @@ distribute_src_mods = [
(('fneg', ('fmul(is_used_once)', a, b)), ('fmul', ('fneg', a), b)),
(('fabs', ('fmul(is_used_once)', a, b)), ('fmul', ('fabs', a), ('fabs', b))),
(('fneg', ('ffma(is_used_once)', a, b, c)), ('ffma', ('fneg', a), b, ('fneg', c))),
(('fneg', ('ffma(is_used_once,nsz)', a, b, c)), ('ffma', ('fneg', a), b, ('fneg', c))),
(('fneg', ('flrp(is_used_once)', a, b, c)), ('flrp', ('fneg', a), ('fneg', b), c)),
(('fneg', ('~fadd(is_used_once)', a, b)), ('fadd', ('fneg', a), ('fneg', b))),
(('fneg', ('fadd(is_used_once,nsz)', a, b)), ('fadd', ('fneg', a), ('fneg', b))),
# Note that fmin <-> fmax. I don't think there is a way to distribute
# fabs() into fmin or fmax.

View file

@ -82,7 +82,9 @@ opt_shrink_store_instr(nir_builder *b, nir_intrinsic_instr *instr, bool shrink_i
/* Trim the num_components stored according to the write mask. */
unsigned write_mask = nir_intrinsic_write_mask(instr);
unsigned last_bit = util_last_bit(write_mask);
/* Don't trim down to an invalid number of components, though. */
unsigned last_bit = nir_round_up_components(util_last_bit(write_mask));
if (last_bit < instr->num_components) {
nir_def *def = nir_trim_vector(b, instr->src[0].ssa, last_bit);
nir_src_rewrite(&instr->src[0], def);

View file

@ -652,6 +652,7 @@ nir_precompiled_build_variant(const nir_function *libfunc,
assert(libfunc->workgroup_size[0] != 0 && "must set workgroup size");
b.shader->info.workgroup_size_variable = false;
b.shader->info.workgroup_size[0] = libfunc->workgroup_size[0];
b.shader->info.workgroup_size[1] = libfunc->workgroup_size[1];
b.shader->info.workgroup_size[2] = libfunc->workgroup_size[2];

View file

@ -506,8 +506,8 @@ vtn_pointer_dereference(struct vtn_builder *b,
type = type->array_element;
}
tail = nir_build_deref_array(&b->nb, tail, arr_index);
tail->arr.in_bounds = deref_chain->in_bounds;
}
tail->arr.in_bounds = deref_chain->in_bounds;
access |= type->access;
}

View file

@ -564,7 +564,7 @@ tiled_to_linear_2cpp(char *_tiled, char *_linear, uint32_t linear_pitch)
"v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15");
}
#else
memcpy_small<2, LINEAR_TO_TILED, FDL_MACROTILE_4_CHANNEL>(
memcpy_small<2, TILED_TO_LINEAR, FDL_MACROTILE_4_CHANNEL>(
0, 0, 32, 4, _tiled, _linear, linear_pitch, 0, 0, 0);
#endif
}

View file

@ -2300,6 +2300,17 @@ insert_live_out_moves(struct ra_ctx *ctx)
insert_file_live_out_moves(ctx, &ctx->shared);
}
static bool
has_merge_set_preferred_reg(struct ir3_register *reg)
{
assert(reg->merge_set);
assert(reg->num != INVALID_REG);
return reg->merge_set->preferred_reg != (physreg_t)~0 &&
ra_reg_get_physreg(reg) ==
reg->merge_set->preferred_reg + reg->merge_set_offset;
}
static void
handle_block(struct ra_ctx *ctx, struct ir3_block *block)
{
@ -2338,17 +2349,15 @@ handle_block(struct ra_ctx *ctx, struct ir3_block *block)
struct ir3_register *dst = input->dsts[0];
assert(dst->num != INVALID_REG);
physreg_t dst_start = ra_reg_get_physreg(dst);
physreg_t dst_end;
if (dst->merge_set) {
if (dst->merge_set && has_merge_set_preferred_reg(dst)) {
/* Take the whole merge set into account to prevent its range being
* allocated for defs not part of the merge set.
*/
assert(dst_start >= dst->merge_set_offset);
dst_end = dst_start - dst->merge_set_offset + dst->merge_set->size;
dst_end = dst->merge_set->preferred_reg + dst->merge_set->size;
} else {
dst_end = dst_start + reg_size(dst);
dst_end = ra_reg_get_physreg(dst) + reg_size(dst);
}
struct ra_file *file = ra_get_file(ctx, dst);

View file

@ -1461,6 +1461,15 @@ r3d_dst_gmem(struct tu_cmd_buffer *cmd, struct tu_cs *cs,
gmem_offset = tu_attachment_gmem_offset(cmd, att, layer);
}
/* On a7xx we must always use FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8. See
* blit_base_format().
*/
if (CHIP >= A7XX && att->format == VK_FORMAT_D24_UNORM_S8_UINT) {
RB_MRT_BUF_INFO = pkt_field_set(A6XX_RB_MRT_BUF_INFO_COLOR_FORMAT,
RB_MRT_BUF_INFO,
FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8);
}
tu_cs_emit_regs(cs,
RB_MRT_BUF_INFO(CHIP, 0, .dword = RB_MRT_BUF_INFO),
A6XX_RB_MRT_PITCH(0, 0),
@ -1533,7 +1542,8 @@ r3d_setup(struct tu_cmd_buffer *cmd,
tu_cs_emit_call(cs, cmd->device->dbg_renderpass_stomp_cs);
}
enum a6xx_format fmt = blit_base_format<CHIP>(dst_format, ubwc, false);
enum a6xx_format fmt = blit_base_format<CHIP>(dst_format, ubwc,
blit_param & R3D_DST_GMEM);
fixup_dst_format(src_format, &dst_format, &fmt);
if (!cmd->state.pass) {
@ -4638,7 +4648,7 @@ clear_sysmem_attachment(struct tu_cmd_buffer *cmd,
enum pipe_format format = vk_format_to_pipe_format(vk_format);
const struct tu_framebuffer *fb = cmd->state.framebuffer;
const struct tu_image_view *iview = cmd->state.attachments[a];
const uint32_t clear_views = cmd->state.pass->attachments[a].clear_views;
const uint32_t clear_views = cmd->state.pass->attachments[a].used_views;
const struct blit_ops *ops = &r2d_ops<CHIP>;
const VkClearValue *value = &cmd->state.clear_values[a];
if (cmd->state.pass->attachments[a].samples > 1)
@ -4734,7 +4744,7 @@ tu_clear_gmem_attachment(struct tu_cmd_buffer *cmd,
tu_emit_clear_gmem_attachment<CHIP>(cmd, cs, resolve_group, a, 0,
cmd->state.framebuffer->layers,
attachment->clear_views,
attachment->used_views,
attachment->clear_mask,
&cmd->state.clear_values[a], NULL);
}
@ -4755,7 +4765,7 @@ tu7_generic_clear_attachment(struct tu_cmd_buffer *cmd,
iview->view.ubwc_enabled, att->samples);
enum pipe_format format = vk_format_to_pipe_format(att->format);
for_each_layer(i, att->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, att->used_views, cmd->state.framebuffer->layers) {
uint32_t layer = i + 0;
uint32_t mask =
aspect_write_mask_generic_clear(format, att->clear_mask);
@ -4836,7 +4846,7 @@ tu_emit_blit(struct tu_cmd_buffer *cmd,
uint32_t buffer_id = tu_resolve_group_include_buffer<CHIP>(resolve_group, format);
event_blit_setup(cs, buffer_id, attachment, blit_event_type, clear_mask);
for_each_layer(i, attachment->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, attachment->used_views, cmd->state.framebuffer->layers) {
event_blit_dst_view blt_view = blt_view_from_tu_view(iview, i);
event_blit_run<CHIP>(cmd, cs, attachment, &blt_view, separate_stencil);
}
@ -4951,7 +4961,7 @@ load_3d_blit(struct tu_cmd_buffer *cmd,
/* Wait for CACHE_INVALIDATE to land */
tu_cs_emit_wfi(cs);
for_each_layer(i, att->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, att->used_views, cmd->state.framebuffer->layers) {
if (cmd->state.pass->has_fdm) {
struct apply_load_coords_state state = {
.view = i,

View file

@ -1616,7 +1616,7 @@ tu6_emit_gmem_stores(struct tu_cmd_buffer *cmd,
scissor_emitted = true;
}
tu_store_gmem_attachment<CHIP>(cmd, cs, resolve_group, a, a,
fb->layers, subpass->multiview_mask,
fb->layers, att->used_views,
cond_exec_allowed);
}
}

View file

@ -208,8 +208,8 @@ tu_CreateDescriptorSetLayout(
if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK)
set_layout->has_inline_uniforms = true;
if (variable_flags && binding->binding < variable_flags->bindingCount &&
(variable_flags->pBindingFlags[binding->binding] &
if (variable_flags && j < variable_flags->bindingCount &&
(variable_flags->pBindingFlags[j] &
VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT)) {
assert(!binding->pImmutableSamplers); /* Terribly ill defined how
many samplers are valid */
@ -377,7 +377,7 @@ tu_GetDescriptorSetLayoutSupport(
uint64_t max_count = MAX_SET_SIZE;
unsigned descriptor_count = binding->descriptorCount;
if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK) {
max_count = MAX_SET_SIZE - size;
max_count = MAX_INLINE_UBO_RANGE - size;
descriptor_count = descriptor_sz;
descriptor_sz = 1;
} else if (descriptor_sz) {
@ -388,9 +388,9 @@ tu_GetDescriptorSetLayoutSupport(
supported = false;
}
if (variable_flags && binding->binding < variable_flags->bindingCount &&
if (variable_flags && i < variable_flags->bindingCount &&
variable_count &&
(variable_flags->pBindingFlags[binding->binding] &
(variable_flags->pBindingFlags[i] &
VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT)) {
variable_count->maxVariableDescriptorCount =
MIN2(UINT32_MAX, max_count);

View file

@ -417,7 +417,8 @@ tu_render_pass_patch_input_gmem(struct tu_render_pass *pass)
uint32_t a = subpass->input_attachments[j].attachment;
if (a == VK_ATTACHMENT_UNUSED)
continue;
subpass->input_attachments[j].patch_input_gmem = written[a];
subpass->input_attachments[j].patch_input_gmem =
written[a] && pass->attachments[a].gmem;
}
for (unsigned j = 0; j < subpass->color_count; j++) {
@ -884,7 +885,7 @@ tu_subpass_use_attachment(struct tu_render_pass *pass, int i, uint32_t a, const
att->gmem = true;
update_samples(subpass, pCreateInfo->pAttachments[a].samples);
att->clear_views |= subpass->multiview_mask;
att->used_views |= subpass->multiview_mask;
/* Loads and clears are emitted at the start of the subpass that needs them. */
att->first_subpass_idx = MIN2(i, att->first_subpass_idx);
@ -1126,6 +1127,7 @@ tu_CreateRenderPass2(VkDevice _device,
if (!att->gmem) {
att->clear_mask = 0;
att->load = false;
att->load_stencil = false;
}
}
@ -1235,7 +1237,7 @@ tu_setup_dynamic_render_pass(struct tu_cmd_buffer *cmd_buffer,
VK_FROM_HANDLE(tu_image_view, view, att_info->imageView);
tu_setup_dynamic_attachment(att, view);
att->gmem = true;
att->clear_views = info->viewMask;
att->used_views = info->viewMask;
attachment_set_ops(device, att, att_info->loadOp,
VK_ATTACHMENT_LOAD_OP_DONT_CARE, att_info->storeOp,
VK_ATTACHMENT_STORE_OP_DONT_CARE);
@ -1279,7 +1281,7 @@ tu_setup_dynamic_render_pass(struct tu_cmd_buffer *cmd_buffer,
struct tu_render_pass_attachment *att = &pass->attachments[a];
tu_setup_dynamic_attachment(att, view);
att->gmem = true;
att->clear_views = info->viewMask;
att->used_views = info->viewMask;
subpass->depth_stencil_attachment.attachment = a++;
subpass->input_attachments[0].attachment =
subpass->depth_stencil_attachment.attachment;

View file

@ -94,7 +94,19 @@ struct tu_render_pass_attachment
VkSampleCountFlagBits samples;
uint32_t cpp;
VkImageAspectFlags clear_mask;
uint32_t clear_views;
/* All views that are used with the attachment in all subpasses. Used to
* determine which views to apply loadOp/storeOp to.
*/
uint32_t used_views;
/* The internal MSRTSS attachment to clear when the user says to clear
* this attachment. Clear values must be remapped to this attachment.
*/
uint32_t remapped_clear_att;
/* For internal attachments created for MSRTSS, the original user attachment
* which it is resolved/unresolved to.
*/
uint32_t user_att;
bool load;
bool store;
bool gmem;

View file

@ -3157,8 +3157,6 @@ tu6_emit_blend(struct tu_cs *cs,
bool dual_src_blend = tu_blend_state_is_dual_src(cb);
tu_cs_emit_regs(cs, A6XX_SP_PS_MRT_CNTL(.mrt = num_rts));
tu_cs_emit_regs(cs, A6XX_RB_PS_MRT_CNTL(.mrt = num_rts));
tu_cs_emit_regs(cs, A6XX_SP_BLEND_CNTL(.enable_blend = blend_enable_mask,
.unk8 = true,
.dual_color_in_enable =
@ -3180,10 +3178,12 @@ tu6_emit_blend(struct tu_cs *cs,
.alpha_to_one = alpha_to_one_enable,
.sample_mask = sample_mask));
unsigned num_remapped_rts = 0;
for (unsigned i = 0; i < num_rts; i++) {
if (cal->color_map[i] == MESA_VK_ATTACHMENT_UNUSED)
continue;
unsigned remapped_idx = cal->color_map[i];
num_remapped_rts = MAX2(num_remapped_rts, remapped_idx + 1);
const struct vk_color_blend_attachment_state *att = &cb->attachments[i];
if ((cb->color_write_enables & (1u << i)) && i < cb->attachment_count) {
const enum a3xx_rb_blend_opcode color_op = tu6_blend_op(att->color_blend_op);
@ -3227,6 +3227,8 @@ tu6_emit_blend(struct tu_cs *cs,
A6XX_RB_MRT_BLEND_CONTROL(remapped_idx,));
}
}
tu_cs_emit_regs(cs, A6XX_SP_PS_MRT_CNTL(.mrt = num_remapped_rts));
tu_cs_emit_regs(cs, A6XX_RB_PS_MRT_CNTL(.mrt = num_remapped_rts));
}
static const enum mesa_vk_dynamic_graphics_state tu_blend_constants_state[] = {

View file

@ -88,10 +88,9 @@
#define LLVMCreateBuilder ILLEGAL_LLVM_FUNCTION
typedef struct lp_context_ref {
#if GALLIVM_USE_ORCJIT
LLVMOrcThreadSafeContextRef ref;
#else
LLVMContextRef ref;
#if GALLIVM_USE_ORCJIT
LLVMOrcThreadSafeContextRef tsref;
#endif
bool owned;
} lp_context_ref;
@ -101,18 +100,21 @@ lp_context_create(lp_context_ref *context)
{
assert(context != NULL);
#if GALLIVM_USE_ORCJIT
context->ref = LLVMOrcCreateNewThreadSafeContext();
#if LLVM_VERSION_MAJOR >= 21
context->ref = LLVMContextCreate();
/* Ownership of ref is then transferred to tsref */
context->tsref = LLVMOrcCreateNewThreadSafeContextFromLLVMContext(context->ref);
#else
context->tsref = LLVMOrcCreateNewThreadSafeContext();
context->ref = LLVMOrcThreadSafeContextGetContext(context->tsref);
#endif
#else
context->ref = LLVMContextCreate();
#endif
context->owned = true;
#if LLVM_VERSION_MAJOR == 15
if (context->ref) {
#if GALLIVM_USE_ORCJIT
LLVMContextSetOpaquePointers(LLVMOrcThreadSafeContextGetContext(context->ref), false);
#else
LLVMContextSetOpaquePointers(context->ref, false);
#endif
}
#endif
}
@ -123,7 +125,7 @@ lp_context_destroy(lp_context_ref *context)
assert(context != NULL);
if (context->owned) {
#if GALLIVM_USE_ORCJIT
LLVMOrcDisposeThreadSafeContext(context->ref);
LLVMOrcDisposeThreadSafeContext(context->tsref);
#else
LLVMContextDispose(context->ref);
#endif

View file

@ -555,8 +555,8 @@ init_gallivm_state(struct gallivm_state *gallivm, const char *name,
gallivm->cache = cache;
gallivm->_ts_context = context->ref;
gallivm->context = LLVMContextCreate();
gallivm->_ts_context = context->tsref;
gallivm->context = context->ref;
gallivm->module_name = LPJit::get_unique_name(name);
gallivm->module = LLVMModuleCreateWithNameInContext(gallivm->module_name,

View file

@ -3163,7 +3163,7 @@ do_int_divide(struct lp_build_nir_soa_context *bld,
static LLVMValueRef
do_int_mod(struct lp_build_nir_soa_context *bld,
bool is_unsigned, unsigned src_bit_size,
bool is_unsigned, bool use_src2_sign, unsigned src_bit_size,
LLVMValueRef src, LLVMValueRef src2)
{
struct gallivm_state *gallivm = bld->base.gallivm;
@ -3180,8 +3180,18 @@ do_int_mod(struct lp_build_nir_soa_context *bld,
divisor = get_signed_divisor(gallivm, int_bld, mask_bld,
src_bit_size, src, divisor);
}
LLVMValueRef result = lp_build_mod(int_bld, src, divisor);
return LLVMBuildOr(builder, div_mask, result, "");
LLVMValueRef rem = lp_build_mod(int_bld, src, divisor);
rem = LLVMBuildOr(builder, div_mask, rem, "");
if (use_src2_sign) {
LLVMValueRef add_src2 = LLVMBuildICmp(builder, LLVMIntNE, rem, int_bld->zero, "");
LLVMValueRef signs_different = LLVMBuildXor(builder, LLVMBuildICmp(builder, LLVMIntSLT, src, int_bld->zero, ""),
LLVMBuildICmp(builder, LLVMIntSLT, src2, int_bld->zero, ""), "");
add_src2 = LLVMBuildAnd(builder, add_src2, signs_different, "");
rem = LLVMBuildSelect(builder, add_src2, LLVMBuildAdd(builder, rem, src2, ""), rem, "");
}
return rem;
}
static LLVMValueRef
@ -3493,7 +3503,7 @@ do_alu_action(struct lp_build_nir_soa_context *bld,
break;
case nir_op_imod:
case nir_op_irem:
result = do_int_mod(bld, false, src_bit_size[0], src[0], src[1]);
result = do_int_mod(bld, false, instr->op == nir_op_imod, src_bit_size[0], src[0], src[1]);
break;
case nir_op_ishl: {
if (src_bit_size[0] == 64)
@ -3592,7 +3602,7 @@ do_alu_action(struct lp_build_nir_soa_context *bld,
result = lp_build_min(uint_bld, src[0], src[1]);
break;
case nir_op_umod:
result = do_int_mod(bld, true, src_bit_size[0], src[0], src[1]);
result = do_int_mod(bld, true, false, src_bit_size[0], src[0], src[1]);
break;
case nir_op_umul_high: {
LLVMValueRef hi_bits;

View file

@ -634,8 +634,8 @@ asahi_add_attachment(struct attachments *att, struct agx_resource *rsrc)
assert(att->count < MAX_ATTACHMENTS);
att->list[att->count++] = (struct drm_asahi_attachment){
.size = rsrc->layout.size_B,
.pointer = rsrc->bo->va->addr,
.size = rsrc->layout.size_B - rsrc->layout.level_offsets_B[0],
.pointer = agx_map_gpu(rsrc),
};
}

View file

@ -210,13 +210,13 @@ agx_resource_from_handle(struct pipe_screen *pscreen,
if (rsc->layout.tiling == AIL_TILING_LINEAR) {
rsc->layout.linear_stride_B = whandle->stride;
} else if (whandle->stride != ail_get_wsi_stride_B(&rsc->layout, 0)) {
rsc->layout.level_offsets_B[0] = whandle->offset;
} else if (whandle->stride != ail_get_wsi_stride_B(&rsc->layout, 0) ||
whandle->offset != 0) {
FREE(rsc);
return NULL;
}
assert(whandle->offset == 0);
ail_make_miptree(&rsc->layout);
if (prsc->target == PIPE_BUFFER) {
@ -301,7 +301,8 @@ agx_resource_get_param(struct pipe_screen *pscreen, struct pipe_context *pctx,
enum pipe_resource_param param, unsigned usage,
uint64_t *value)
{
struct agx_resource *rsrc = (struct agx_resource *)prsc;
struct agx_resource *rsrc =
(struct agx_resource *)util_resource_at_index(prsc, plane);
switch (param) {
case PIPE_RESOURCE_PARAM_STRIDE:
@ -1292,7 +1293,7 @@ agx_cmdbuf(struct agx_device *dev, struct drm_asahi_cmd_render *c,
if (zres->layout.compressed) {
c->depth.comp_base =
agx_map_texture_gpu(zres, 0) + zres->layout.metadata_offset_B +
agx_map_gpu(zres) + zres->layout.metadata_offset_B +
(first_layer * zres->layout.compression_layer_stride_B) +
zres->layout.level_offsets_compressed_B[level];
@ -1329,7 +1330,7 @@ agx_cmdbuf(struct agx_device *dev, struct drm_asahi_cmd_render *c,
if (sres->layout.compressed) {
c->stencil.comp_base =
agx_map_texture_gpu(sres, 0) + sres->layout.metadata_offset_B +
agx_map_gpu(sres) + sres->layout.metadata_offset_B +
(first_layer * sres->layout.compression_layer_stride_B) +
sres->layout.level_offsets_compressed_B[level];

View file

@ -503,7 +503,7 @@ agx_get_query_result_resource_gpu(struct agx_context *ctx,
: 0;
libagx_copy_query_gl(batch, agx_1d(1), AGX_BARRIER_ALL, query->ptr.gpu,
rsrc->bo->va->addr + offset, result_type, bool_size);
agx_map_gpu(rsrc) + offset, result_type, bool_size);
return true;
}

View file

@ -726,7 +726,7 @@ agx_pack_texture(void *out, struct agx_resource *rsrc,
if (rsrc->layout.compressed) {
cfg.acceleration_buffer =
agx_map_texture_gpu(rsrc, 0) + rsrc->layout.metadata_offset_B +
agx_map_gpu(rsrc) + rsrc->layout.metadata_offset_B +
(first_layer * rsrc->layout.compression_layer_stride_B);
}
@ -1262,7 +1262,7 @@ agx_batch_upload_pbe(struct agx_batch *batch, struct agx_pbe_packed *out,
cfg.extended = true;
cfg.acceleration_buffer =
agx_map_texture_gpu(tex, 0) + tex->layout.metadata_offset_B +
agx_map_gpu(tex) + tex->layout.metadata_offset_B +
(layer * tex->layout.compression_layer_stride_B);
}
@ -3756,8 +3756,9 @@ agx_index_buffer_rsrc_ptr(struct agx_batch *batch,
struct agx_resource *rsrc = agx_resource(info->index.resource);
agx_batch_reads(batch, rsrc);
*extent = ALIGN_POT(rsrc->layout.size_B, 4);
return rsrc->bo->va->addr;
*extent =
ALIGN_POT(rsrc->layout.size_B - rsrc->layout.level_offsets_B[0], 4);
return agx_map_gpu(rsrc);
}
static uint64_t
@ -3948,7 +3949,7 @@ agx_batch_geometry_params(struct agx_batch *batch, uint64_t input_index_buffer,
params.xfb_size[i] = size;
if (rsrc) {
params.xfb_offs_ptrs[i] = rsrc->bo->va->addr;
params.xfb_offs_ptrs[i] = agx_map_gpu(rsrc);
agx_batch_writes(batch, rsrc, 0);
batch->incoherent_writes = true;
}
@ -4054,7 +4055,7 @@ agx_indirect_buffer_ptr(struct agx_batch *batch,
struct agx_resource *rsrc = agx_resource(indirect->buffer);
agx_batch_reads(batch, rsrc);
return rsrc->bo->va->addr + indirect->offset;
return agx_map_gpu(rsrc) + indirect->offset;
}
static void
@ -5388,7 +5389,7 @@ agx_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info *info)
if (info->indirect) {
struct agx_resource *rsrc = agx_resource(info->indirect);
agx_batch_reads(batch, rsrc);
indirect = rsrc->bo->va->addr + info->indirect_offset;
indirect = agx_map_gpu(rsrc) + info->indirect_offset;
}
/* Increment the pipeline stats query.
@ -5493,7 +5494,7 @@ agx_set_global_binding(struct pipe_context *pipe, unsigned first,
struct agx_resource *rsrc = agx_resource(resources[i]);
memcpy(&addr, handles[i], sizeof(addr));
addr += rsrc->bo->va->addr;
addr += agx_map_gpu(rsrc);
memcpy(handles[i], &addr, sizeof(addr));
} else {
pipe_resource_reference(res, NULL);
@ -5534,7 +5535,7 @@ agx_decompress_inplace(struct agx_batch *batch, struct pipe_surface *surf,
surf->last_layer - surf->first_layer + 1);
libagx_decompress(batch, grid, AGX_BARRIER_ALL, layout, surf->first_layer,
level, agx_map_texture_gpu(rsrc, 0), images.gpu);
level, agx_map_gpu(rsrc), images.gpu);
}
void

View file

@ -970,10 +970,16 @@ agx_map_texture_cpu(struct agx_resource *rsrc, unsigned level, unsigned z)
ail_get_layer_level_B(&rsrc->layout, z, level);
}
static inline uint64_t
agx_map_gpu(struct agx_resource *rsrc)
{
return rsrc->bo->va->addr + rsrc->layout.level_offsets_B[0];
}
static inline uint64_t
agx_map_texture_gpu(struct agx_resource *rsrc, unsigned z)
{
return rsrc->bo->va->addr +
return agx_map_gpu(rsrc) +
(uint64_t)ail_get_layer_offset_B(&rsrc->layout, z);
}

View file

@ -116,7 +116,7 @@ agx_batch_get_so_address(struct agx_batch *batch, unsigned buffer,
target->buffer_size);
*size = target->buffer_size;
return rsrc->bo->va->addr + target->buffer_offset;
return agx_map_gpu(rsrc) + target->buffer_offset;
}
void

View file

@ -3,12 +3,9 @@
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "asahi/genxml/agx_pack.h"
#include "pipe/p_state.h"
#include "util/format/u_format.h"
#include "util/half_float.h"
#include "util/macros.h"
#include "agx_abi.h"
#include "agx_device.h"
#include "agx_state.h"
#include "pool.h"
@ -19,8 +16,7 @@ agx_const_buffer_ptr(struct agx_batch *batch, struct pipe_constant_buffer *cb)
if (cb->buffer) {
struct agx_resource *rsrc = agx_resource(cb->buffer);
agx_batch_reads(batch, rsrc);
return rsrc->bo->va->addr + cb->buffer_offset;
return agx_map_gpu(rsrc) + cb->buffer_offset;
} else {
return 0;
}
@ -42,8 +38,9 @@ agx_upload_vbos(struct agx_batch *batch)
struct agx_resource *rsrc = agx_resource(vb.buffer.resource);
agx_batch_reads(batch, rsrc);
buffers[vbo] = rsrc->bo->va->addr + vb.buffer_offset;
buf_sizes[vbo] = rsrc->layout.size_B - vb.buffer_offset;
buffers[vbo] = agx_map_gpu(rsrc) + vb.buffer_offset;
buf_sizes[vbo] = rsrc->layout.size_B - vb.buffer_offset -
rsrc->layout.level_offsets_B[0];
}
}
@ -144,7 +141,7 @@ agx_set_ssbo_uniforms(struct agx_batch *batch, mesa_shader_stage stage)
agx_batch_reads(batch, rsrc);
}
unif->ssbo_base[cb] = rsrc->bo->va->addr + sb->buffer_offset;
unif->ssbo_base[cb] = agx_map_gpu(rsrc) + sb->buffer_offset;
unif->ssbo_size[cb] = st->ssbo[cb].buffer_size;
} else {
/* Invalid, so use the sink */

View file

@ -223,9 +223,9 @@ iris_apply_brw_tes_prog_data(struct iris_compiled_shader *shader,
iris_apply_brw_vue_prog_data(&brw->base, &iris->base);
iris->partitioning = brw->partitioning;
iris->output_topology = brw->output_topology;
iris->domain = brw->domain;
iris->partitioning = brw_tess_info_partitioning(brw->tess_info);
iris->output_topology = brw_tess_info_output_topology(brw->tess_info);
iris->domain = brw_tess_info_domain(brw->tess_info);
iris->include_primitive_id = brw->include_primitive_id;
}

View file

@ -338,7 +338,6 @@ attribs_update_simple(struct lp_build_interp_soa_context *bld,
LLVMBuilderRef builder = gallivm->builder;
struct lp_build_context *coeff_bld = &bld->coeff_bld;
struct lp_build_context *setup_bld = &bld->setup_bld;
LLVMValueRef oow = NULL;
LLVMValueRef pixoffx;
LLVMValueRef pixoffy;
LLVMValueRef ptr;
@ -425,25 +424,23 @@ attribs_update_simple(struct lp_build_interp_soa_context *bld,
}
if (interp == LP_INTERP_PERSPECTIVE) {
if (oow == NULL) {
LLVMValueRef w;
assert(attrib != 0);
assert(bld->mask[0] & TGSI_WRITEMASK_W);
if (bld->coverage_samples > 1 &&
(loc == TGSI_INTERPOLATE_LOC_SAMPLE ||
loc == TGSI_INTERPOLATE_LOC_CENTROID)) {
/*
* We can't use the precalculated 1/w since we didn't know
* the actual position yet (we were assuming center).
*/
LLVMValueRef indexw = lp_build_const_int32(gallivm, 3);
w = interp_attrib_linear(bld, 0, indexw, chan_pixoffx, chan_pixoffy);
}
else {
w = bld->attribs[0][3];
}
oow = lp_build_rcp(coeff_bld, w);
LLVMValueRef w;
assert(attrib != 0);
assert(bld->mask[0] & TGSI_WRITEMASK_W);
if (bld->coverage_samples > 1 &&
(loc == TGSI_INTERPOLATE_LOC_SAMPLE ||
loc == TGSI_INTERPOLATE_LOC_CENTROID)) {
/*
* We can't use the precalculated 1/w since we didn't know
* the actual position yet (we were assuming center).
*/
LLVMValueRef indexw = lp_build_const_int32(gallivm, 3);
w = interp_attrib_linear(bld, 0, indexw, chan_pixoffx, chan_pixoffy);
}
else {
w = bld->attribs[0][3];
}
LLVMValueRef oow = lp_build_rcp(coeff_bld, w);
a = lp_build_mul(coeff_bld, a, oow);
}

View file

@ -1236,44 +1236,6 @@ spec@nv_texture_env_combine4@nv_texture_env_combine4-combine,Fail
spec@oes_texture_float@oes_texture_float half,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@glsl-1.10@execution@glsl-fs-texture2d-mipmap-const-bias-01,Fail

View file

@ -1280,44 +1280,6 @@ spec@nv_texture_env_combine4@nv_texture_env_combine4-combine,Fail
spec@oes_texture_float@oes_texture_float half,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@glsl-1.10@execution@glsl-fs-texture2d-mipmap-const-bias-01,Fail

View file

@ -778,78 +778,6 @@ dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_repeat,Fail
dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_mirror,Fail
dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_clamp,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_3DC_ATI,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_3DC_ATI NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_LATC1_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_LATC1_EXT NPOT,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_red_green_rgtc2_ext,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_signed_red_green_rgtc2_ext,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_signed_red_rgtc1_ext,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RED_RGTC1,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RED_RGTC1 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RG_RGTC2,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RG_RGTC2 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED_RGTC1,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED_RGTC1 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG_RGTC2,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG_RGTC2 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@ext_framebuffer_multisample@accuracy all_samples color depthstencil linear,Fail

View file

@ -947,6 +947,32 @@ r300_set_framebuffer_state(struct pipe_context* pipe,
util_framebuffer_init(pipe, state, r300->fb_cbufs, &r300->fb_zsbuf);
util_copy_framebuffer_state(r300->fb_state.state, state);
/* DXTC blits require that blocks are 2x1 or 4x1 pixels, but
* pipe_surface_width sets the framebuffer width as if blocks were 1x1
* pixels. Override the width to correct that.
*/
if (state->nr_cbufs == 1 && state->cbufs[0].texture &&
state->cbufs[0].format == PIPE_FORMAT_R8G8B8A8_UNORM &&
util_format_is_compressed(state->cbufs[0].texture->format)) {
struct pipe_framebuffer_state *fb =
(struct pipe_framebuffer_state*)r300->fb_state.state;
const struct util_format_description *desc =
util_format_description(state->cbufs[0].texture->format);
unsigned width = u_minify(state->cbufs[0].texture->width0,
state->cbufs[0].level);
assert(desc->block.width == 4 && desc->block.height == 4);
/* Each 64-bit DXT block is 2x1 pixels, and each 128-bit DXT
* block is 4x1 pixels when blitting.
*/
width = align(width, 4); /* align to the DXT block width. */
if (desc->block.bits == 64)
width = DIV_ROUND_UP(width, 2);
fb->width = width;
}
/* Remove trailing NULL colorbuffers. */
while (current_state->nr_cbufs && !current_state->cbufs[current_state->nr_cbufs-1].texture)
current_state->nr_cbufs--;

View file

@ -201,6 +201,7 @@ void r600_draw_rectangle(struct blitter_context *blitter,
rctx->b.set_vertex_buffers(&rctx->b, 1, &vbuffer);
util_draw_arrays_instanced(&rctx->b, R600_PRIM_RECTANGLE_LIST, 0, 3,
0, num_instances);
pipe_resource_reference(&buf, NULL);
}
static void r600_dma_emit_wait_idle(struct r600_common_context *rctx)

View file

@ -14,6 +14,7 @@
#include "util/u_memory.h"
#include "util/u_pack_color.h"
#include "util/u_surface.h"
#include "util/u_resource.h"
#include "util/os_time.h"
#include "frontend/winsys_handle.h"
#include <errno.h>
@ -442,7 +443,7 @@ static bool r600_texture_get_param(struct pipe_screen *screen,
switch (param) {
case PIPE_RESOURCE_PARAM_NPLANES:
*value = 1;
*value = util_resource_num(resource);
return true;
case PIPE_RESOURCE_PARAM_STRIDE:

View file

@ -20,6 +20,16 @@ AluGroup::AluGroup()
m_free_slots = has_t() ? 0x1f : 0xf;
}
void
AluGroup::apply_add_instr(AluInstr *instr)
{
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
assert(!(m_has_kill_op && m_has_pred_update));
}
bool
AluGroup::add_instruction(AluInstr *instr)
{
@ -32,17 +42,13 @@ AluGroup::add_instruction(AluInstr *instr)
ASSERTED auto opinfo = alu_ops.find(instr->opcode());
assert(opinfo->second.can_channel(AluOp::t, s_chip_class));
if (add_trans_instructions(instr)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
}
if (add_vec_instructions(instr) && !instr->has_alu_flag(alu_is_trans)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
@ -51,9 +57,7 @@ AluGroup::add_instruction(AluInstr *instr)
if (s_max_slots > 4 && opinfo->second.can_channel(AluOp::t, s_chip_class) &&
add_trans_instructions(instr)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
@ -128,6 +132,8 @@ AluGroup::add_trans_instructions(AluInstr *instr)
* make sure the corresponding vector channel is used */
assert(instr->has_alu_flag(alu_is_trans) || m_slots[instr->dest_chan()]);
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
m_slot_assignemnt_order[m_next_slot_assignemnt++] = 4;
return true;
}
@ -170,17 +176,12 @@ AluGroup::add_vec_instructions(AluInstr *instr)
if (!m_slots[preferred_chan]) {
if (instr->bank_swizzle() != alu_vec_unknown) {
if (try_readport(instr, instr->bank_swizzle())) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
return true;
}
} else {
for (AluBankSwizzle i = alu_vec_012; i != alu_vec_unknown; ++i) {
if (try_readport(instr, i)) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
if (try_readport(instr, i))
return true;
}
}
}
} else {
@ -209,18 +210,12 @@ AluGroup::add_vec_instructions(AluInstr *instr)
sfn_log << SfnLog::schedule << "V: Try force channel " << free_chan << "\n";
dest->set_chan(free_chan);
if (instr->bank_swizzle() != alu_vec_unknown) {
if (try_readport(instr, instr->bank_swizzle())) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = free_chan;
if (try_readport(instr, instr->bank_swizzle()))
return true;
}
} else {
for (AluBankSwizzle i = alu_vec_012; i != alu_vec_unknown; ++i) {
if (try_readport(instr, i)) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = free_chan;
if (try_readport(instr, i))
return true;
}
}
}
}
@ -318,6 +313,9 @@ AluGroup::try_readport(AluInstr *instr, AluBankSwizzle cycle)
else if (dest->pin() == pin_group)
dest->set_pin(pin_chgr);
}
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
return true;
}
return false;

View file

@ -21,6 +21,7 @@ public:
using iterator = Slots::iterator;
using const_iterator = Slots::const_iterator;
void extracted(AluInstr *& instr);
bool add_instruction(AluInstr *instr);
bool add_trans_instructions(AluInstr *instr);
bool add_vec_instructions(AluInstr *instr);
@ -82,6 +83,7 @@ public:
bool addr_for_src() const { return m_addr_for_src; }
bool has_kill_op() const { return m_has_kill_op; }
bool has_update_exec() const { return m_has_pred_update; }
void set_origin(AluInstr *o) { m_origin = o;}
@ -100,6 +102,8 @@ private:
bool update_indirect_access(AluInstr *instr);
bool try_readport(AluInstr *instr, AluBankSwizzle cycle);
void apply_add_instr(AluInstr *instr);
Slots m_slots;
uint8_t m_next_slot_assignemnt{0};
std::array<int8_t, 5> m_slot_assignemnt_order{-1, -1, -1, -1, -1};
@ -119,6 +123,7 @@ private:
bool m_addr_is_index{false};
bool m_addr_for_src{false};
bool m_has_kill_op{false};
bool m_has_pred_update{false};
AluInstr *m_origin{nullptr};
uint8_t m_free_slots;

View file

@ -869,8 +869,8 @@ BlockScheduler::schedule_alu_to_group_vec(AluGroup *group)
bool success = false;
auto i = alu_vec_ready.begin();
auto e = alu_vec_ready.end();
bool group_has_kill = false;
bool group_has_update_pred = false;
bool group_has_kill = group->has_kill_op();
bool group_has_update_pred = group->has_update_exec();
while (i != e) {
sfn_log << SfnLog::schedule << "Try schedule to vec " << **i;
@ -945,6 +945,7 @@ BlockScheduler::schedule_alu_to_group_vec(AluGroup *group)
success = true;
group_has_kill |= is_kill;
group_has_update_pred |= does_update_pred;
sfn_log << SfnLog::schedule << " success\n";
} else {
@ -965,8 +966,20 @@ BlockScheduler::schedule_alu_multislot_to_group_vec(AluGroup *group, ValueFactor
auto i = alu_multi_slot_ready.begin();
auto e = alu_multi_slot_ready.end();
bool group_has_kill = group->has_kill_op();
while (i != e && util_bitcount(group->free_slot_mask()) > 1) {
/* A kill instruction and a predicate update in the same
* group don't mix well, so skip adding a predicate changing
* multi-slot op if we already have a kill. (There are no
* multi-slot kill ops).
*/
if (group_has_kill && (*i)->has_alu_flag(alu_update_exec)) {
++i;
continue;
}
auto dest = (*i)->dest();
bool can_merge = false;
@ -1038,6 +1051,10 @@ BlockScheduler::schedule_alu_to_group_trans(AluGroup *group,
bool success = false;
auto i = readylist.begin();
auto e = readylist.end();
bool group_has_kill = group->has_kill_op();
bool group_has_update_pred = group->has_update_exec();
while (i != e) {
if (check_array_reads(**i)) {
@ -1052,6 +1069,12 @@ BlockScheduler::schedule_alu_to_group_trans(AluGroup *group,
continue;
}
if ((group_has_kill && (*i)->has_alu_flag(alu_update_exec)) ||
(group_has_update_pred && (*i)->is_kill())) {
++i;
continue;
}
if (group->add_trans_instructions(*i)) {
(*i)->pin_dest_to_chan();
auto old_i = i;

View file

@ -88,8 +88,10 @@ class CollectDeps : public ConstRegisterVisitor {
public:
void visit(const Register& r) override
{
for (auto p : r.parents())
add_dep(p);
for (auto p : r.parents()) {
if (instr->block_id() == p->block_id() && instr->index() < p->index())
add_dep(p);
}
}
void visit(const LocalArray& value) override {(void)value; UNREACHABLE("Array is not a value");}
void visit(const LocalArrayValue& r) override

View file

@ -22,7 +22,7 @@ r600_test_dep = declare_dependency(
if with_tests
foreach t : ['valuefactory', 'value', 'instr', 'instrfromstring', 'liverange',
'optimizer', 'shaderfromstring', 'split_address_loads' ]
'optimizer', 'regression', 'shaderfromstring', 'split_address_loads' ]
test(
t,
executable('test-@0@-r600-sfn'.format(t),

View file

@ -0,0 +1,65 @@
#include "sfn_test_shaders.h"
#include "../sfn_optimizer.h"
#include "../sfn_ra.h"
#include "../sfn_scheduler.h"
#include "../sfn_shader.h"
#include "../sfn_split_address_loads.h"
using namespace r600;
using std::ostringstream;
TEST_F(TestShaderFromNir, CombineRegisterToTexSrc)
{
const char *shader_input =
R"(VS
CHIPCLASS EVERGREEN
INPUT LOC:0
INPUT LOC:1
OUTPUT LOC:0 VARYING_SLOT:0 MASK:15
OUTPUT LOC:1 VARYING_SLOT:32 MASK:3
REGISTERS R1.xyzw R2.xyzw R6.y R7.x R8.y R9.x R10.y
ARRAYS A3[2].zw
SHADER
BLOCK_START
ALU MOV R10.y@free : R2.x@fully{sb} {W}
ALU MOV R9.x@free : R2.y@fully{sb} {W}
ALU MOV R8.y@free : R2.z@fully{sb} {W}
ALU MOV R7.x@free : R2.w@fully{sb} {W}
ALU MOV R6.y@free : I[0] {W}
LOOP_BEGIN
BLOCK_END
BLOCK_START
IF (( ALU PRED_SETGE_INT __.x : R6.y@free KC0[0].x {LEP} PUSH_BEFORE ))
BLOCK_END
BLOCK_START
BREAK
BLOCK_END
BLOCK_START
ENDIF
BLOCK_END
BLOCK_START
ALU INT_TO_FLT CLAMP S22.y@free{s} : R6.y@free {W}
ALU TRUNC S24.w@free{s} : S22.y@free{s} {W}
ALU FLT_TO_INT S25.x@free{s} : S24.w@free{s} {W}
ALU MOV A3[S25.x@free].z : R10.y@free {W}
ALU MOV A3[S25.x@free].w : R9.x@free {W}
ALU MUL_IEEE R5.x@free : R9.x@free I[0.5] {W}
ALU MUL_IEEE R9.x@free : R8.y@free I[0.5] {W}
ALU MUL_IEEE R8.y@free : R7.x@free I[0.5] {W}
ALU MUL_IEEE R7.x@free : R10.y@free I[0.5] {W}
ALU ADD_INT R6.y@free : R6.y@free I[1] {W}
ALU MOV R10.y@free : R5.x@free {W}
LOOP_END
BLOCK_END
BLOCK_START
ALU ADD S47.z@group{s} : A3[0].z A3[1].z {W}
ALU ADD S47.x@group{s} : A3[0].w A3[1].w {W}
EXPORT_DONE PARAM 0 S47.zx__
EXPORT_DONE POS 0 R1.xyzw
BLOCK_END)";
auto sh = from_string(shader_input);
split_address_loads(*sh);
schedule(sh);
}

View file

@ -56,6 +56,19 @@ static bool si_update_shaders(struct si_context *sctx)
struct si_shader *old_ps = sctx->shader.ps.current;
int r;
if (GFX_VERSION >= GFX9) {
/* For merged shaders, mark the next shader as dirty so its previous_stage is updated. */
if (is_vs_state_changed) {
if (HAS_TESS) {
is_tess_state_changed = true;
} else if (HAS_GS) {
is_gs_state_changed = true;
}
}
if ((sctx->dirty_shaders_mask & BITFIELD_BIT(MESA_SHADER_TESS_EVAL)) && HAS_GS && HAS_TESS)
is_gs_state_changed = true;
}
/* Update TCS and TES. */
if (HAS_TESS && is_tess_state_changed) {
if (!sctx->has_tessellation) {

View file

@ -690,6 +690,7 @@ v3d_get_sand8_fs(struct pipe_context *pctx, int cpp)
nir_variable_create(b.shader, nir_var_shader_out,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_COLOR;
b.shader->info.outputs_written |= BITFIELD_BIT(FRAG_RESULT_COLOR);
nir_variable *pos_in =
nir_variable_create(b.shader, nir_var_shader_in, vec4, "pos");
@ -998,6 +999,7 @@ v3d_get_sand30_fs(struct pipe_context *pctx)
nir_var_shader_out,
glsl_uvec4, "f_color");
color_out->data.location = FRAG_RESULT_COLOR;
b.shader->info.outputs_written |= BITFIELD_BIT(FRAG_RESULT_COLOR);
nir_variable *pos_in =
nir_variable_create(b.shader, nir_var_shader_in, vec4, "pos");

View file

@ -1,6 +1,6 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiIndexedEXT-format-07753,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-vkDestroyDevice-device-05137,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiIndexedEXT-format-07753,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-vkDestroyDevice-device-05137,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-08879
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,6 +1,6 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-vkCmdDrawMultiIndexedEXT-None-10909
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,6 +1,6 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiIndexedEXT-format-07753,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-vkDestroyDevice-device-05137,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiIndexedEXT-format-07753,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-vkDestroyDevice-device-05137
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,6 +1,6 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiEXT-None-02699,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdPipelineBarrier2-shaderTileImageColorReadAccess-08718,VUID-VkGraphicsPipelineCreateInfo-flags-06482,VUID-vkCmdPipelineBarrier2-None-08719,VUID-vkCmdDrawMultiEXT-rasterizationSamples-07474,VUID-vkDestroyDevice-device-05137,VUID-VkRectLayerKHR-offset-04864,VUID-vkAcquireNextImageKHR-semaphore-01779,VUID-vkQueueSubmit-pSignalSemaphores-00067,VUID-VkImageMemoryBarrier2-srcAccessMask-07454,UNASSIGNED-GeneralParameterError-RequiredHandle,VUID-VkImageMemoryBarrier2-image-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-vkCmdDrawMultiEXT-None-02699,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-vkCmdPipelineBarrier2-shaderTileImageColorReadAccess-08718,VUID-VkGraphicsPipelineCreateInfo-flags-06482,VUID-vkCmdPipelineBarrier2-None-08719,VUID-vkCmdDrawMultiEXT-rasterizationSamples-07474,VUID-vkDestroyDevice-device-05137,VUID-VkRectLayerKHR-offset-04864,VUID-vkAcquireNextImageKHR-semaphore-01779,VUID-vkQueueSubmit-pSignalSemaphores-00067,VUID-VkImageMemoryBarrier2-srcAccessMask-07454,UNASSIGNED-GeneralParameterError-RequiredHandle,VUID-VkImageMemoryBarrier2-image-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,7 +1,7 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
#khronos_validation.message_id_filter = VUID-RuntimeSpirv-Location-06272,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiEXT-None-08879,VUID-vkCmdDrawMultiIndexedEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,7 +1,7 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
#khronos_validation.message_id_filter = VUID-RuntimeSpirv-Location-06272,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiEXT-None-08879,VUID-vkCmdDrawMultiIndexedEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-VkRenderingAttachmentInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-VkRenderingAttachmentInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -1,7 +1,7 @@
# Please include a comment with the log message and a testcase triggering each
# VUID at the bottom of the file.
#khronos_validation.message_id_filter = VUID-RuntimeSpirv-Location-06272,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiEXT-None-08879,VUID-vkCmdDrawMultiIndexedEXT-dynamicPrimitiveTopologyUnrestricted-07500,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart,VUID-VkPhysicalDeviceMeshShaderFeaturesEXT-primitiveFragmentShadingRateMeshShader-07033
khronos_validation.message_id_filter = VUID-VkPhysicalDeviceProperties2-pNext-pNext,VUID-VkDeviceCreateInfo-pNext-pNext,VUID-RuntimeSpirv-Location-06272,VUID-RuntimeSpirv-OpEntryPoint-08743,VUID-StandaloneSpirv-OpEntryPoint-08721,VUID-vkCmdDrawMultiEXT-dynamicRenderingUnusedAttachments-08911,VUID-vkCmdDrawMultiIndexedEXT-None-08879,VUID-vkDestroyDevice-device-05137,VUID-vkQueueSubmit-pCommandBuffers-00065,VUID-VkShaderCreateInfoEXT-pCode-08737,VUID-vkCmdDrawMultiEXT-None-08879,VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter,VUID-vkCmdDrawMultiIndexedEXT-None-10909,UNASSIGNED-Draw-primitiveTopologyPatchListRestart
khronos_validation.report_flags = error
khronos_validation.debug_action = VK_DBG_LAYER_ACTION_LOG_MSG,VK_DBG_LAYER_ACTION_BREAK
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT

View file

@ -3368,9 +3368,8 @@ begin_rendering(struct zink_context *ctx, bool check_msaa_expand)
VK_TRUE,
ctx->gfx_pipeline_state.rast_samples + 1,
};
ctx->dynamic_fb.info.pNext = ctx->transient_attachments && !ctx->blitting && has_msrtss ? &msrtss : NULL;
if (has_msrtss && !ctx->blitting)
ctx->dynamic_fb.info.pNext = ctx->transient_attachments ? &msrtss : NULL;
VKCTX(CmdBeginRendering)(ctx->bs->cmdbuf, &ctx->dynamic_fb.info);
ctx->in_rp = true;
return clear_buffers;

View file

@ -1725,6 +1725,7 @@ zink_descriptors_deinit(struct zink_context *ctx)
VKSCR(DestroyDescriptorSetLayout)(screen->dev, ctx->dd.push_dsl[0]->layout, NULL);
if (ctx->dd.push_dsl[1])
VKSCR(DestroyDescriptorSetLayout)(screen->dev, ctx->dd.push_dsl[1]->layout, NULL);
VKSCR(DestroyDescriptorSetLayout)(screen->dev, ctx->dd.old_push_dsl, NULL);
}
/* called on screen creation */
@ -1766,7 +1767,8 @@ zink_descriptor_util_init_fbfetch(struct zink_context *ctx)
return;
struct zink_screen *screen = zink_screen(ctx->base.screen);
VKSCR(DestroyDescriptorSetLayout)(screen->dev, ctx->dd.push_dsl[0]->layout, NULL);
/* save this layout; it may be used by programs, and tracking that is extra complexity */
ctx->dd.old_push_dsl = ctx->dd.push_dsl[0]->layout;
//don't free these now, let ralloc free on teardown to avoid invalid access
//ralloc_free(ctx->dd.push_dsl[0]);
//ralloc_free(ctx->dd.push_layout_keys[0]);

View file

@ -270,8 +270,7 @@ update_gfx_pipeline(struct zink_context *ctx, struct zink_batch_state *bs, enum
pipeline = zink_get_gfx_pipeline<DYNAMIC_STATE, true, false>(ctx, ctx->curr_program, &ctx->gfx_pipeline_state, mode);
else
pipeline = zink_get_gfx_pipeline<DYNAMIC_STATE, false, false>(ctx, ctx->curr_program, &ctx->gfx_pipeline_state, mode);
}
if (pipeline) {
assert(pipeline);
pipeline_changed = prev_pipeline != pipeline || ctx->shobj_draw;
if (BATCH_CHANGED || pipeline_changed)
VKCTX(CmdBindPipeline)(bs->cmdbuf, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
@ -986,8 +985,7 @@ update_mesh_pipeline(struct zink_context *ctx, struct zink_batch_state *bs)
pipeline = zink_get_gfx_pipeline<ZINK_DYNAMIC_STATE3, true, true>(ctx, ctx->mesh_program, &ctx->gfx_pipeline_state, MESA_PRIM_COUNT);
else
pipeline = zink_get_gfx_pipeline<ZINK_DYNAMIC_STATE3, false, true>(ctx, ctx->mesh_program, &ctx->gfx_pipeline_state, MESA_PRIM_COUNT);
}
if (pipeline) {
assert(pipeline);
pipeline_changed = prev_pipeline != pipeline || ctx->shobj_draw;
if (BATCH_CHANGED || pipeline_changed)
VKCTX(CmdBindPipeline)(bs->cmdbuf, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);

View file

@ -119,7 +119,9 @@ pipeline_statistic_convert(enum pipe_statistics_query_index idx)
[PIPE_STAT_QUERY_PS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_FRAGMENT_SHADER_INVOCATIONS_BIT,
[PIPE_STAT_QUERY_HS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_CONTROL_SHADER_PATCHES_BIT,
[PIPE_STAT_QUERY_DS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_EVALUATION_SHADER_INVOCATIONS_BIT,
[PIPE_STAT_QUERY_CS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_COMPUTE_SHADER_INVOCATIONS_BIT
[PIPE_STAT_QUERY_CS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_COMPUTE_SHADER_INVOCATIONS_BIT,
[PIPE_STAT_QUERY_MS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_MESH_SHADER_INVOCATIONS_BIT_EXT,
[PIPE_STAT_QUERY_TS_INVOCATIONS] = VK_QUERY_PIPELINE_STATISTIC_TASK_SHADER_INVOCATIONS_BIT_EXT,
};
assert(idx < ARRAY_SIZE(map));
return map[idx];

View file

@ -3133,6 +3133,12 @@ init_driver_workarounds(struct zink_screen *screen)
screen->info.have_EXT_host_image_copy = false;
}
static void
disable_features(struct zink_screen *screen)
{
screen->info.mesh_feats.primitiveFragmentShadingRateMeshShader = false;
}
static void
check_hic_shader_read(struct zink_screen *screen)
{
@ -3513,6 +3519,7 @@ zink_internal_create_screen(const struct pipe_screen_config *config, int64_t dev
check_hic_shader_read(screen);
init_driver_workarounds(screen);
disable_features(screen);
screen->dev = zink_create_logical_device(screen);
if (!screen->dev)

View file

@ -437,6 +437,7 @@ struct zink_descriptor_data {
uint8_t state_changed[ZINK_PIPELINE_MAX]; //gfx, compute, mesh
struct zink_descriptor_layout_key *push_layout_keys[2]; //gfx, compute
struct zink_descriptor_layout *push_dsl[2]; //gfx, compute
VkDescriptorSetLayout old_push_dsl; //the non-fbfetch layout; this can't be destroyed because it may be in use
VkDescriptorUpdateTemplate push_template[2]; //gfx, compute
struct zink_descriptor_layout *dummy_dsl;

View file

@ -574,7 +574,7 @@ lvp_encode_as(struct vk_acceleration_structure *dst, VkDeviceAddress intermediat
/* The BVH exceeds the maximum depth supported by the traversal stack,
* flatten the offending parts of the tree.
*/
if (max_node_depth >= 24)
if (max_node_depth >= (geometry_type == VK_GEOMETRY_TYPE_INSTANCES_KHR ? LVP_MAX_TLAS_DEPTH : LVP_MAX_BLAS_DEPTH))
lvp_flatten_as(header, ir_box_nodes, root_offset, node_depth, output);
free(node_depth);

View file

@ -547,6 +547,7 @@ VKAPI_ATTR void VKAPI_CALL lvp_UpdateDescriptorSets(
desc[didx + p].functions = iview->planes[p].image_handle->functions;
}
} else {
memset(&desc[didx], 0, sizeof(desc[didx]) * bind_layout->stride);
for (unsigned k = 0; k < bind_layout->stride; k++)
desc[didx + k].functions = device->null_image_handle->functions;
}
@ -577,6 +578,7 @@ VKAPI_ATTR void VKAPI_CALL lvp_UpdateDescriptorSets(
lp_jit_image_from_pipe(&desc[j].image, &bview->iv);
desc[j].functions = bview->image_handle->functions;
} else {
memset(&desc[j].image, 0, sizeof(desc[j].image));
desc[j].functions = device->null_image_handle->functions;
}
}
@ -846,6 +848,7 @@ lvp_descriptor_set_update_with_template(VkDevice _device, VkDescriptorSet descri
desc[idx + p].functions = iview->planes[p].image_handle->functions;
}
} else {
memset(&desc[idx], 0, sizeof(desc[idx]) * bind_layout->stride);
for (unsigned k = 0; k < bind_layout->stride; k++)
desc[idx + k].functions = device->null_image_handle->functions;
}
@ -872,6 +875,7 @@ lvp_descriptor_set_update_with_template(VkDevice _device, VkDescriptorSet descri
lp_jit_image_from_pipe(&desc[idx].image, &bview->iv);
desc[idx].functions = bview->image_handle->functions;
} else {
memset(&desc[idx].image, 0, sizeof(desc[idx].image));
desc[idx].functions = device->null_image_handle->functions;
}
break;
@ -1073,8 +1077,9 @@ VKAPI_ATTR void VKAPI_CALL lvp_GetDescriptorEXT(
desc[p].functions = iview->planes[p].image_handle->functions;
}
} else {
unsigned plane_count = size / sizeof(struct lp_descriptor);
memset(desc, 0, size);
unsigned plane_count = size / sizeof(struct lp_descriptor);
for (unsigned p = 0; p < plane_count; p++)
desc[p].functions = device->null_image_handle->functions;
}
@ -1087,6 +1092,7 @@ VKAPI_ATTR void VKAPI_CALL lvp_GetDescriptorEXT(
lp_jit_bindless_texture_buffer_from_bda(&desc->texture, (void*)(uintptr_t)bda->address);
desc->functions = get_texture_handle_bda(device, bda->address, bda->range, pformat).functions;
} else {
memset(desc, 0, size);
desc->functions = device->null_texture_handle->functions;
desc->texture.sampler_index = 0;
}
@ -1099,6 +1105,7 @@ VKAPI_ATTR void VKAPI_CALL lvp_GetDescriptorEXT(
lp_jit_image_buffer_from_bda(&desc->image, (void *)(uintptr_t)bda->address, bda->range, pformat);
desc->functions = get_image_handle_bda(device, bda->address, bda->range, pformat).functions;
} else {
memset(desc, 0, size);
desc->functions = device->null_image_handle->functions;
}
break;

View file

@ -1271,8 +1271,8 @@ lvp_get_properties(const struct lvp_physical_device *device, struct vk_propertie
/* VK_KHR_acceleration_structure */
.maxGeometryCount = (1 << 24) - 1,
.maxInstanceCount = (1 << 24) - 1,
.maxPrimitiveCount = (1 << 24) - 1,
.maxInstanceCount = (1 << LVP_MAX_TLAS_DEPTH) - 1,
.maxPrimitiveCount = (1 << LVP_MAX_BLAS_DEPTH) - 1,
.maxPerStageDescriptorAccelerationStructures = MAX_DESCRIPTORS,
.maxPerStageDescriptorUpdateAfterBindAccelerationStructures = MAX_DESCRIPTORS,
.maxDescriptorSetAccelerationStructures = MAX_DESCRIPTORS,

View file

@ -100,13 +100,16 @@ extern "C" {
#define MAX_DESCRIPTORS 1000000 /* Required by vkd3d-proton */
#define MAX_PUSH_CONSTANTS_SIZE 256
#define MAX_PUSH_DESCRIPTORS 32
#define MAX_DESCRIPTOR_UNIFORM_BLOCK_SIZE 4096
#define MAX_DESCRIPTOR_UNIFORM_BLOCK_SIZE MAX_DESCRIPTORS
#define MAX_PER_STAGE_DESCRIPTOR_UNIFORM_BLOCKS 8
#define MAX_DGC_STREAMS 16
#define MAX_DGC_TOKENS 16
/* Currently lavapipe does not support more than 1 image plane */
#define LVP_MAX_PLANE_COUNT 1
#define LVP_MAX_TLAS_DEPTH 24
#define LVP_MAX_BLAS_DEPTH 29
#ifdef _WIN32
#define lvp_printflike(a, b)
#else

View file

@ -356,7 +356,7 @@ lvp_ray_traversal_state_init(nir_function_impl *impl, struct lvp_ray_traversal_s
state->current_node = nir_local_variable_create(impl, glsl_uint_type(), "traversal.current_node");
state->stack_base = nir_local_variable_create(impl, glsl_uint_type(), "traversal.stack_base");
state->stack_ptr = nir_local_variable_create(impl, glsl_uint_type(), "traversal.stack_ptr");
state->stack = nir_local_variable_create(impl, glsl_array_type(glsl_uint_type(), 24 * 2, 0), "traversal.stack");
state->stack = nir_local_variable_create(impl, glsl_array_type(glsl_uint_type(), LVP_MAX_TLAS_DEPTH + LVP_MAX_BLAS_DEPTH, 0), "traversal.stack");
state->hit = nir_local_variable_create(impl, glsl_bool_type(), "traversal.hit");
state->instance_addr = nir_local_variable_create(impl, glsl_uint64_t_type(), "traversal.instance_addr");

View file

@ -171,7 +171,8 @@ init_ray_query_traversal_vars(void *ctx, nir_shader *shader, unsigned array_leng
result.stack_base =
rq_variable_create(ctx, shader, array_length, glsl_uint_type(), VAR_NAME("_stack_base"));
result.stack_ptr = rq_variable_create(ctx, shader, array_length, glsl_uint_type(), VAR_NAME("_stack_ptr"));
result.stack = rq_variable_create(ctx, shader, array_length, glsl_array_type(glsl_uint_type(), 24 * 2, 0), VAR_NAME("_stack"));
result.stack = rq_variable_create(ctx, shader, array_length,
glsl_array_type(glsl_uint_type(), LVP_MAX_TLAS_DEPTH + LVP_MAX_BLAS_DEPTH, 0), VAR_NAME("_stack"));
return result;
}

View file

@ -1,6 +1,14 @@
# Copyright © 2017 Dylan Baker
# SPDX-License-Identifier: MIT
libradeonwinsys_deps = [idep_mesautil, dep_libdrm]
libradeonwinsys_c_args = []
if with_gallium_radeonsi
libradeonwinsys_deps += [idep_amdgfxregs_h]
libradeonwinsys_c_args = ['-DHAVE_GALLIUM_RADEONSI']
endif
libradeonwinsys = static_library(
'radeonwinsys',
files('radeon_drm_bo.c',
@ -14,5 +22,6 @@ libradeonwinsys = static_library(
'radeon_surface.h'),
include_directories : [inc_src, inc_include, inc_gallium, inc_gallium_aux],
gnu_symbol_visibility : 'hidden',
dependencies : [idep_mesautil, dep_libdrm],
c_args : libradeonwinsys_c_args,
dependencies : libradeonwinsys_deps,
)

View file

@ -8,6 +8,10 @@
#include "radeon_drm_bo.h"
#include "radeon_drm_cs.h"
#ifdef HAVE_GALLIUM_RADEONSI
#include "amdgfxregs.h"
#endif
#include "util/os_file.h"
#include "util/simple_mtx.h"
#include "util/thread_sched.h"
@ -105,6 +109,73 @@ static bool radeon_get_drm_value(int fd, unsigned request,
return true;
}
static void get_hs_info(struct radeon_info *info)
{
/* This is the size of all TCS outputs in memory per workgroup.
* Hawaii can't handle num_workgroups > 256 with 8K per workgroup, so use 4K.
*/
unsigned max_hs_out_vram_dwords_per_wg = info->family == CHIP_HAWAII ? 4096 : 8192;
unsigned max_workgroups_per_se;
#ifdef HAVE_GALLIUM_RADEONSI /* for gfx6+ register definitions */
unsigned max_hs_out_vram_dwords_enum = 0;
switch (max_hs_out_vram_dwords_per_wg) {
case 8192:
max_hs_out_vram_dwords_enum = V_03093C_X_8K_DWORDS;
break;
case 4096:
max_hs_out_vram_dwords_enum = V_03093C_X_4K_DWORDS;
break;
case 2048:
max_hs_out_vram_dwords_enum = V_03093C_X_2K_DWORDS;
break;
case 1024:
max_hs_out_vram_dwords_enum = V_03093C_X_1K_DWORDS;
break;
default:
UNREACHABLE("invalid TCS workgroup size");
}
#endif
/* Gfx7 should limit num_workgroups to 508 (127 per SE)
* Gfx6 should limit num_workgroups to 126 (63 per SE)
*/
if (info->gfx_level == GFX7) {
max_workgroups_per_se = 127;
} else {
max_workgroups_per_se = 63;
}
/* Limit to 4 workgroups per CU for TCS, which exhausts LDS if each workgroup occupies 16KB.
* Note that the offchip allocation isn't deallocated until the corresponding TES waves finish.
*/
unsigned num_offchip_wg_per_cu = 4;
unsigned num_workgroups_per_se = MIN2(num_offchip_wg_per_cu * info->max_good_cu_per_sa *
info->max_sa_per_se, max_workgroups_per_se);
unsigned num_workgroups = num_workgroups_per_se * info->max_se;
#ifdef HAVE_GALLIUM_RADEONSI /* for gfx6+ register definitions */
if (info->gfx_level == GFX7) {
info->hs_offchip_param = S_03093C_OFFCHIP_BUFFERING_GFX7(num_workgroups) |
S_03093C_OFFCHIP_GRANULARITY_GFX7(max_hs_out_vram_dwords_enum);
} else {
info->hs_offchip_param = S_0089B0_OFFCHIP_BUFFERING(num_workgroups) |
S_0089B0_OFFCHIP_GRANULARITY(max_hs_out_vram_dwords_enum);
}
#endif
/* The typical size of tess factors of 1 TCS workgroup if all patches are triangles. */
unsigned typical_tess_factor_size_per_wg = (192 / 3) * 16;
unsigned num_tess_factor_wg_per_cu = 3;
info->hs_offchip_workgroup_dw_size = max_hs_out_vram_dwords_per_wg;
info->tess_offchip_ring_size = num_workgroups * max_hs_out_vram_dwords_per_wg * 4;
info->tess_factor_ring_size = typical_tess_factor_size_per_wg * num_tess_factor_wg_per_cu *
info->max_good_cu_per_sa * info->max_sa_per_se * info->max_se;
info->total_tess_ring_size = info->tess_offchip_ring_size + info->tess_factor_ring_size;
}
/* Helper function to do the ioctls needed for setup and init. */
static bool do_winsys_init(struct radeon_drm_winsys *ws)
{
@ -639,6 +710,9 @@ static bool do_winsys_init(struct radeon_drm_winsys *ws)
default:;
}
if (ws->gen == DRV_SI)
get_hs_info(&ws->info);
ws->check_vm = strstr(debug_get_option("R600_DEBUG", ""), "check_vm") != NULL ||
strstr(debug_get_option("AMD_DEBUG", ""), "check_vm") != NULL;
ws->noop_cs = debug_get_bool_option("RADEON_NOOP", false);

View file

@ -196,6 +196,9 @@ brw_compile_tcs(const struct brw_compiler *compiler,
brw_prog_data_init(&prog_data->base.base, &params->base);
brw_fill_tess_info_from_shader_info(&prog_data->tess_info,
&nir->info);
nir->info.outputs_written = key->outputs_written;
nir->info.patch_outputs_written = key->patch_outputs_written;
@ -221,6 +224,7 @@ brw_compile_tcs(const struct brw_compiler *compiler,
BITSET_TEST(nir->info.system_values_read, SYSTEM_VALUE_PRIMITIVE_ID);
prog_data->input_vertices = key->input_vertices;
prog_data->output_vertices = nir->info.tess.tcs_vertices_out;
prog_data->patch_count_threshold = get_patch_count_threshold(key->input_vertices);
if (compiler->use_tcs_multi_patch) {

Some files were not shown because too many files have changed in this diff Show more