intel_decoder_init() initializes intel_batch_decode_ctx so later
we can call decode functions but it depends on data stored in
brw/elk_isa_info but that was being allocated in stack
of intel_decoder_init() then when the decode functions were executed
it was accessing garbage at the brw/elk_isa_info memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ec2d20a70d ("intel/tools: Add helpers for decoder_init/disasm")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34776>
(cherry picked from commit 3e5a735d01)
Or current render target cache setting is to key on the binding table
index, meaning the HW associates a number in the range [0, 7] to a
RENDER_SURFACE_STATE description. If you want change the render target
0 between 2 draw calls, you need to insert a PIPE_CONTROL in between
the 2 draw calls with pb-stall + rt-flush in order to flush an writes
to a previous RENDER_SURFACE_STATE that has now becomed disassociated
with the [0, 7] number.
This PIPE_CONTROL taking care of the flush is dealt with in
cmd_buffer_maybe_flush_rt_writes(). This function diffs the current
BTI setup for render targets (first 0 to 7 BTIs) with what the next
fragment shader wants.
The issue here is we might have a render pass with 0 color attachments
and yet in 98cdb9349a we added one pointing to the render target 0,
but in the emit_binding_table() when we finally program the BTI, we
check the render pass color count and program a null surface state
instead of an actual surface state. And this leads to hangs because
the render target cache will end up with inconsistent state data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 98cdb9349a ("anv: ensure null-rt bit in compiler isn't used when there is ds attachment")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12955
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34603>
(cherry picked from commit 63f633557f)
Not entirely sure if it's actually required, but this makes it consistence
with r600_resource_create also calling r600_compute_global_buffer_create
for global memory buffers.
Cc: mesa-stable
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
(cherry picked from commit f6e3c967d9)
On a7xx, the max constlen for compute is increased to 512 vec4s or 8KB,
however the size of the LB was not increased beyond 40KB. A quick
calculation shows that 8KB of consts multiplied by 2 banks plus the
API maximum of 32KB shared memory would exceed 40KB. This means that
we can't always use a constlen of 512, and sometimes have to fall back
to 256 when a lot of shared memory is in use.
In the future, we can use similar calculations to figure out how much
"extra" shared memory is available for the backend to spill to, but we
currently don't support spilling to shared memory.
Fixes: 5879eaac18 ("ir3: Increase compute const size on a7xx")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34746>
(cherry picked from commit ea9d694a7b)
It is not longer necessary to build the panfrost gallium driver for
panfrost_compile so do not do that.
Backport-to: 25.1
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34693>
(cherry picked from commit f6c5f0c19d)
This allows building tools for cross-compiling without building gallium
or vulkan drivers unnecessarily.
Backport-to: 25.1
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34693>
(cherry picked from commit 674c96ad0a)
This allows building tools for cross-compiling without building gallium
or vulkan drivers unnecessarily.
Backport-to: 25.1
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34693>
(cherry picked from commit 007d7418f8)
This brings back c9e33e5cbf ("intel/fs/cse: Make HALT instruction act
as CSE barrier."), from the old CSE pass into the new one.
Fixes new CTS test: dEQP-VK.subgroups.shader_quad_control.terminated_invocation
Fixes: 9690bd369d ("intel/brw: Delete old local common subexpression elimination pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34643>
(cherry picked from commit 29d7b90cfc)
This fixes a compilation issue in Marvel Rivals where the legalization
logic and the encoding logic don't line up, which results in an
assertion failure on this instruction:
r17 = hfma2 r17.xx -r18.xx 0x3c003c00
The fix here is a little overly restrictive because it turns out we
actually do have modifiers for all 3 sources. Those modifiers will
be added in later commits.
Fixes: 567cae69c3 ("nak: Add 16-bits float operations")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34750>
(cherry picked from commit 1ff7135691)
For Xe2+, from Bspec 64643, bit field "StackID": The maximum number of
StackIDs can be 2^12- 1.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34709>
(cherry picked from commit 821c1bfa7e)
In float pointing rules adding +0.0f preserves all values except
for -0.0f, so what we want here is to add -0.0f. In the future
we should add proper support for float immediates in the assembler.
Fixes: fafdd24285 ("intel/executor: Update bfloat example")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
(cherry picked from commit 3e0418ba02)
We were wrongly lowering all add/sub operations with saturation on 8-bit
values on v11+.
This fixes CTS failures on
"dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*" and
likely more apps.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: d79a31bf81 ("pan/bi: Lower removed instructions in algebraic on v11+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34743>
(cherry picked from commit 6ab4ae1a19)
Most of the churn in this commit is changing unit tests that were
testing things that are now invalid.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122204 -> 17122669 (<.01%)
instructions in affected programs: 120669 -> 121134 (0.39%)
helped: 0 / HURT: 124
total cycles in shared programs: 895602370 -> 895613210 (<.01%)
cycles in affected programs: 17868974 -> 17879814 (0.06%)
helped: 35 / HURT: 85
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736518 -> 210743769 (+0.00%)
Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66056852 -> 66056966 (+0.00%)
Totals from 1505 (0.21% of 706776) affected shaders:
Instrs: 1890151 -> 1897402 (+0.38%)
Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04%
Max live registers: 256821 -> 256935 (+0.04%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit e26270249b)
When I originally wrote that code, I didn't understand what a jerk NaN
can be.
v2: Remove the brw_type_is_uint stuff. This function is currently only
called for float types. In a later commit, integer types will be
supported but only for NZ and Z conditions. Noticed by Matt.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122197 -> 17122204 (<.01%)
instructions in affected programs: 1691 -> 1698 (0.41%)
helped: 0 / HURT: 4
total cycles in shared programs: 895602484 -> 895602370 (<.01%)
cycles in affected programs: 912964 -> 912850 (-0.01%)
helped: 2 / HURT: 2
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736388 -> 210736518 (+0.00%)
Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00%
Totals from 130 (0.02% of 706776) affected shaders:
Instrs: 169911 -> 170041 (+0.08%)
Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit 0dab520a19)
In multi-layered framebuffer LRZ also has several layers and binning
pass needs to write depth to a correct layer, so binning VS needs
VARYING_SLOT_LAYER.
Fixes: 9775b33d0f ("tu: Enable GMEM with layered rendering")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34728>
(cherry picked from commit dea4bb3757)
While not yet officially conformant, we support all the required
features, and we pass the CTS. Let's mark off Vulkan 1.2, to make things
easier for applications.
Backport-to: 25.1
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34512>
(cherry picked from commit b3fd8ddf6a)
Previously we were not extracting the resource index from the resource
handle.
This fixes failures with PanVK+ANGLE on "dEQP-GLES31.functional.ssbo.array_length.unsized_*".
Fixes: e4613f8b23 ("panvk: Lower get_ssbo_size() on Valhall")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34697>
(cherry picked from commit 845611bb43)
Panvk already enables VK_EXT_acquire_xlib_display, but not
VK_EXT_direct_mode_display which is a dependency. This causes a failure
in dEQP-VK.info.instance_extensions.
Fixes: 8c2bfa279d ("panvk: support x11 wsi")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34672>
(cherry picked from commit 8dd578e2a4)
This has been an oversight when implementing indirect draw.
Fixes: 1f3b8bb918 ("panvk: Add support for Draw[Indexed]Indirect")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34674>
(cherry picked from commit c7f2bc6bed)
We already had a path for sysvals in panfrost_emit_const_buf, but it was
unused because we only allowed pushing the default UBO 0. Improves
glmark2 score on G610 from 3051 to 3071, but mostly we need it as a
prerequisite for dynamic blend constants.
Signed-off-by: Olivia Lee <benjamin.lee@collabora.com>
Fixes: 59a3e12039 ("panfrost: do not push "true" UBOs")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34664>
(cherry picked from commit e93261f579)
A8_UNORM on v9+ is using RGBA8_UNORM as a pixel format with the
A8_UNORM clump format to dealing with the diffences between
RGBA8 and the actual A8 in-memory layout.
The problem is, LEA_TEX only loads the InternalConversionDescriptor
which contains only the pixel format, and that's what ST_CVT uses
to do the conversion, so we'll actually store 4 components instead
of one.
This shows up with
dEQP-VK.image.load_store.without_any_format.buffer.a8_unorm* after
enabling maintenance5.
For now I've turned off the image storage capability for A8_UNORM
on all gens, but I'd be fine disabling it only on v9+ if you think
that's preferable.
Fixes: d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34648>
(cherry picked from commit 9d1262e108)
This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use them here (without doing extra memory allocations), since then
the GPU will increment access counters. And since our null texture address used
to just be a pointer in the command buffer, that mean the GPU will trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom being random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts needs to be done atomically to
maintain bisectability.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
(cherry picked from commit 3eb7575679)
With a 64 bit pointer model, instead of doing -1 the pass ended up doing
+4294967295. The reason here was some implicit integer conversion going
horribly wrong, so just do the offset math in 64 bit to get a nice result.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13023
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34669>
(cherry picked from commit 33965bb21b)
For h264/h265/av1/vp9, give warning when application is
sending more slices than allowed by limit, and stop copying
remaining slices to avoid unwanted behaviour.
Cc: mesa-stable
Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34633>
(cherry picked from commit eecfb02463)
Since headless overrides create_mem, it needs to override finish_create
too. Fixes a segfault in nvk that was caused by us mixing
wsi_create_null_image_mem with wsi_finish_create_blit_context, which
would then call CmdCopyImageToBuffer with image->blit.buffer == NULL
Fixes a cts failure on nvk in:
dEQP-VK.image.swapchain_mutable.headless.2d.r8g8b8a8_unorm_b8g8r8a8_unorm_clear_copy_format_list
and several others
Fixes: 579578f10a ("vulkan/wsi/drm: Break create_prime_image in pieces")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34646>
(cherry picked from commit 60452e016e)