Fixes: 692b5fa9f2 ("anv: Add shader to copy acceleration structures")
This commit fixes the future test "sparse_binding_structures" for
"header_bottom_address" for ray tracing pipeline.
Even on 48-bit ray tracing (Xe1/2), the software-defined part
instance_leaf_part1.bvh_ptr has to be in canonical form for copy.comp
to deference a bvh, which means we have to preserve the upper 16bits.
This is especially relevant in cases where the acceleration structure buffer
is located high, such as sparse buffer.
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
Fixes: 2fe57947e3 ("anv: Implement encode shader to fit in ANV BVH")
This commit resolves the failures in the future tests
"sparse_binding_structures" for rayquery. Sparse buffers' heaps are
located high, and since it's in canonical form, the higher 16bits are
all set to 1. However, the existing encoder did not expect any non-zero
values at the higher 16bits. As a result, the instance flags got
corrupted, causing most triangle tests to fail.
Thanks for Paulo providing insights about sparse buffer properties.
Co-developed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33745>
On Xe2+, Hardware will always have scale.x and scale.y as 1.0.
This is not fixing any issues.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33726>
build_subgroup_quad_mask can now be written in terms of
build_cluster_mask.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31732>
... since `BoxedHandleManager` should, well, manager the handles.
This simplifies `VkDecoderGlobalState` a little bit and should also
allow us to remove a bunch of functions that no longer need to
depend on `VkDecoderGlobalState`.
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
Test: cvd snapshot_take --force \
--auto_suspend \
--snapshot_path=/tmp/snapshot1
Test: cvd reset -y
Test: cvd create --snapshot_path=/tmp/snapshot1
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
Add vkGetFenceStatus and vkWaitForFences functions to the
global state tracking list for the host.
This will allow adding more functionality to the fences
and perform additional operations before waiting for and
signaling them.
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
Enables software rendering via swiftshader on host side and angle
on guest when using DMABUF based framebuffers.
TEST=Run internal application successfully
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
... to break the recursive behavior of the replay calling into
VkDecoderSnapshot so that locking and thread safety annotations can be
preserved in VkDecoderSnapshot.
Follow up to aosp/3412302.
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
Test: cvd snapshot_take --snapshot_path=<>
Test: cvd create --snapshot_path=<>
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33740>
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
Bifrost LDEXP.v2f16 takes a 16-bit exponent, which requires messy
lowering. The codegen for this is quite bad currently, but would be
improved by implementing unpack_32_2x16_split_*, and by fusing
comparisons with CSEL.
The main alternative is converting to F32, then LDEXP.f32, then
converting back to F16. This has better codegen for dynamic exponents
currently, but worse in the common case with a constant exponent where
all the saturating cast logic can be folded.
Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.ldexp.compute.vec2
when shaderFloat16 is enabled in panvk.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
This instruction does not support swizzles. This information is not used
for anything, but will be if we use the instruction tables for
bi_lower_swizzle.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: 316486dd9f ("pan/va: Add initial ISA.xml for Valhall")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
The original implementation of this returned false when the src was
replicated, and true when it was not.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: 21bdee7bcc ("pan/bi: Switch to lower_bool_to_bitsize")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
nir_lower_int64 may generate 16-bit fexp2 instructions, which need to be
lowered.
Fixes dEQP-VK.spirv_assembly.instruction.compute.convertstof.int64_to_float16_m1234
when shaderFloat16 is enabled in panvk. I don't believe it's possible to
trigger this with mediump, so it's not a bug without shaderFloat16.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
On vulkan, truncating to S/U16 before converting is not valid, because
out-of-range conversions are specified to be correctly rounded. IEEE 754
requires that out-of-range values round to ±inf with RTNE and ±F16_MAX
with RTZ.
On gl, truncating is valid for U16->F16, because out-of-range int->float
conversions are undefined behavior. For S16->F16, it is not valid
because S16_MAX < F16_MAX, so some in-range values will be truncated as
well.
Instead, just handle S/U16->F16 as S/U16->F32->F16.
Fixes dEQP-VK.spirv_assembly.instruction.compute.convertstof.int32_to_float16_*
when shaderFloat16 is enabled in panvk.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: be74b84e6f ("pan/bi: Fill in some more conversions")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
Right now the aliasing/overlapping checks are only done with index
0. I guess that was done because variables don't get a different
internal location even if you have a different index.
But doing that, the checks would not detect a case like this:
layout(location = 0, index = 1) out vec4 color;
layout(location = 0, index = 1) out vec4 factor;
That was used on the following piglit parser test:
spec/arb_explicit_attrib_location/1.10/compiler/layout-13.frag
And as the spec included on that test, is a link error case:
" * if more than one varying out variable is bound to the same
number and index; or"
This commit executes the aliasing checks for index 1 too, and moves
the skip down, to only skip if the current variable and all previous
location-assigned variables has different index and location.
The bad news is that now such assigned variables need to be tracked on
OpenGL-ES. Before that commit that was avoided.
With this commit the mentioned parser test properly fails to link in
any driver.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33093>
Creating moves out of undefs makes it more difficult for other passes to
detects undefs without having to chase moves. Instead, just create a new
1-component undef.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29889>
Fixed the issue where the background color was specified but not displayed.
Fixed the issue where the color would be different from the expected.
Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33790>
If we insert
<code>
s_branch 1
s_branch Target
at the end of some block, and later hide an additional chained branch
after the existing one, then we have to update the 's_branch 1' to
also jump over the newly added branch.
Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Closes: #12673
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
Venus has long supported creating swapchain image alias via binding. So
below are exposed without extra work needed:
- VK_EXT_surface_maintenance1
- VK_EXT_swapchain_maintenance1
Test: dEQP-VK.wsi.*.maintenance1.*
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33782>
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID. Call it brw_sfid to make it easier to find.
Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.
GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages. The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).
Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache. To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.
We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms. Cite modern docs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
This is part of the hashing key :
==25753== Uninitialised byte(s) found during client check request
==25753== at 0x93D29AE: blob_write_bytes (blob.c:164)
==25753== by 0x93A62C6: vk_pipeline_precomp_shader_serialize (vk_pipeline.c:722)
==25753== by 0x93AC55E: vk_pipeline_cache_add_object (vk_pipeline_cache.c:433)
==25753== by 0x93A691B: vk_pipeline_precompile_shader (vk_pipeline.c:875)
==25753== by 0x93A8FB9: vk_create_graphics_pipeline (vk_pipeline.c:1715)
==25753== by 0x93A9799: vk_common_CreateGraphicsPipelines (vk_pipeline.c:1860)
==25753== Address 0xf1adf82 is 82 bytes inside a block of size 152 alloc'd
==25753== at 0x64FA858: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25753== by 0x99AAC38: vk_default_alloc (vk_alloc.c:26)
==25753== by 0x93A403B: vk_alloc (vk_alloc.h:48)
==25753== by 0x93A406B: vk_zalloc (vk_alloc.h:56)
==25753== by 0x93A60A0: vk_pipeline_precomp_shader_create (vk_pipeline.c:680)
==25753== by 0x93A689D: vk_pipeline_precompile_shader (vk_pipeline.c:866)
==25753== by 0x93A8FB9: vk_create_graphics_pipeline (vk_pipeline.c:1715)
==25753== by 0x93A9799: vk_common_CreateGraphicsPipelines (vk_pipeline.c:1860)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33792>
It's mature, so if you want it, just enable it for your driver by default.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33480>
This is useful both for correctness (to ensure that things we think are
constant stay constant) and it improves performance a bit by reducing
register pressure and avoiding spilling.
Pipeline-db stats:
CodeSize: 29665072 -> 29437344 (-0.77%); split: -0.92%, +0.16%
Number of GPRs: 157124 -> 156082 (-0.66%)
SLM Size: 148900 -> 146436 (-1.65%)
Static cycle count: 6840286 -> 6805711 (-0.51%); split: -0.98%, +0.47%
Spills to memory: 177779 -> 173337 (-2.50%)
Fills from memory: 177779 -> 173337 (-2.50%)
Spills to reg: 17692 -> 16731 (-5.43%)
Fills from reg: 12013 -> 11897 (-0.97%)
Max warps/SM: 309128 -> 309456 (+0.11%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33771>