We set max subgroup size as 16 for 'UnrealEngine5.1', this improves a
customer benchmark by 50% on A750.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26385>
(cherry picked from commit 012b6fbe63)
Allocate a ralloc sub-context which takes the u64 hash table as a parent
and attach a destructor to it so we can free the hash_key_u64 objects
that were allocated by _mesa_hash_table_u64_insert().
The order of creation of this sub-context is crucial: it needs to happen
after the _mesa_hash_table_create() call to guarantee that the
destructor is called before ht->table and its children are freed,
otherwise the _mesa_hash_table_u64_clear() call in the destructor leads
to a use-after-free situation.
Fixes: ff494361be ("util: rzalloc and free hash_table_u64")
Cc: stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26423>
(cherry picked from commit db5166718d)
When an entry exists, _mesa_hash_table_insert() updates the entry with
the new data/key pair, which causes a leak if the key has previously
been dynamically allocated, like is the case for hash_u64_key keys.
One solution to solve that is to do the insertion in two steps: first
_mesa_hash_table_search_pre_hashed(), and if the entry doesn't exist
_mesa_hash_table_insert_pre_hashed(). But approach forces us to do the
double-hashing twice.
Another approach is to extract the logic in hash_table_insert() that's
responsible for the searching and entry allocation into a separate helper
called hash_table_get_entry(), and keep the entry::{key,data} assignment
in hash_table_insert().
This way we can re-use hash_table_get_entry() from
_mesa_hash_table_u64_insert(), and lake sure we free the allocated
key if the entry was already present.
Fixes: 6649b840c3 ("mesa/util: add a hash table wrapper which support 64-bit keys")
Cc: stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26423>
(cherry picked from commit 5a60fd7b14)
CALLOC_STRUCT() calls the OS abstraction layer to do the allocation.
Call FREE() to free the corresponding objects so we keep things
consistent and have proper debug traces when memory-debugging
is enabled.
Fixes: 6649b840c3 ("mesa/util: add a hash table wrapper which support 64-bit keys")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26423>
(cherry picked from commit 977cc3157d)
Crysis 2 and 3 Remastered's RT shaders non-uniformly index into SSBO
descriptor arrays without specifying the NonUniformEXT qualifier on the
relevant access chains/load ops. This leads to artifacts around objects.
To add insult to injury, the game fails to provide a meaningful
applicationName/engineName in the Vulkan part of the DX11-Vulkan interop
solution used for RT. Both of these fields are set to "nvpro-sample"
(perhaps the code has been copied from NVIDIA's sample applications).
Therefore, fall back to executable name matching.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9883
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26391>
(cherry picked from commit f1817ab7e0)
Vulkan backend of Hades can only handle at most 3 swapchain images.
It affects all drivers after below commit:
04d654a5d0
and then only affects specific driver backend which enables
extra_xwayland_image in wsi device options after below commit:
c1a62476ac
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Renato Pereyra <renatopereyra@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26607>
(cherry picked from commit 88c5affacf)
vl_rbsp_fillbits may fill less than 32 bits if it removes emulation
prevention bytes, but will fill at least 16 bits. We need to call it
twice when reading more than 16 bits.
This fixes parsing H264 SPS packed header in va frontend when emulation
prevention bytes are at position where 32 bit values are read.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26276>
(cherry picked from commit 73d69ef1e6)
There are rendering issues with FCV on DG2 and Unreal engine 5.1,
patch adds option to disable fcv in drirc.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26169>
(cherry picked from commit 01046cd6ad)
Fixes UBSan error:
src/util/sha1/sha1.c:140:8: runtime error: null pointer passed as argument 2, which is declared to never be null
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25853>
This is used for parsing VA packed headers and those can be without
emulation prevention bytes.
Add emulation_bytes argument to vl_rbsp_init and skip all emulation
bytes handling when set.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25565>
When the number of draw calls is very large, instead of allocating
large amounts of batch buffer space for the draws, use a ring buffer
and process the draw calls by batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8645
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
gcc is able to optimize away either the modulo or the logical and. This
makes no difference to gcc.
clang is only able to optimize away the logical and. This allows clang
to generate faster code for BITFIELD_MASK.
As for BITFIELD64_MASK, this also makes no difference to clang except it
fixes a compile error for BITFIELD64_MASK(64):
error: shift count >= width of type [-Werror,-Wshift-count-overflow]
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9989
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25757>
The driver glue doesn't have access to that information in a
centralized place. If you want to generate perfetto iid, you need
access to all names.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
Useful to figure out what's the tracepoint name you're implementing.
We'll use this in the intel perfetto integration glue to index into an
array of perfetto iid.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
With this, we pick up new in-tree defaults for driconfig variables
when using meson devenv. This is useful for testing new config
variables.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21808>
This adds an environment variable that can be used to override the
global drirc serach directory. This can be useful for debugging, and
meson devenv, as used in the following commit.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21808>
this patch calls the init and finish functions of the vk
runtime astc decoder. initializes emulate_astc flag. sets
up the additional plane to store decoded texture.
v2: fix _tex_dataformat() and _tex_numformat() (Chia-I Wu)
use correct function for bufferToImage (Chia-I Wu)
v3: add radv_is_layout_emulated() (Chia-I Wu)
avoid repeated pattern (Chia-I Wu)
v4: not create all pipelines on_demand (Chia-I Wu)
v5: current code does not support astc hdr (Chia-I Wu)
v6: keep luts in staging buffer only (Chia-I Wu)
v7: use 2DArray for both input and output
v8: document todo to use fp16 (Chia-I Wu)
not required to move meta init anymore (Chia-I Wu)
move astc_emulation_format to vk_texcompress_astc.h (Chia-I Wu)
v9: remove LAYOUT check (Chia-I Wu)
check on iview->vk.view_format
move setting tiled flags for astc (Chia-I Wu)
remove is format emulated check in radv_is_storage_image* (Chia-I Wu)
use LAYOUT_ASTC for if check (Chia-I Wu)
no 1D support (Chia-I Wu)
calculate start end offset in 2x blk size
v10: remove old wrong code (Chia-I Wu)
v11: use existing defined local format variable (Chia-I Wu)
dst image layout is always VK_IMAGE_LAYOUT_GENERAL (Chia-I Wu)
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24672>
v2: remove extra u_formats.h header addition (Chia-I Wu)
use stddef.h and not unistd.h (Chia-I Wu)
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24672>
We convert from doubles to half by going through float in between, but
as noted in the comment in this commit, that can give wrong results in
some cases.
Add some helpers to ensure correct results based on rounding mode that
will be used in the next commit.
v2: Use fi/di from u_math.h (Ian)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
In the linear allocator, when a size larger than the minimum
buffer size is allocated, we currently create the new buffer
to fit exactly the requested size.
In that case, don't bother updating the `latest` pointer, since
this newly created buffer is already full. In the worst case,
the current `latest` is full and it would be the same; in the
best case, there's still room for small allocations, so we avoid
wasting space if the next allocations would fit.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25517>
A descriptor set is internally reserved for descriptor set dynamic
offset which might not be used by an applications which otherwise
requires an extra descriptor set. This driconf option allows making
that trade-off by dropping support for dynamic offsets in exchange
for an extra descriptor set which means 5 usable descriptor sets on
A6XX and 8 on A7XX.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25534>
In release mode print the ralloc tree header pointers. In debug mode,
will look at the child allocations recursively checking for the canary
values. In most cases this should work and give us extra information
about the type of the subtree (GC, Linear) and the allocation sizes.
For example, calling it at the end of spirv_to_nir() will output the
following tree with a summary at the end (lines elided to focus on the
general structure of the output)
```
0xca8d00 (472 bytes)
0xdf2c60 (32 bytes)
0xdf2c00 (32 bytes)
0xdf2ba0 (32 bytes)
0xdf2b40 (32 bytes)
(...)
0xcde8b0 (64 bytes)
0xcde760 (72 bytes)
0xd2de40 (168 bytes)
0xcce660 (64 bytes)
0xcce510 (72 bytes)
0xcce5a0 (120 bytes)
0xcce490 (64 bytes)
(...)
0xcbc450 (456 bytes)
0xdf55c0 (160 bytes)
0xdf5730 (72 bytes)
0xdf57c0 (80 bytes)
0xdf5530 (72 bytes)
0xdf56a0 (80 bytes)
(...)
0xcbe840 (128 bytes)
0xcc4310 (4 bytes)
0xcbc660 (536 bytes) (gc context)
0xde6b40 (32576 bytes) (gc slab or large block)
0xddb160 (32704 bytes) (gc slab or large block)
0xdc8d50 (32704 bytes) (gc slab or large block)
(...)
0xcde9a0 (32704 bytes) (gc slab or large block)
0xcd6720 (32704 bytes) (gc slab or large block)
0xcce6e0 (32768 bytes) (gc slab or large block)
0xcbc330 (72 bytes)
0xd680a0 (208 bytes)
0xca9010 (78560 bytes)
0xca8f20 (176 bytes)
==== RALLOC INFO ptr=0xca8d30 info=0xca8d00
ralloc allocations = 4714
- linear = 0
- gc = 23
- other = 4691
content bytes = 1055139
ralloc metadata bytes = 226272
linear metadata bytes = 0
====
```
There's a flag to pass so only the summary at the end is printed.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25482>
When creating a new node, we were clobbering the original size
requested, and use that as offset, so the node would always be full.
Fixes: 591db9a9a5 ("util: Remove per-buffer header in linear alloc for release mode")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25382>
Not comprehensive, but those were the ones used to work on the
previous linear_alloc changes. Also having a test already
set up lower the barrier to add more tests for future in case of
bugs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
There's only need to keep the offset and size of the latest buffer,
so rename linear_header into linear_ctx and change the code to
keep records there.
For debug mode we still keep a header, now called linear_node_canary,
to have a magic check. Since due to alignment we have a free space,
also keep the individual occupation of each node (offset), for
debugging.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
With linear_realloc() gone, there's no code that reads the size
in linear_size_chunk struct, so it can be removed. This removes
the 8-byte overhead per child allocation and simplifies the
allocation code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
Now that linear_realloc() is unused, remove it. It is not an actual
realloc, will always allocate new memory and copy data around -- and
had a big warning about it in the documentation.
In the couple of uses we had before, the client code knew the size,
so it could be changed to perform the allocation and the copy by
themselves. The client code keeping the size is the recommended
way here.
This will allow us remove linear_size_chunk later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
Note that for linear allocator, the realloc will always
allocate new memory. In both cases that realloc was used,
the existing size was known, so we can just allocate
and do the copy ourselves.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
In the linear allocation only the parent (context) can be used
to allocate new children, so let's use an opaque type to identify
the linear context. This is similar to what's done in GC allocator.
Update the documentation and a couple of function names to
refer to linear context instead of linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
Linear allocator doesn't support calling custom destructors to
its child allocations nor freeing individual child allocations.
So the destructor callback and the delete operator don't apply
to objects using linear allocator.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>