The flags_regid is only present if gs is present (in which case, gs is
the last_shader). If there is no gs, flags_regid is initialized to
zero, not INVALID_REG (r63.x). But you have to scroll up several pages
of a long fxn to see that.
Move the assert to make things more clear.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Drop the reg_config table trickery, which doesn't play nicely with
register changes across generations. (Note, some of the registers,
like PC_HS_CNTL, are not in fact the same as other shader stages.)
While at it, convert to crb to simplify copying code from the gallium
driver.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Rework it to take all active/enabled shader stages in one shot, to
simplify things and drop the xs_configs table.
This lets us use the variant reg packers directly to better deal with
register changes across generations.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Drop the legacy register offset macros. And re-work how we map the vk
query to pipeline stat offset, to account for re-ordering of the
pipeline stat regs in gen8.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
These were only used in one place. Drop them and convert to using the
new register packers for improved cross-gen portability.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
The non-variant reg packers were removed in commit fd6489c026 ("tu:
Drop emitting of deprecated packing."), along with
FD_NO_DEPRECATED_PACK.
Add support to mark the even older reg builders as deprecated, and
re-introduce FD_NO_DEPRECATED_PACK to control this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
If the builder is passed just a raw value, like an iova in the case of
turnip, we probably don't want to assert that it is all unknown bits.
This hasn't been a problem for the gallium driver, as the bo pointer is
first. But became a problem for turnip with commit 2d6c15ad57 ("tu:
remove magic bo reg packing (use iovas directly)").
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
sam.s2en uses the first src for its samp/tex while the coordinates (for
which derivatives need to be calculated) are in the second src. We used
to unconditionally track needed helpers for the first src causing (eq)
to potentially get scheduled too early for sam.s2en. Fix this by using
the second src for sam.s2en.
Fixes frame instability in Metro Exodus.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 29f8277952 ("ir3/legalize: schedule (eq) more accurately")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38992>
Add an option that signals if we can use transfer queues on the current
hardware and kernel or not.
Reviewed-by: Thomas H.P. Andersen <phomes@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36617>
By calling into nvk_event_report_semaphore we get better support for
queues other than graphics.
Reviewed-by: Thomas H.P. Andersen <phomes@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36617>
Most of the functions were using unsigned but we had too uint32 and even a
function with a uint64_t so lets standarize into uint32.
No changes in behavior expected.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39024>
When in alloc_bo_from_slabs() size and alloc_size are different enough to have
different pb_slabs it causes the slab to be put into the reclaim list of a the
smaller pb_slabs when calling iris_bo_unreference(), causing a memory leak of
(alloc_size - size) bytes.
So here storing and using the actual slab size to fix this issue.
Cc: stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39024>
The CTS issue for this was closed two months ago, so this should be
fixed now.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38997>
Those helpers can be called late (since it's mostly for debug
purposes). This can avoid surprises in the backend and also avoids
rerunning gather_info.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38995>
This will allow us to build this multiple times for different
architectures. For now, it only defines a single architecture, because
that's what we currently support. But this makes room for future
architectures, that will follow relatively soon.
Co-authored-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38922>
When running Yooka-Laylee under FEX, the executable name will be the one of
the FEX binary, which the existing driconf option won't match. FEX is able
to override the executable name in newer versions, but overall it's still
more reliable to match the application name provided through Vulkan.
Fixes: 0574bfd5f4 ("tu: add UBO lowering workaround for Yooka-Laylee")
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39012>
Image size queries for buffer images were incorrectly using the
underlying buffer's width instead of the image view's size.
This affected `get_image_width` in OpenCL C for 1Dbuffer images, in
cases where the buffer is larger than the image to account for
padding, breaking the conformance test `test_kernel_image_methods
1Dbuffer`.
Fixes: 0efe7a6eb9 ("panfrost: implement image_size sysval")
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38949>
Decode logic in Gfx12+ has become complex with the new types, so Caio
suggested that we move to the table like other gens.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
These type encodings were first were used in dpas instructions, but
continue to be used in more places.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
We were previously reporting a larger maxStorageBufferRange than our
maxBufferSize, which is weird. Lower maxStorageBufferRange to match
maxBufferSize.
Fixes crucible stress.limits.buffer-update.range.storage.q0
Fixes: 65f12fde44 ("nvk: Improve address space and buffer size limits")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39021>
Later commits will call DCE after lowering has been performed. Creating
more things that would need lowering is problematic.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
Later commits will call cmod prop after register allocation. At that
time, there is only FIXED_GRF.
No shader-db or fossil-db changes on any Intel platform.
v2: FIXED_GRF uses subnr instead of offset. Add a unit test to
demonstrate the issue.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
PRMs for G35 (Gfx4) through Ivy Bridge (Gfx7) all say that conditional
modifiers are allowed for MACH. Starting with Haswell (Gfx7.5), this
seems to be removed. This function doesn't have any way to know the
platform, so false is returned for all platforms.
No shader-db or fossil-db changes on any Intel platform.
Prevents a failure in "brw: Do cmod prop again after post-RA scheduling"
in piglit's builtin-uint-mad_sat-1.0.generated.cl.
Cc: stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
The group implicity selects which flags the instruction can write. This
was discovered while working on another set of changes that could change
some logical operations into predicated MOV instructions.
Prevents regressions later in the series in
dEQP-VK.graphicsfuzz.cov-loop-fragcoord-identical-condition.
No shader-db or fossil-db changes on any Intel platform.
v2: Update the comment in the test case. Suggested by Caio.
Fixes: 95ac3b1dae ("i965/fs: don't propagate cmod when the exec sizes differ")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>
Full command lines include full path to the output file, which triggers
reproducibility warnings (e.g. in Yocto builds). Drop the args and print
only a basename of the script used to generate the file.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38875>
An example when the memory leak happens: requested_size = 4 and alignment = 65536 in anv_slab_bo_alloc:
The alloc_size = 65536 and requested = 4 in this case.
The group to allocate the entry is the group of size 65536 based on the entry size,
while the group to reclaim the entry is the group of size 4 due to the bo->size is
registered as the requested_size=4 and used in anv_slab_bo_free.
That means, the entry is allocated in group[order of size 65535]->free,
moved from group[order of size 65535]->free to the user, and then moved
to group[order of size 4]->reclaim, so the entries is accumulated in
group[order of size 4]->reclaim and group[order of size 65535] keeps
allocating new entries and leading to OOM.
The solution is to use `bo->actual_size` to get the group in pb_slab_bo_free using the allocation size.
Fixes: dabb012423 ("anv: Implement anv_slab_bo and enable memory pool")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14396
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: hwandy <hwandy@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38989>
Enables compression for select images. Additionally, we get large (64K), and
huge (2M) pages as a bonus as the hardware can only do compression on these page
sizes. However, due to nouveau limitations, this means that we are limited to
enabling it on things pinned to VRAM. Fortunately, this works out for us as we
can enable it for color, Z/S, and storage images, which are the main types
to benefit from compression as they're write heavy.
Unfortunately, this means that we need to handle the memory allocation in a
delicate way, as the Vulkan API is a bit restrictive in this regard, so we have
to use dedicated allocations for compression/larger pages.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36450>
Set it to 500 tests, as if just only one test fails the asan, all the
tests will be marked as fail too. Keeping the size smaller, will allow
to process later to bisect searching for the tests that actually expose
the issue.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39022>
This fixes a page fault when nr_samples=4 but nr_storage_samples=2.
Based on si_is_format_supported this is only supported for color
formats and when has_eqaa_surface_allocator is true (< GFX11).
The referenced commit below didn't introduce the issue but it
exposed it by forcing the gfx blit path to be used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13255
Fixes: 3424e16ece ("radeonsi: add decision code to select when to use CB_RESOLVE for performance")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
It displays the renderer string and the PCIe bus info.
It's not a real graph because hud_graph is built to draw
numbers and 'dev' is the only use case so far where we
just want to draw a string.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
This makes the layout of "fps,cpu" identical to "fps,stdout,cpu".
Without this change, the ',' separator after 'stdout' would increase
y and we would have a gap between the fps and cpu graphs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
The steering bits tell the GPU which caches to invalidate on the
subsequent uniform state writes. There is no point in writing
those steering bits when there are no uniforms to emit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38998>
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
We're going to be deciding on push vs. pull in the NIR lowering pass
soon, so move the code to limit our register usage from brw's thread
payload code to brw_nir_lower_gs_inputs().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
This reduces the number of initialized VGPRs by 1 when no barycentric
coordinates are used.
I have verified with zink that this indeed increases performance for
cases where sysvals like frag_coord and front_face are used without
interpolated PS inputs.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38936>
When no barycentric VGPRs are needed, we always enabled one of the pairs
(e.g. PERSP_SAMPLE_ENA) because it's a HW requirement. However,
the requirement says that LINE_STIPPLE_TEX_ENA can be enabled instead,
which occupies only 1 VGPR.
To get maximum pixel throughput, we can only have 2 initialized VGPRs
at most. By reducing initialized VGPRs from 2 (with PERSP_SAMPLE_ENA) to 1
(with LINE_STIPPLE_TEX_ENA), we can have 1 additional initialized VGPR
for free with maximum pixel throughput, such as POS_FIXED_PT for
frag_coord.xy without MSAA.
Only ACO gets this perf improvement because the change would be more
complicated with LLVM.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38915>
This game sets the reset isolation bit which causes the GL context
creation to fail as Mesa doesn't support the
GLX_ARB_robustness_application_isolation extension. Here we override
and clear the bit.
According to the spec says:
"The GLX_ARB_robustness_application_isolation and
GLX_ARB_robustness_share_group_isolation extensions do not provide
guarantees for graphics resets caused by applications which did
not create their contexts with both the LOSE_CONTEXT_ON_RESET_ARB
reset notification strategy and the
GLX_CONTEXT_RESET_ISOLATION_BIT_ARB bit."
And the game doesn't set LOSE_CONTEXT_ON_RESET_ARB so technically
we could ignore the reset isolation bit even if Mesa did support
the extension.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13336
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38668>
This allows us to override and clear the reset isolation bit.
It will be used in the following patch to override missing support
for GLX_CONTEXT_RESET_ISOLATION_BIT_ARB.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38668>
A in-place resolve via the BLT engine is only supposed to fill the
tiles of a single layer of a resource, so the size to calculate the
number of tiles is the layer stride, same as done for the in-place
resolve via the RS engine in
8df11f3fad ("etnaviv: fix in-place resolve tile count.")
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39005>
Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:
1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand). This would fail to
register allocate. We haven't seen this in practice.
2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate. This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.
Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space. This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs. On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase. However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:
MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%
Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)
The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
This is mostly a s/dyn ShaderModel/ShaderModelInfo/ with a few manual fixes.
With this change, we now statically dispatch into ShaderModel, which is
a bit faster than dynamically dispatching. Together, this commit and the
last one improve compile times by about 1% geomean.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
Because the tracked registers are really driver dependant, the driver
is expected to handle the tracked_registers struct itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
Both PVR and PanVK are drivers for generic embedded GPU IP cores, so
just take the can_present_on_device implementation from PanVK, which
allows any platform devices for presentation.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38985>
Now that qcom has released the gpu firmware for the a750, let's stop
using my fw package in favor of the publicly-available ones.
v2:
* Be more specific in the list of files we want to keep (lumag)
* Uprev the linux firmware version
* Use gfx-ci/firmware rather than the upstream gitlab repo
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
We also need to do this in the GLES-only code-path, otherwise we'll end
up setting PIPE_BIND_RENDER_TARGET for these, which means we'll
incorrectly require these to be color-renderable.
Fixes: 60e115dedf ("mesa/st: do not drop binding prematurely")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38945>
This is not actually necessary and moreover was corrupting
mipmapped arrayed 2D images in cases when the transition barrier
wasn't transitioning all mips, but more than one layer.
Keep the layout transition infrastructure in place as we'll need
it for transaction elimination CRC zeroing on v10-.
Fixes: c95f8993 ("panvk: add a meta command for transitioning image layout")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
Vulkan allows destroying an image without destroying the views of
this image first. These views can not be used in any way and the
only thing that the user can do with such a view is destroy it.
This also means that the driver can not refer to the image inside
the image view's destructor.
Fixes cb3f6481 ("panvk: Create MS shadow images and views")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
When swapping buffer with damage regions, to be strictly correct we
need to swap the entire back buffer to the front buffer. This needs to
be done in case the compositor does not support damage regions. This
means we need to ignore the input damage region and tell drisw to swap
the entire buffer.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38817>
It is enough to compute them after upload.
This saves some disk space and eliminates an unlikely
bug where the shader cache is shared between two GPUs
with the same chip but a different number of enabled CUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38970>
On Xe3+, we have typed MSAA load/store message support. We can use them
during MSAA copies. We don't have to fallback on RCS companion queue
anymore.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
We are passing number of layers as inline parameter register, so figure
out z_pos and write to 2D MSAA array images in compute
shader. We already get component X, Y and sample index, all we needed
was the number of layers.
Ken:
- Use load/store var instead of derefs
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
Only 3D shader gets dispatched per sample not the compute shader.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
The VK_ERROR_FRAGMENTED_POOL and VK_ERROR_OUT_OF_POOL_MEMORY errors are
not as exceptional cases as most. These are expected to be hit by
applications in the normal course of doing their thing. Probably best
not to spam stderr and the debug logs with them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38940>
The pitch is in bytes, rather than pixels, whereas internally lrz_layout
uses a pitch in pixels. Adjust the xml and state emit accordingly.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38930>
Sample count is only known at pipeline bind not when the render pass is
started since there is no attachments to infer sample count.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38957>
vkCmdSetEvent2 and vkCmdResetEvent2 can only be called from outside a
render pass so there is no need to handle anything about them.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38957>
Not ending the compute encoder allows us to concatenate next commands into
the same encoder if a compute encoder is requested.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38957>
Required to avoid availability write races. We could have an availability
update pending so adding the reset availability write to the same pool of
writes led to write races. Avoid this by flushing writes before reseting
queries.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38957>
Writes will now happen before a command encoder is started aiming to merge
them with the existing compute encoder before we end it. Alternatively,
they also happen when a compute encoder is started. This will simplify
the migration to Metal4 later down the road.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38957>
Piles of CTS blowing up, e.g. dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.2d.r5g6b5_unorm_pack16.r32g32b32a32_sfloat.optimal_optimal_linear
Fixes: 4bbc29373a ("nir/lower_flrp: Check and set shader_info::flrp_lowered")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff
as-if hand written:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
We currently have BITSET_*_RANGE macros which take a closed interval/range: a
start bit and an end bit. Occassionally that is what you want, but most of the
time callers actually want a start and a length. For example, register
allocators will often do operations at (variable start register, variable start
register + variable size - 1). It's more convenient to just take a start and a
size, while also making the size=0 case well-defined as a no-op set/clear and
false for test.
This patch adds BITSET_*_COUNT macros aliasing to the existing range macros, and
the rest of the series converts many call sites across the tree to use the new
macros.
Of the few call sites not converted, a whole bunch look like off-by-one bugs
which I did not want to "fix" here and risk breaking something else. Probably
worth checking your driver if you have RANGE calls leftover after this series.
Also, aco and dozen both open-coded RANGE helpers that should probably be
switched to the common code but that's neither here nor there.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
Avoid using exec_node::remove() and the initial "main list of
instructions", and instead use the existing helpers like other
passes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37146>
This will also fix NIR_DEBUG=extended_validation complaining about
invalid loop analysis. GCM will invalidate loop analysis if progress
was made, and depending on the removed instruction it will affect the
instr_cost.
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38932>
While the MAX2 thing here is correct for some formats, it's not correct
for all; for instance R8_SNORM doesn't need 32-bits here.
This should enable some higersample-counts on some 8 and 16-bit formats
on some Mali GPUs.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
We're going to have to do this from non arch-specific code, so let's
factor the meat out into a helper so we don't need to repeat the logic.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
This matches better what we do in pan_emit_fbd, where we don't increase
the cbuf_offset variable for unused render-targets. This way we simply
make sure we *at least* can fit a dummy-RT (as per the HW spec), but
since we don't write to it we also don't need to give it dedicated
memory beyond that.
This also seemingly fixes a subtle bug where we don't deal with PLS if
there's no active render-targets.
Fixes: 9ec6197a0b ("panfrost: allocate tile-buffer for dummy render-targets")
Fixes: c15a43cce0 ("pan/lib: prepare for pixel local storage support")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
When only the tile buffers are touched, it's okay to take care of the
dependency at the draw level, with DCD_FLAGS_2, but as soon as one side
of the dep has side effects that could impact the other side, we need to
split the render pass and insert a real barrier, with a proper flush on
read-only L1 caches.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38950>
There seems to be a weston crash leading to new gfx@ failures.
Reflect that in the zink-anv-tgl fails list so we can keep merging
stuff that touch common files.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
We've notived a few weird crashes that look like memory corruption.
Let's add an -asan job to see if we can catch any
double-free/UAF/memleak issues.
Since the new -asan job is also a full run, and we're short on rock5s,
discard the g610-vk-full job. The reason the new job takes roughly
the same time even though it's less parallel and has the ASan overhead
is because of the tests_per_group increase.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
We intend to add panfrost-xxx-asan jobs soon, so let's add our
gallium/vulkan drivers to the lists.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
Things are stable enough now to increase the number tests per group,
which allows us to lower the DEQP_FRACTION to 3. This also allows
us to lower the 'parallel' property to 4 leaving one extra board for
other jobs to run.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
dEQP-VK.image.general_layout.memory_barrier.fragment.write_read.* are
passing sometimes, which causes UnexpectedImprovement(Pass) to show up,
but the bug still exists.
Add those to the flake list until this is fully sorted out.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
If we don't do that and something fails in the middle, we leak
the decode context.
Fixes: d155d6b7a3 ("panvk: Add a decode context at the panvk_device level")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
cs_finish() is doing two things:
1. wrapping up the CS to prepare for its execution
2. freeing the temporary instrs array and maybe_ctx allocations
Mixing those two things lead to confusion and leaks, so let's split
those into cs_end() and cs_builder_fini(), and make sure panvk/panfrost
call both when appropriate.
Fixes: 50d2396b7e ("pan/cs: add helpers to emit contiguous csf code blocks")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
own_bin needs to be set to true if we want the bin_ptr to be freed.
Fixes: 3d2cc01f8a ("panvk: Add create_shader_from_binary")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
s/util_dynarray_clear/util_dynarray_fini/ to fix the leak.
Fixes: 7dc4f28507 ("pan/bi: schedule simple iterators to avoid extra move")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
The desc_heap field is unconditionally initialized, so we need to
call util_vma_heap_finish() on it.
Fixes: ec02137c86 ("panvk: Support DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
VK_EXT_multisampled_render_to_single_sampled needs those to be able
to render to the MS attachment when the app only provides a single-
sampled one.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38825>
Couldn't find in the docs a reference for the types needing to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We don't bother with maximums or wrapping because it shouldn't come up
for IO intrinsics anyway.
fossil-db results on Battlemage:
Instrs: 231363032 -> 231359554 (-0.00%)
Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00%
Max live registers: 71873886 -> 71870438 (-0.00%)
Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%)
Totals from 1779 (0.23% of 788851) affected shaders:
Instrs: 774359 -> 770881 (-0.45%)
Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51%
Max live registers: 158193 -> 154745 (-2.18%)
Non SSA regs after NIR: 180104 -> 180219 (+0.06%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets. Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts. But we can't do the mesh URB lowering
there because that doesn't have the MUE map.
It's not that much code as a separate pass, though.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.
MUE initialization is now done with semantic IO instead of raw offsets.
This drops another case of non-standard NIR IO usage (and no_validate).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Split by Ken from a larger patch originally written by Lionel.)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)
The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets. For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.
Still, most fields are still conveniently vec4-aligned.
To handle this, we add a new cb_data::vec4_access flag. If set, access
remains in vec4 units, with vec4 alignment. We use this for non-mesh
stages. When unset, offset is in 32-bit units, allowing unaligned
DWord access.
This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing. On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).
Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked. We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can). If
not, we use the ability to have dynamic writemasks to sort it out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
When considering ((x << y) % divisor), we recursed to calculate
mod = (x % (divisor << y)) but incorrectly returned mod directly,
rather than the correct value, (mod << y).
(Note that we require divisor to be a power-of-two.)
As an example of this going wrong, (x << 1) % 4 was returning (x % 2)
which is 0 or 1, but x << 1 is 2x, which is always an even number so
the result mod 4 can only be 0 or 2.
Unit test suggested by Caio Oliveira during review.
Fixes: 2255375c4d ("nir: add nir_mod_analysis & its tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
I forgot to copy this over in the LSC case. This meant we were missing
reorderability which meant that we were missing out on CSE.
fossil-db results on Battlemage:
Instrs: 231471427 -> 231363032 (-0.05%)
Send messages: 12077759 -> 12019628 (-0.48%)
Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
Spill count: 520387 -> 520135 (-0.05%)
Fill count: 470812 -> 470722 (-0.02%)
Max live registers: 72111834 -> 71873886 (-0.33%)
Totals from 2898 (0.37% of 788851) affected shaders:
Instrs: 1223836 -> 1115441 (-8.86%)
Send messages: 148633 -> 90502 (-39.11%)
Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
Spill count: 252 -> 0 (-inf%)
Fill count: 90 -> 0 (-inf%)
Max live registers: 491684 -> 253736 (-48.39%)
Non SSA regs after NIR: 255397 -> 255402 (+0.00%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This now lowers IO intrinsics to URB intrinsics in a single step,
rather than modifying IO intrinsics to have non-standard meanings
temporarily. We are able to drop one "no_validate" flag.
For example, remap_patch_urb_offsets had added (vertex * stride) to
(offset) for per-vertex IO intrinsics, but left them as per-vertex
intrinsics. Now we just have an urb_offset() function to calculate
that when doing the lowering.
This also provides a central location for calculating URB offsets,
which we should be able to extend for other uses (per-view lowering,
mesh per-primitive lowering) in future patches.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
Fixes the following building error happening with clang:
FAILED: src/amd/vulkan/libvulkan_radeon.so.p/nir_radv_nir_rt_traversal_shader.c.o
...
../src/amd/vulkan/nir/radv_nir_rt_traversal_shader.c:1159:49: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_nir_rt_traversal_params params = {};
^
1 error generated.
Fixes: f692ac76 ("radv/rt: Use traversal vars for object origin/direction in ahit/isec")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38954>
GFX10 hangs when drawing from a 0-sized index buffer.
GFX6 has a HW bug when the index buffer address is 0.
Looking at VK CTS runs, GFX6 still triggers VM faults despite the
current mitigation, and it also tries to access memory when the
index buffer is zero sized. So it looks like GFX6 and GFX10
really have the same bug.
Let's share the mitigation between the two.
Use a zero-filled BO instead of the upload buffer.
This fixes VM faults on GFX6, and should speed up GFX10 a bit.
Note that the zero-filled BO is also going to be used for
other bug mitigations on GFX6-7.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
SGPR offset is not included in the bounds check
according to the ISA documentation of GFX6-7 and
indeed it can trigger VM faults on OOB access.
Note that ACO already doesn't use the SGPR offset
on GFX6-7 for buffer loads and stores. This commit
just does the same for buffer atomics.
This commit mitigates a ton of VM faults that are exposed by:
24e75fea4b
Fossil DB stats on Hawaii (GFX7):
Totals from 148 (0.24% of 61818) affected shaders:
Instrs: 324004 -> 327352 (+1.03%)
CodeSize: 1556468 -> 1514100 (-2.72%); split: -2.74%, +0.02%
Latency: 1271480 -> 1276894 (+0.43%)
InvThroughput: 396850 -> 397740 (+0.22%)
VClause: 6861 -> 6858 (-0.04%)
Copies: 34083 -> 37430 (+9.82%)
PreVGPRs: 5705 -> 5706 (+0.02%)
VALU: 147529 -> 150898 (+2.28%)
SALU: 98194 -> 98172 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
While the concept of "VRAM" is somewhat nebulous on SVGA devices this is
the value above which some performance degradation is likely to occur.
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38818>
We should not manipulate the session buffers at command recording time.
It shouldn't cause any problems as these initialized probability tables
are not modified by firmware, but moving these to bind time should be
safer and also faster if an application frequently RESETs.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38926>
This block of code will be re-used in a future patch, also it reduces a bit the
size and complexity of lower_sampler_logical_send().
No changes in behavior intended here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>
This is now done in NIR, with the exception of exclusive min/max/and/or scans.
But those are not really useful, and if we ever come across them we can
optimize them in NIR using write_invocation_amd.
No Foz-DB changes on Navi21.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38902>
v3d_nir_lower_load_store_bitsize that uses nir_lower_mem_access_bit_sizes
already ensures that any writemask on store has consecutive bits set.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38921>
In case vk_color_attachment_location_state is in its default state, we
would end up with an identity mapping for color_map resulting in 8 RTs
being selected instead of what is really required.
This now use the rendering state attachment count to properly emit
SET_CT_SELECT.
Found while debugging MRT on
"dEQP-VK.shader_object.rendering.color_attachment_count_1.extra_attachment_after_1.none.none.same_color_formats.after.none.r16g16_sint_d32_sfloat_s8_uint"
and while comparing with the proprietary driver.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 84de6c12b2 ("nvk: Emit SET_CT_SELECT based on the dynamic color location map")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38946>
This is required for upcoming resource barrier work to implement HDC
flush's.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Gfx20+ doesn't do PIPELINE_SELECT, the assumption is that we can now
do any PIPE_CONTROL we want regardless of the pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
We experimentally found that some fixed functions have apparently be
hooked up to the L3. So we can drop a some flushing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we track the stages, it's not required to add those bits
anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we have the stages accumulated, we can delay this at flushing
time.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
For a bunch of workarounds and special cases we want PIPE_CONTROL not
RESOURCE_BARRIER. We want emit_apply_pipe_flushes() to be mostly for
application barriers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
This simplifies usage of estimate_variable_latency a little in that we
can just use it directly in our max() expressions instead of guarding it
with an if.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38939>
This matches my current understanding of nir_opt_copy_prop, including that
nir_opt_copy_prop always replaces movs with vecN.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
Due to rebasing not recognizing it as a conflict, it ended up having
the same value as nir_io_assign_color_input_bases_after_all_other_inputs.
Fixes: 9a2f1be814 - nir: add FRAG_RESULT_DUAL_SRC_BLEND and an option to use it
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38916>
Keep only the metadata when initially parsing the files. Then re-load
the relevant archives again when necessary.
The old code was just keeping everything in memory, which was slow when
looking at a directory containing archives resulted from processing
a large fossil file.
Extra care is taken with `search` commands to ensure we don't keep
unnecessary contents around. At some point we could reorganize so
find_all is not used here, but for now this should be fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38228>
Instructions that calculate derivatives (whether implicitly or
explicitly) don't actually need helpers enabled as long as helpers were
enabled while their coordinates were calculated. We currently don't
track this and leave helpers enabled until the derivative instructions
themselves.
Improve this by adding a backwards data-flow analysis which tracks the
last instruction that wrote the coordinates so that helpers can be
disabled after that.
Totals from 38306 (23.26% of 164705) affected shaders:
Instrs: 19635952 -> 19647753 (+0.06%); split: -0.03%, +0.09%
CodeSize: 40465212 -> 40489860 (+0.06%); split: -0.03%, +0.09%
NOPs: 3493898 -> 3505699 (+0.34%); split: -0.16%, +0.49%
(ss)-stall: 1755983 -> 1755365 (-0.04%); split: -0.04%, +0.01%
(sy)-stall: 5345890 -> 5350570 (+0.09%); split: -0.03%, +0.12%
Last helper: 8754510 -> 6313744 (-27.88%); split: -27.89%, +0.01%
Cat0: 3821218 -> 3833019 (+0.31%); split: -0.14%, +0.45%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36410>
Some expectation updates in the piglit uprev come from results we already see
in the nightly runs. Updating xfiles with those results before the uprev
commit, shows better the origin of the changes.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38333>
With the default non-colored job status, all the listed non-colored job
statuses can be absorbed by the default behavior.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38919>
When GitLab adds new job statuses, we need to upgrade the coloring dictionary.
By having a default value for non-colored output, we handle future unknown
status, and avoid crashes.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38919>
Silences several meson warnings like:
```
../subprojects/equivalent-1.0.1/meson.build:9: WARNING: Project does not target a minimum version but uses feature introduced in '1.3.0': rust_abi arg in static_library.
```
The target of 1.7.0 was chosen since that's the minimal required meson version of the rust components in mesa anyway.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38814>
Rust 1.91 as well as clippy show various benign warnings in our dependencies. Silence them since we
can't really do much about them anyway and we want to enforce clippy complience via CI in the
future.
Matches cargo behavior, which also doesn't show warnings or clippy lints outside the workspace.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38814>
Although a crate may happen to be compatible with multiple editions, building with the wrong
edition can - generally speaking - lead to subtle bugs.
There are no known failures in this case, but better to match the official Cargo.toml anyway.
Fixes: e28ff81869 ("meson: Add pest rust dependencies")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38726>
Although a crate may happen to be compatible with multiple editions, building with the wrong
edition can - generally speaking - lead to subtle bugs.
There are no known failures in this case, but better to match the official Cargo.toml anyway.
Fixes: dde95fc039 ("meson,ci: Add the paste crate")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38726>
Although a crate may happen to be compatible with multiple editions, building with the wrong
edition can - generally speaking - lead to subtle bugs.
There are no known failures in this case, but better to match the official Cargo.toml anyway.
Fixes: 9e3e12e6a9 ("meson: Add indexmap rust dependencies")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38726>
This implies relying on all users of these pools to do the flushing
explicitly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
This centralizes things so that we only ever write to the descriptor
buffer in write_desc_data(). get_desc_slot_ptr() now returns a const
void * so we don't write to it.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
We can used CPU cached mappings for our private BOs being updated by
the CPU. We make the printf BO an exception to avoid having to
invalidate it every time we check the queue status.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Those will be used as we progressively transition some of our
internal buffers to writeback CPU mappings.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
When allocating individual objects from a shared pool, we don't want
objects to share cachelines, otherwise cache maintenance operations on
individual objects might corrupt other objects.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
If the GPU is IO coherent, we expose one memory type that's both
host-coherent and host-cached. Otherwise we expose one type that's
host-uncached and host-coherent, and one that's host-cached and
host-noncoherent.
By default, we advertise <cached,non-coherent> before
<non-cached,coherent> because that's the combination providing the
best perfs in situations where the user knows how to deal with the
non-coherent nature of the GPU.
Unfortunately, the CTS has a few bugs (missing or incorrect flush/inval
calls) forcing us to re-order things. We might drop the flag at some
point (some fixes have been submitted, others are on their way).
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Stop hard-coding 1 and just advertise everything on the physical device.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
This is a little annoying. We probably don't want to call into the
kernel once for every Z slice or array layer we touch. But at the same
time if we can flush from userspace we don't want to flush/invalidate
more than necessary. So we have two sets of flushes, a more precise one
which we do based for userspace flushing and a coarse-grained one for
kernel flushing.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Flush deferred CPU sync ops so we can make CPU changes visible to the GPU.
This is currently a NOP because we haven't enabled cached mappings in
panvk yet, but we need to prepare for that before we progressively
switch each relevant buffer to use writeback CPU mappings.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
This makes it easier to say we want WB maps various places.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Now that we have it hooked up at the props level, we can filter
this flag out in panvk_device_adjust_bo_flags() and use this helper
when creating our uncached mempool.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
The buffer descriptor is copied to the descriptor set, and there's no
side-band data to allocate in GPU memory.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
pan_kmod_flush_bo_map_syncs() queues CPU-sync operations, and
pan_kmod_flush_bo_map_syncs_locked() ensures all queued
operations are flushed/executed. Those will be used when we start
adding support for CPU-cached mappings.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Fail early in pan_kmod_bo_mmap() if PAN_KMOD_BO_FLAG_NO_MMAP is set.
This saves us a user -> kernel round-trip, but most importantly, it
allows us to enforce NO_MMAP at the userspace level on BOs that the
kernel would otherwise accept to mmap() (mapping of imported BOs
requires extra DMA_BUF_IOCTL_SYNC calls we don't have).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Will be used to skip cache maintenance operations when the GPU is IO
coherent.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Will be needed to let the frontend know if it can use cached CPU-mappings,
and it allows us to extend the set of supported flags without introducing
a new field if we ever have to.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
We have a few hand-rolled instances of this which work well enough but it
gets more complicated as soon as we care about checking a major version
more than 1. Add a helper to make this more robust.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
The frontend is going to query the device props anyway, so let's just
query it at device creation time and store it in pan_kmod_dev::props.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
It's not good for performance, but it's possible to use for debugging.
Running single-wave GS workgroups could work around any LDS race conditions.
Setting the workgroup size to 64 reliably works around
GLCTS *primitive_counter*line failures, indicating streamout data
corruption with multi-wave GS workgroups.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38328>
This avoids u_upload_data_ref() when cb0 is bound. The u_upload_*_ref()
paths are still problematic to mix with uploaders that the front-end
uses with explicitly managed releasebufs, but this at least side-steps
the issue, and is a legit fix on it's own.
Cc: mesa-stable
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38896>
pipe_upload_constant_buffer0() was immediately releasing the
u_upload_alloc() releasebuf. But it is used in various call-
paths where the release needs to be deferred further.
Fixes crashes in firefox for any driver that uses the same
u_upload_mgr instance for pipe->const_uploader and
pipe->stream_uploader.
Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14309
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38896>
Triggering the rollover where the old upload buffer is released is a
good way to catch bugs with a releasebuf being dropped too soon (ie.
while the frontend still needs a reference).
This makes it easy to reproduce firefox crashes in any driver where
pipe->const_uploader == pipe->stream_uploader.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38896>
URB messages on Xe2 are LSC messages with FLAT addressing. We can
specify a S19 immediate offset in the extended message descriptor,
which should be more than adequate to hold any offsets we need.
We wrote the original URB code before implementing that, and never
doubled back to take advantage of it. But doing so can drop ADDs
near every URB access.
fossil-db results on Battlemage:
Totals:
Instrs: 232239759 -> 231432254 (-0.35%)
Cycle count: 34044435848.0 -> 34055507100.0 (+0.03%); split: -0.00%, +0.04%
Spill count: 520370 -> 520362 (-0.00%); split: -0.00%, +0.00%
Fill count: 470790 -> 470803 (+0.00%); split: -0.00%, +0.00%
Max live registers: 72111853 -> 72111369 (-0.00%); split: -0.00%, +0.00%
Totals from 227920 (28.89% of 788851) affected shaders:
Instrs: 59841897 -> 59034392 (-1.35%)
Cycle count: 683385208.0 -> 694456460.0 (+1.62%); split: -0.14%, +1.76%
Spill count: 17278 -> 17270 (-0.05%); split: -0.10%, +0.06%
Fill count: 17481 -> 17494 (+0.07%); split: -0.03%, +0.10%
Max live registers: 23052652 -> 23052168 (-0.00%); split: -0.00%, +0.00%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
I recently converted urb->offset to be in bytes on Xe2, but neglected to
update these comments that still said OWord.
Fixes: 9ffae42975 ("brw: Store brw_urb_inst::offset in bytes on Xe2")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
Without this commit, panfrost_batch_update_access receives a parameter
called "batch", and then it uses the same name while iterating for all
batches on the current context. This can be confusing and error-prone,
so let's rename the latter.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38908>
It seems placing the shader at the end has a negative performance
impact.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8ba197c9ef ("anv: Switch shaders to dedicated VMA allocator")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38900>
Before DG2, the value the HW gives us seems to be backwards, but
since DG2 this is supposed to be supported just fine.
However, due to Wa_22012766191, enable it only for Xe2 and up.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
But before ACM, we need to mis-report it to keep the CTS sane, as the
implementation of coarse pixel seems to have all sorts of wrongs in
older HW.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
GLSL defines gl_SampleMaskIn as :
"a fragment language that indicates the set of samples covered
by the primitive generating the fragment during multisample
rasterization"
when variable rate shading is enabled, a single invocation might cover
multiple samples. The lowering done in nir_lower_single_sampled() does
not account for that case, so add an option to selectively disable it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
This shows off how we don't need to pass an explicit size per CRB instance
in our non-growable CSes.
However, I don't like the additional indentation I did to make a CRB go
out of scope when I needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38762>
Loosely based on freedreno's, but simplified since a lot of overflow
handling was already there in tu_cs. It successfully catches issues of:
- Overflowing the CRB reservation
- Starting a new CRB with one in progress.
- Emitting a pkt4 while a CRB emit is in progress.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38762>
Current stack size is stored in layout.sw_stack_size, but the function
thats supposed to update it is comparing layout.total_size instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38898>
Replace the duplicated swapchain image detection pattern across all
Vulkan drivers with the new wsi_common_is_swapchain_image() helper.
Since the swapchain handle can be extracted from VkImageCreateInfo's
pNext chain inside wsi_common_create_swapchain_image(), remove the
now-redundant VkSwapchainKHR parameter from that function.
This removes the #ifdef guards for Android/WSI platforms from each
driver, as the helper now handles this uniformly.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38541>
Add a helper function to check if a VkImageCreateInfo represents a
swapchain image by looking for VkImageSwapchainCreateInfoKHR in the
pNext chain.
This consolidates the swapchain detection logic that is currently
duplicated across all Vulkan drivers, and handles the Android case
in one place.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38541>
This is causing an OOM, and weston gets killed, which causes all
the remaining jobs to fail after that point.
Until this is sorted out, disable THP for this specific job.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38912>
The MR adding support for ge7800 (!38211) was submitted while the MR
that disabled has_gs_rta_support (!38024) was under review. It seems
nobody noticed that we missed disabling it here as well.
Let's fix that up, so we don't try to use this when it's not expected to
work.
Fixes: c60232c0c5 ("pvr: temporarily disable gs_rta_support on all cores")
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38909>
We no longer need this. Now panvk_lower_nir() is simply about
descriptors, I/O, and calling into the back-end lowering code.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
It only applies to vertex shaders right now. Also, once we add
geometry, it won't really work anymore so we'll need to do something
else.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
There's no real point in the array of NIRs and the array of infos. It
just makes everything more of a headache inside the loop.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
Now that input attachment lowering is factored out, there's no reason to
be passing the whole shader variant around here. This both makes things
a lot more clear and gives us more flexibility about when we call it,
allowing us to potentially call it once per-shader instead of once
per-variant.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
This is what everybody outside the compiler actually wants. This also
gives us the opportunity to centralize our choice to use LD_VAR_BUF[_IMM]
to one place in the compiler and stop sprinkling it around all over.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
It really is per-stage and there's nothing we're really saving ourselves
by trying to do it in a common place.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
This is more clear in the presence of the lowering pass and will prevent
a name collision in the next commmit.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
Input attachment lowering really has nothing to do with descriptor
lowering. It just needs to be called first so we still have access to
the variables.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
We're about to move this to another file which won't have anything to do
with descriptor sets. This makes the one potentially functional change
easily bisectable and then the next commit is just moving code.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
This is needed for intershader optimization and GS lowering.
We now pass all NIR variants with pan_compile_inputs to
panvk_compile_shader and handle sysvals/push consts lowering in there.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
We were running this in the preprocess step and then trusting that it
would clean up everything before we got to the back-end. However, we
were running the entire optimization loop in between as well as drivers
potentially adding stuff (since panvk has it's own passes after
postprocess). Instead, this should be one of the last things run, right
before we go into the back-end.
Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
The comments pretty explicitly say that they assume lowered UBOs and
SSBOs so preprocess is the wrong place for them.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
Other resource variables like UBOs and SSBOs are still useful for
debugging and not harming anything. Also, some Vulkan-focused passes
look up variables from the otherwise detached resource intrinsics and
use them for things. We really want to keep them around.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38821>
cat2_may_use_scalar_alu() was incorrect because the instruction could
use an indirectly-accessed const where a0.x (i.e. the offset) is
non-uniform. Fortunately, we already know whether this is the case,
because the original instruction would then write a non-shared GPR.
Also, the restrictions for scalar ALU are the same regardless of whether
we write up0.x or a shared GPR, and vice versa the restrictions for
normal cat2 are the same regardless of whether we write p0.x or a
non-shared GPR, so it should always be safe to write p0.x if non-shared
and up0.x if shared. So, just do that.
Fixes: 2a8c5ebc77 ("ir3: enable scalar predicates")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38895>
This is about writes to shared regs, not GPR (as early-preamble can only
use shared regs). It's a pretty hypothetical case, but might as well
get it correct.
Fixes: 189e494249 ("ir3: Add (sy) before end of preamble when necessary")
Reported-by: Job Noorman <jnoorman@igalia.com>
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38868>
This calculation needs to happen in the same loop as the
geometry/triangle id calculations in case the selected invocation is
before all invocations that were already selected.
Totals from 1269 (15.10% of 8406) affected BVHs:
compacted_size: 137581888 -> 137606464 (+0.02%); split: -0.08%, +0.10%
sah: 6496048424 -> 6496048450 (+0.00%); split: -0.00%, +0.00%
primitive_node_count: 604384 -> 604656 (+0.05%); split: -0.14%, +0.19%
Fixes: c18a7d0 ("radv: Emit compressed primitive nodes on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38462>
We already have a function called pvr_render_targets_init() in
pvr_device.c. Having two symbols with the same name, even though they
both are static only leads to confusion.
So let's rename one of them to clear things up.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38888>
The only HW-definition we depend on here is the occlusion query minimum
alignment. But this value doesn't differ per arch, so let's just make it
a global define for now.
This should probably have a better name and location. But AFAICT, this
applies to a lot in this header. So let's untangle this down the road
instead.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
This just pre-emptively fixes up some includes that will bite us in the
future if not fixed.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
These are only implemented for Rogue so far, which is fine because
that's all we currently support. Once we add support for more
generations, these helpers need updating.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
This takes us from one "supported"-bit to per-bind flags for each of
vertex-buffer, sampler-view, render-target, depth-stencil and
storage-image bindings.
While we chould have stored extra bits for storage storage/uniform
texel-buffers as well, that seems a bit excessive to me.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
The PBE details here aren't relevant for all GPUs, so let's split them
out to a separate table here.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
The VkFormat is the key into the format-table, so we always know from a
higher level up what VkFormat we're talking about. So let's instead pass
VkFormat around, and look up pvr_format from the format-table later on
instead.
This leads to a few more lookups, but those are relatively cheap, and
doesn't happen in critical code-paths.
Acked-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38832>
On the assumption that nobody will use a sample id greater than the sample
count, have loop unrolling guess based on the driver's max sample count.
This unrolls a simple resolve shader with a uniform max samples on ir3 to:
value = vec4(0);
if (max_samples > 0) {
value += txf_ms(coord, 0);
if (max_samples > 1 {
value += txf_ms(coord, 1);
if (max_samples > 2){
value += txf_ms(coord, 2);
if (max_samples > 3) {
value += txf_ms(coord, 3);
for (i = 4; i < max_samples; i++)
value += txf_ms(coord, i);
}
}
}
}
...
This is only worth a 1% win on our microbenchmark as-is, but if we could
flatten those ifs out and pull the fadds out to the end, avoiding syncs
per load would be a big win. This seems like a first step.
I've taken a shot at updating drivers to set the value, and tried to leave
notes in places that drivers might update, and want to follow up with
updating the compiler option.
This affects over half the DX11 apps in shader-db-private.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38585>
In preparation for dynamic rendering and for removing renderpasses
move relevant code so it doesn't need to move again.
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38744>
In preparation for dynamic rendering make the code in start of renders
clear more robust and generic so not to depend on render passes when
it doesn't have / need to.
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38744>
The assert here was too strict and dynamic rendering tests do exercise this
part. The correct fix is to either re-enable RTA support or handle properly
multi-layered emulation.
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38744>
This change also preserves the functionality of the old renderpass path.
Signed-off-by: Ella Stanforth <ella@igalia.com>
Co-authored-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38744>
Metal does not seem to respect memory coherency for threads. Workaround 6
enforces device coherency for global loads/stores even if it should not
be needed.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38847>
We want to start testing new features that made it into drm-misc-next
recently, so switch to pan{frost,thor} specific kernel tags to get that
tested on g{52,610,925}.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38890>
Add the new file so panfrost CI jobs are properly triggered.
Fixes: c793f612fc ("ci/panfrost: Split inherit definitions into -inc")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38890>
In 0ca870c6f3 I forgot to fill the bind_map::dynamic_descriptors
array... Duh!
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0ca870c6f3 ("anv: fix broken ray tracing dynamic descriptors")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38893>
If invalid childrens don't consume space in memory, we don't have to
increment the block count. HW unit just look at the bounding boxes and
reject them in intersection test.
Also, this patch handles invalid children type encoding.
Fixes: 198537039a ("anv/rt: reduce writes to block_incr_and_start_prim")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38717>
We forgot to call tu_fill_render_pass_state when resuming because it was
mixed in with emitting commands for the start of the subpass. Fix that
by pulling it out. This adds some duplication, but I think it's better
than mixing command emission and CPU-side state setup in the same
function.
Fixes: cb0f414b2a ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38873>
The previous implementation seems to predate nir_instr_clone() and
duplicates a lot of the deref cloning code. This also makes the pass
preserve deref->arr.in_bounds correctly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38856>
Switched to the new VMA allocator that provides explicit GPU VA
control via util_vma_heap.
This is architectural preparation for ray tracing capture/replay,
which requires the ability to reserve and allocate shaders at specific
VAs. The state pool's free-list design makes VA reservation difficult
to add, while the new chunk allocator is designed for explicit VA
management from the ground up.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38869>
Introduce a VMA-first chunk allocator for shader binaries to eventually
replace the anv_state_pool-based implementation. This allocator works
directly with GPU virtual addresses through util_vma_heap, making the
virtual address space an explicit resource managed by ANV.
No functional change in this commit.
v2(Michael Cheng): Use existing instruction state pool anv_va_range
v3(Lionel): Simplify allocator
Signed-off-by: default avatarMichael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38869>
The shader assembly was only available when not hitting the cache.
Additionally the serialized shader code was also the relocated variant
which meant that it could differ from one run to the next. Instead
serialize the unrelocated code produced by the compiler.
With this change we now decode the copy of the ISA we have on the
host.
NIR dumps are only available for shaders not loaded from the cache
(much like the other drivers).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8f4c2bd566 ("anv: add runtime shader statistic support")
Acked-by: Michael Cheng <michael.cheng@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38869>
This is potentially nicer for some drivers. AMD drivers will use it.
mesa_frag_result_get_color_index will be used often.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38604>
We used to maximize threads_per_task, but that is ideal when the system
has a single gpu client. When there are multiple gpu clients, we want
smaller threads_per_task such that cores can be more fairly shared among
the clients.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37988>
task_axis selects the dim of the global workgroup, not the dim of the
local workgroup.
v2: fix assert for dEQP-VK.compute.pipeline.basic.empty_workgroup*
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org> (v1)
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37988>
Set compute_ep_limit to max_tasks_per_core on v12+. It is generally a
good idea to queue as many tasks as possible to better utilize the
cores.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37988>
Since v12, RUN_COMPUTE.ep_limit specifies the size of the compute task
queue. RUN_COMPUTE stalls when there are more tasks in the queue than
the specified ep_limit.
Sensible values are 0 (treated as 4), 4, or 16 (max_tasks_per_core).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37988>
Coverity notices that we read past the end of the array we're pointing
to, which is intentional, we want to copy additional members from the
source struct into the target pointer. As such, cast to a `void *`,
since this will make Coverity happy.
CID: 1649589
Fixes: 314de7af06 ("anv: Initial support for VP9 decoding")
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38438>
Instead of manually applying the patch, backport the version that landed
in main, which requires a cmake argument to enable.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38071>
Unlike what the comment said here, V9 does in fact support a single
float-format, so let's allow that.
But also, V10 and later supports FP16 formats, but this incorrect check
made that not work. Enable the FP16 formats also while we're at it. We
don't need any additional checks here, because the 16-bit unorm formats
were also added in V10, so util_format_any_to_unorm() does the right
thing here.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38848>
There's no reason why the S8_UINT check should be written in a different
way than the other checks here; let's make this consistent.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38848>
nir_lower_clip_cull_distance_array_vars was sneakily updating
shader_info::clip/cull_distance_array_size.
This moves the gathering into a new function
nir_gather_clip_cull_distance_sizes_from_vars.
v2: remove assertions that prevented nir_lower_clip_cull_distance_array_vars
from being used with non-compact arrays
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38465>
Metal will prematurely discard fragments with side effects even if those
side effects happen before the discard. Work around this by making said
discards "optional".
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38741>
Consolidate importing paths by using the new importing
function so that compressed buffer can be imported
correctly.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36825>
The helper gets tiling and modifier in a single step.
The later will be used in the coming changes.
Copy the changes introduced in
cf5c294df4.
Suggested-by: Juston Li <justonli@google.com>
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36825>
As discussed in the reviews of cf5c294df4,
the 'plane' in this context means plane of a drm
modifier, so it makes sense to just use the new ISL
macro once it is available.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36825>
The new added function will be invoked on several paths
of importing Android native and hardware buffers.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36825>
This test was hitting an assert because gl_PrimitiveID is an input to the
FS that does not come from the VS which should be handled similar to
TGSI_SEMANTIC_FACE.
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38834>
I was wondering where this was disappearing to.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38789>
I'm assuming this based off the `if` branch above, after reading the
code for bit that Coverity pointed out in that branch. It doesn't look
correct to start at the base pointer, which will be 0 initialized and
has 52 bits of zero padding, while the default values are 255.
Fixes: 314de7af06 ("anv: Initial support for VP9 decoding")
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38437>
Coverity notices we're reading off the end of the array here, which is
true. We also intend to do that because we want to read the next field as
well. Cast to a `void *` to help Coverity out.
CID: 1649593
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38437>
Use the syntactic sugar for heap operations on the vertex/tiler queue.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
The spec requires FINISH_FRAGMENT calls to be issue in the exact same
order FINISH_TILING calls were, otherwise heap chunks that are still
used by in-flight IDVS/FRAGMENT jobs might be returned to the heap
leading to a UAF situation.
In order to guarantee FINISH_FRAGMENT ordering, we request the new
iterator scoreboard (the one to be used on the next RUN_FRAGMENT)
just before issuing our FINISH_FRAGMENT, and we select this next
iterator scoreboard as our signal scoreboard. This guarantees that the
next FINISH_FRAGMENT won't trigger until both the next RUN_FRAGMENT and
the current FINISH_FRAGMENT are done.
This new approach forces us to revisit the sequencing of our
issue_fragment() logic. Previously we were requesting a new scoreboard
before RUN_FRAGMENT, but now we're assuming the scoreboard has already
been claimed, either by the subqueue init logic (simple static assignment,
since all scoreboards are free at this time) or by the previous
issue_fragment() call.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
Right now all we do is get the next available scoreboard, and set
it as the current iterator scoreboard, but the fragment queue is
about require something more involved to fix FINISH_FRAGMENT ordering
issue.
Provide a cs_iter_sb_update() block where everything between the
selection of a new scoreboard and the transition to this new
scorevoard is customizable. Implement cs_next_iter_sb() as a dummy
wrapper around this new construct.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
We are about to add sequences where we will do
wait(cur_iter_sb),signal(next_iter_sb)
in some deferred operations. When that happens, the wait mask can't have
the signal sb set, otherwise the behavior is undefined.
On v11+, we have enough scoreboards to disallow the currently bound
endpoint SB from being returned on a NEXT_SB_ENTRY, so let's do that all
the time.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
Knowing which defer mode is used is quite useful.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
The name is misleading since it's only setting the endpoint scoreboard
on v11+. On v10, we shouldn't assume the "other" SB is always zero,
since we're passed the SB slot to use at init time (ls_sb_slot).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
Source assignment is mixed up in some of them. While at it, make
source argument names consistent with the descriptor field names.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
Fix cs_extract_tuple() and implement cs_extract{32,64}() as wrappers
around cs_extract_tuple().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38826>
These were accidental when I split up the large article in to multiple
documents. Let's fix that up, so we don't end up repeating this for
future documents.
Fixes: 8248cc0bf4 ("docs/panfrost: move details to separate articles")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38738>
Unity's Vulkan backend used by Spilled! requires the vk_dont_care_as_load
workaround to achieve correct rendering. Observed on Turnip, in GMEM mode.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38863>
Since Metal doesn't pass clip distance into the fragment shader, we have to
do it ourselves. The CLIP_DIST0/1 varying slots are used to represent the
user-defined varyings we use to pass them from vertex to fragment and
a new intrinsic is added to represent the write to the built-in
clip_distance variable. Since the CLIP_DIST0/1 varying slots are not affected
by opt_varyings, there can be potential interface mismatches so the machinery
in msl_iomap.c is refactored to allow them to be output as a series of scalars
rather than vectors.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38839>
Add support for A840 and X2-85. Despite slice count, differences in
memory bus and clks, they are architecturally similar from the PoV of
the UMD.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
Enable gen8 support. Sysmem, gmem, and binning work. DEQP gles2/3/31
tests are passing.
LRZ is not supported yet, and will follow later.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
It was missed that this register changed for larger bin sizes. Use a
common bitset for all related gen8 regs, and change the field names for
earlier gens to match so the generated register packers dtrt.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
b13 encodes alternate opcode meanings for new instructions with
otherwise the same encoding (ie. src precision implied by opc).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
We need to ensure GPR writes completes before the end of the preamble
to avoid writes landing after another preamble has already started.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
There are some new events, but existing ones look the same. So I think,
at least for now, we can keep the same table for gen7 and gen8.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
The FC RAM is the same size as gen7. Tbd if the metadata is also the
same. So far we aren't enabling LRZ on gen8 yet, but need to do
something to make the compiler happy, so treat gen8 as it was the same
as gen7.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
The c++ syntax for this isn't pretty. But it is something we need in a
few places, so add some macros to hopefully make things easier on the
eyes.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
We shouldn't be trying to calculate an offset from the bindless base reg
we looked up, as it already contains the array index.
We could lookup the index 0 reg offset, and calculate an offset from
there, but that breaks when the registers are not consecutive.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
Possibly overkill currently, if we only preempt on bin boundaries. But
might as well be complete in case that ever changes on the kernel side.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
We don't need to re-emit it each tile. But we do need to setup the
preemption to restore us to GMEM mode in case we get preempted on a tile
boundary.
While we are at it, rename the function to something more sensible.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
Now that we've corrected the event name (ie. CCU_RESOLVE to trigger the
resolve/unresolve engine) the helper name made less sense. And it
doesn't really add any value. So drop it.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
For a6xx+ devcoredump+crashdec does a good job in finding the CP
position on a crash. I don't really use the scratch regs for this
purpose anymore. So lets just drop this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
It is just normal reg writes, we shouldn't handle it specially or
surpress summary state if enabled. In summary mode we shouldn't print
each individual register write, but just show the values at draw/etc
time.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
Cleanup variants ranges, and in a couple cases rename events to match
docs. In some cases events are marked valid thru A5XX, simply to
indicate that they were removed at some unknown point before A6XX.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
With gen8 we need to decode more sections before we have enough CP reg
vals to decode cmdstream. So simplify things by just moving it to the
end.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
We used load_frag_coord_unscaled_ir3 for loading the fragment coord for
input attachments in GMEM, where the normal scaling for gl_FragCoord
shouldn't be used. However with custom resolve a different scaling will
apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename
the NIR options, in preparation for this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
FragCoord seems to have the offset applied to it, so we don't need to
subtract it out. Fixes upcoming test
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.monolithic.fdm_nonsubsampled_multiview_with_offset.
Fixes: b34b089ca1 ("tu: Use GRAS bin offset registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
Similar to when patching load/store coordinates, we have to convert the
layer to the view, splatting view 0 to all layers when there is more
than 1 layer and FDM-per-layer is not enabled.
Fixes upcoming new test
dEQP-VK.renderpasses.*.custom_resolve.*.fdm_nonsubsampled_multilayer_with_offset.
Fixes: b34b089ca1 ("tu: Use GRAS bin offset registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
This suppresses below compile warnings:
- warning: variable 'idx' is used uninitialized whenever 'if' condition
is false [-Wsometimes-uninitialized]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38835>
This is a post-RA pass that tracks registers that are preserved by the
ABI, but clobbered by shader code. The pass inserts scratch spills and
reloads in appropriate locations to ensure the register values at the
end of the shader are the same as they were at the start.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38281>
We already did the work of transforming the ray data, no need to do it
multiple times.
Should theoretically be a lot better. However, none of the fossils
appear to use object-space ray data in anyhit/intersection shaders. :(
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38809>
This splits up radv_nir_rt_shader.c into several parts.
The first part is all ray traversal lowering for RT pipelines, located
at radv_nir_rt_traversal_shader.c. It implements building the traversal
loop, including inlined any-hit/intersection shaders (optionally as a
completely separate shader).
The second part is lowering for individual RT stages (right now,
monolithic vs. CPS-style separate compilation). Each lowering technique
lives in its own file (radv_nir_rt_stage_{monolithic,cps}.c).
Code shared between RT lowering techniques (shader inlining helpers and
storage lowering passes) gets moved into radv_nir_rt_stage_common.c.
One header, radv_nir_rt_stage.h, is the public interface for RT pipeline
stage lowering. Functions exposed to users (really just
radv_pipeline_rt.c) go there. The header for internal shared helpers is
radv_nir_rt_stage_common.c.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38809>
shader_realtime_clock requires a newer kernel version in order to enable
GLB_COUNTER_EN this change adds a check on this kernel functionality.
Remove GL_EXT_shader_realtime_clock from extensions as this now depends
on kernel version.
Fixes: e9c2c324 ("panvk: enable VK_KHR_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37915>
Several web sites block clients with "Apple" in the WebGL renderer
string if the reported OS is not one of Apple's.
This check seems to implemented via a 3rd party product which is slowly
rolled out over more web sites. Instead of playing whack-a-mole with
web sites in multiple browsers override the OpenGL renderer in mesa for
known browsers.
Backport-to: 25.3
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38845>
This change improves evergreen_apply_scissor_bug_workaround().
It provides a fully functional workaround for cayman.
Note: this was the last functionality which was working
properly on evergreen but not on cayman.
Here are the tests fixed:
spec/arb_framebuffer_no_attachments/arb_framebuffer_no_attachments-atomic/glscissor: fail pass
spec/arb_framebuffer_no_attachments/arb_framebuffer_no_attachments-query/glscissor: fail pass
deqp-gles31/functional/fbo/no_attachments/interaction/1x1ms0_default_2048x2048ms4: fail pass
deqp-gles31/functional/fbo/no_attachments/npot_size/1x1: fail pass
Fixes: 87a5b07f90 ("gallium/radeon: add R600/Evergreen/Cayman support to common viewport code")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38743>
Don't sink alu that uses ballot(true), as that can a local system value
and moving the alu then requires a new mov in the old location.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38829>
bit_size <= 32 does not actually guarantee a single component, which
nir_src_as_uint() requires. We could just check num_components == 1 but
it's easy enough to support any vector that fits in 32 bits.
Cc: mesa-stable
Reviewed-by: Romaric Jodin <rjodin@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38772>
Before every emit_vertex(stream_id = n), we would insert stores for all
outputs, including outputs that are not meant for that stream.
Those stores would end up having no effect while potentially reducing
performance.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38100>
For 64K x 64K textures.
The regular expressions used to find potentially overflowing multiplications:
(width.*\*.*height|height.*\*.*width|stride.*\*.*height|height.*\*.*stride|row.*\*.*height|height.*\*.*row)
(height.*\*.*depth|depth.*\*.*height|size.*\*.*depth|depth.*\*.*size)
A few things were authored by Pierre-Eric using static analysis to
detect potential overflows.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38587>
While v7+ checks this on HW, older architectures depend on SW to do it.
Fixes: c43882ad54 ("panfrost: Allow pixels using discard to be killed")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38522>
Setting this flag is valid as long as the fragment shader doesn't have
any side effects on v7+. v6 requires an extra check for earlyzs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38522>
Running nop sched before opt_jump runs into issues because at that
point, we might have branches like this:
getone x
jump y
which becomes this after nop sched:
(rpt5)nop
getone x
(rpt5)nop
jump y
and then opt_jump may remove the jump leaving the block without a
terminator. This in turn causes ir3_calc_reconvergence to calculate (jp)
incorrectly.
Sync sched is fine but let's keep the two together.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38833>
Fix for deqp:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38827>
Fix for deqp:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38827>
The min/max funcs are designed to operate solely on the source and
destination colors directly, without any scaling or multiplication by a
factor.
Test: dEQP-GLES3.functional.fragment_ops.blend.* pass with enabled FPK
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38824>
This fixes a biases in texture linear sampling,
which can be very noticeable when strong lighting
is applied on mipmapped textures generated at
runtime using successive linear blitting.
More bits aren't actually needed for lerp and the
intrinsic rounding is wrong, so it is removed in
favour of a correct uniform codegen.
Reviewed-by: Jose Fonseca <jose.fonseca@broadcom.com>
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37986>
The accumulation of biased results leads to images
that are slighly too bright or dark. This is particularly
noticeable in specular highlights, which magnify errors
a lot.
Reviewed-by: Jose Fonseca <jose.fonseca@broadcom.com>
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37986>
When alpha to coverage is enabled we cannot allow the fragment shader to
be skipped, because the calculated alpha result is essentially a side
effect (it can cause some samples to be discarded).
This brings the OpenGL driver in line with panvk, which already has this
check in its version of fs_required().
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38794>
64-bit reads are atomic (except 32-bit arch), but + and - require locking
or atomics, but 32-bit arch doesn't support 64-bit atomics.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38798>
This change add support VkDrmFormatModifierPropertiesList2EXT to
lavapipe driver. Previously VkDrmFormatModifierPropertiesListExt is
already supported and returns a list with one element but
VkDrmFormatModifierPropertiesList2EXT returns a empty list.
A lack of VkDrmFormatModifierPropertiesList2EXT becomes an issue
especially when the Vulkan Validation layers is enabled, it internally
uses VkDrmFormatModifierPropertiesList2EXT to confirm the
drmFormatModifierTilingFeatures is compatible to VkImageUsageFlagBits.
Without this patch it detects that the image is not compatible and
outputs warnings.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14414
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38822>
A vertex shader performs OOB UBO reads causing vertex corruption.
Disable UBO to const lowering to force bounds checking which fixes the
corruption.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38781>
Sometimes you fumble an answer, and would like to not restart from the
beginning (or just want to see the behavior of the script late in the run
if you're debugging it). Pass in the last bad range, and you can keep
going.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38760>
This allow us to reduce a bit the size of iris_upload_dirty_render_state().
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38790>
Moving all 3 for loops into a single one and populating binding tables before
emit it, the last part has no side effect but when reading the code it makes
more sense to populate binding table then emit it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38790>
This allow us to reduce a bit the size of iris_upload_dirty_render_state() and
in next patches some improvements to this new function will be done.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38790>
When lowering IO, we would always lower UBOs and SSBOs with robustness
enabled. This change only applies robustness if requested by the user.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38784>
We've looked at it again, and concluded that there's just no way that LDG
crossing a boundary could be OK in the components-are-read case but bad in
the components-are-not-read case, and this must have been papering over
something else.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38704>
struct drm_amdgpu_info_uq_fw_areas is renamed to drm_amdgpu_info_uq_metadata.
query infor structure for compute and sdma is added.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38647>
This seems to work just fine, except on NAVI21, NAVI22 and VANGOGH. It
might be the same issue as has_vrs_ds_export_bug but it's not documented
anywhere. That being said, the CTS tests that fail don't even export
depth or stencil from fragment shaders.
Do not enable this feature on these GPUs to be conservative.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38767>
Everything is in place to support this feature already.
Passes:
dEQP-VK.pipeline.pipeline_library.multisample.variable_rate.*
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38693>
When running w/o KHR_display enabled, the display device node won't be
initlaized, and the information dump routine will terminate because of
this.
Fix this by not bailing out and not trying to print display device
compatible strings.
Fixes: 8825c91dcb ("pvr: Make display node optional")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38787>
Remove the early call to nir_lower_sysvals_to_varyings. This guarantees
that we only call it after lower_system_values has replaced the
variables with intrinsics. It turns out that a sequence like this
doesn't work:
1. shader declares Layer builtin as a signed integer
2. nir_lower_sysvals_to_varyings turns it into a varying (but keeps the
signed integer type)
3. nir_lower_input_attachments (or some other pass) creates a
nir_load_layer_id intrinsic.
4. nir_lower_sysvals_to_varyings is called again, and when creating the
varying variable it passes an unsigned type to
nir_get_variable_with_location(), which asserts because there is
already a signed integer variable.
By making lower_sysvals_to_varyings happen late for layer, we can avoid
this happening by lowering away the variable before (2).
Fixes: 5bbbf5cf9b ("tu: Set use_layer_id_sysval for nir_lower_input_attachments")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38793>
glslang apparently emits ViewID as a signed integer. If other passes
generate a sysval then lower_sysvals_to_varyings() tries to create an
variable with unsigned type and asserts when the preexisting varying's
type doesn't match the type it expects. Just make ViewID a sysval and
lower it in lower_sysvals_to_varyings() to avoid this and simplify the
SPIR-V parser.
Fixes: 5bbbf5cf9b ("tu: Set use_layer_id_sysval for nir_lower_input_attachments")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38793>
It existed entirey to save us a switch statement. It's very unlikely
that's worth pulling GENX API command stream stuff into the compiler.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38788>
We multiply by 16 correctly but then drop that in the case where vbase
is non-zero. We typically lower FS input indirects so we don't see this
often but there are a few cases where they still sneak through.
Fixes: 0fcddd4d2c ("pan/bi: Rework varying linking on Valhall")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38788>
It's duplicated in panvk_vX_shader.c, which is where it's actually used.
This version is a dead artifact of the past.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38788>
In order to decode blend shaders, we need to pass in the fragment binary
address because the blend pointer uses the same top 32 bits as the
fragment binary.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38788>
This utilizes the RGBX format faking logic from e8cd7a30 to enable
PIPE_FORMAT_R10G10B10X2_UNORM renderer support using swizzling.
This format is needed for better HDR rendering support in the iris driver, to
support the Proton / Wine DXGI implementation, which requires an RGBA ordered
renderer for its Vulkan implementation. This in turn requires the Wayland
display to support both alpha and opaque formats. The check currently fails,
since only PIPE_FORMAT_R10G10B10A2_UNORM is exposed when Gallium (iris) is
the DRI Wayland renderer.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38616>
Video post-processing now supports explicit color standards.
Applications can pass different combinations of primaries,
transfer functions, and color matrices.
This is used to ensure correct mapping.
Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38764>
It's the number of elements. RADV exposes VK_FORMAT_R64_{UINT,SINT}
formats for texel buffers, so the maximum is 1<<29 to fit in the
32-bit bounds checking.
Fixes KHR-GL46.texture_buffer_size_clamping.* with Zink and new VKCTS
dEQP-VK.texture.misc.max_elements.*.
Cc: mesa-stable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38140>
It's already called by st_nir_opts, so it shouldn't be necessary to do
it again here.
This is only compile tested. I have not collected any shader-db or
fossil-db data.
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
This is only compile tested. I have not collected any shader-db or
fossil-db data.
v2: Drop the calls to nir_opt_constant_folding. The builder in
nir_lower_flrp will already take care of this.
v3: NIR_PASS_V is gone. Noticed by Marge.
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
No shader-db changes on any Intel platform. Both Iris and Crocus use
st_nir_opts, which calls nir_lower_flrp before brw_nir_optimize. The
call still needs to exist for hasvk, but I don't collect fossil-db data
for hasvk.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
No shader-db or fossil-db changes on any Intel platform.
v2: Return early if lowering_mask is zero. If the first call to
nir_lower_flrp has a lowering_mask of zero, later calls with non-zero
masks would not do any lowering. lp_bld_nir.c has this issue.
Suggested-by: Alyssa
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
Prevents failures with fp16 in lavapipe and Zink on lavapipe when
"nir/lower_flrp: Check and set shader_info::flrp_lowered" is
applied. Lowering with an incomplete mask on the first call to
nir_lower_flrp will prevent later calls (with the complete mask) from
doing anything.
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
There is nothing IR about it. It's really the compiler interface file,
so it should all go in pan_compiler.h.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
While we're at it, rename pan_lower_* to pan_nir_lower_* for consistency
with everything else. The intention is that eventually this will be a
private header and drivers will stop calling these passes themselves.
However, that is a long way off so for now we'll just move them to the
sensibly named place. Notably, pan_preprocess/optimize/postprocess_nir
stay in pan_compiler.h because those are intended to be external APIs
indefinitely.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
This is going to be where we put the compiler interface. For now,
disassembly wrappers are as good a place to start as any.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
The only thing that only pulls in bifrost is our CL compiler for
pre-builts. Everything that uses pan_shader.h pulls it in because
that's header-only and calls into both back-ends anyway. Some day
we might want to figure out how to properly dead-code the midgard
compiler for Vulkan but today is not that day.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
src/panfrost/util is all compiler stuff, we just put it in util/ so it
could be used by both midgard and bifrost. We also rename all the files
to have a pan_nir prefix. We'll rename the functions later.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
It was this way for a while but then someone decided that we should try
to hide midgard and get it off the main path. However, we still try to
present a unified interface to the GL driver. The combination of these
two things means that those common compiler interfaces have to live
outside of panfrost/compiler which makes everything more awkward than
it needs to be. We either need to move it back into src/gallium and
make abstracting across the two the GL driver's problem (which breaks
other tooling) or need to put both under src/panfrost/compiler. The
midgard compiler can just as easily bitrot in panfrost/compiler/midgard.
The first step is moving the bifrost compiler.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38753>
When building with linux_glibc_x86_64 in AOSP, this is
observed.
src/util/u_debug_stack.h:66:4: error: unknown type name 'uint64_t'
66 | uint64_t start_ip;
| ^
Adding <stdint.h> fixes it.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38644>
Rather than having two independent copies of the same header in two
different places, place a single copy in nouveau/headers. The two copies
did differ, but the one in nouveau/winsys matches the one in the kernel
and the only changes in the one in nvc0 are some removed defines and
changed whitespace.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38761>
WSI extensions are implemented on the Venus driver side, layering on top
of external memory extensions. So we have to filter out WSI extensions
during device creation, otherwise the headless driver on the venus
renderer side can fail the device creation with _EXTENSION_NOT_PRESENT.
Fixes: 11195eb8de ("vulkan: Add KHR_swapchain_maintenance1 promotions.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38763>
This enhances the DescriptorType enum to include the
memory handle type (DMABUF vs SHM) alongside the size,
allowing consumers to differentiate between DMA-BUF and
shared memory file descriptors without redundant code.
Signed-off-by: Dorinda Bassey <dbassey@redhat.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38677>
The "REF" comparision is a timestamp comparision with rollover.
Our current use of CP_COND_EXEC is still fine, but we should be more
clear about the comparision operation.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38755>
Shader objects are by definition I think independents.
But implementation like Anv would like to optimize dynamic descriptors
if possible. It's possible if the sets are not independent.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38678>
Dynamic descriptors are mapped an array of offsets provided through
vkCmdBindDescriptorSets*() commands.
When pipelines are compiled with independent sets layouts, the
implementation might have to do additional runtime calculation to
figure out what offset in the contiguous array maps to what dynamic
descriptor in the pipeline layout.
For graphics pipelines you can always compute that information when
binding the shaders. There is always a limited amount of shaders (5
max).
For ray tracing pipelines, there could be lots of shaders to process
at every pipeline binding call. Besides there is no interface from the
runtime to the driver to list all the shaders used at the moment.
So do that tracking in the runtime and pass the information down to
the driver through the cmd_set_rt_state() vfunc.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 69a04151db ("vulkan/runtime: add ray tracing pipeline support")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38678>
`comps <= (1 << subword_shift)` cannot be guarantee.
Here is an example:
```
8x2 %27 = @load_ssbo (%26 (0x1000001), %4) (access=readonly|reorderable, align_mul=2, align_offset=0, offset_shift=0)
8x2 %32 = ior %25, %31
32 %34 = ult32 %33 (0x7), %12
8x2 %35 = b32csel %34.xx, %27, %32
```
When processing `%34.xx` in `bi_emit_alu` (for `instr->src[0]`),
`comps` is computed from the instr definition (`%35`), but
`subword_shift` from the src bitsize.
In that case comps is greater than `1 << subword_shift`, but this is
supported by `bi_alu_src_index`.
This example is extracted from `dEQP-VK.spirv_assembly.type.vec2.i8.bit_field_insert_offset16_count16_comp`
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36638>
When v4i8 instruction are using to compute a v2i8, it puts the 2
result values in b0 & b2, thus we need to swizzle the destination to
have them in b0 & b1 as expected by the consumer of the v2i8 produced.
example: dEQP-VK.spirv_assembly.type.vec2.i8.mul_frag
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36638>
Vectorizing it prevents optimisation related to the store
instruction. This is having negative impact from a shader-db
point-of-view.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36638>
Otherwise, the following error is observed:
src/util/cache_ops_x86_clflushopt.c:40:22:
error: arithmetic on a pointer to void is a GNU extension [-Werror,-Wgnu-pointer-arith]
40 | void *end = start + size;
| ~~~~~ ^
src/util/cache_ops_x86_clflushopt.c:44:9:
error: arithmetic on a pointer to void is a GNU extension [-Werror,-Wgnu-pointer-arith]
44 | p += cpu_caps->cacheline;
| ~ ^
This works with GNU extension enabled, but does lead to warnings
with Clang.
v2: Add to trial_c + trial_cpp checks (Erik)
v3: use c_msvc_compat_args to avoid fixing other instances of this issue (Erik)
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38752>
25.3.1 was planned for the day before the US Thanksgiving holiday, but
it slipped due to that holiday. The current plan puts 25.3.3 on
Christmas Eve, which will be missed for the same reason. To attempt to
fix this, I've updated the plan to move the missed release to this week,
with all releases re-aligned with that date. This moves the Christmas
Eve release to New Years Eve. We could possibly slip that by a week into
the new years as there is likely to be less work than normal done at
that time.
Due to the change in schedule I've removed one planned release, as we
should reach 25.3.Last and 26.0.1 at roughly the same time with one less
release.
Also update the last 25.2 release to match.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38757>
Fixes things like:
GfxStreamVulkanMapper.cpp:45:10:
error: no previous prototype for function 'chooseGfxQueueFamily'[-Werror,-Wmissing-prototypes]
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38751>
DXVK's DXGI implementation can create extra instances used for
enumerating physical devices besides the games' instance. When reserving
VMIDs for SPM, the DXGI instances may snatch the VMID reservation early,
making VMID reservation for the instance that actually needs it fail.
This starts being a problem on kernels 6.18+ where only one user may
reserve a VMID at a time.
Move reserving VMIDs to SQTT initialization inside vkCreateDevice so
that only the instances that actually create logical devices try
reserving VMIDs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38746>
According to AV1 spec, force_integer_mv=1 on intra frames. However, VCN
FW does not expect integer mv to be set unless screen content tools are
enabled. This also aligns the code to the radeonsi logic.
Cc: mesa-stable
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38716>
When the number of planes per descriptor is greater than one, the layout
used on Bifrost doesn't work. Fix that by making sure all plane
descriptors are contiguous in the texture/sampler table.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38145>
When preparing the attribute buffer descriptor, need to select the
correct plane index for multi-planar images (like ycbcr ones)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38145>
When a vertex shader does not load any input attributes after any store
of output attributes, we can enable bit 25 of SPH ("ISBE space sharing")
Effectively this seems to allow input and output attributes to live in
the same allocated space in ISBE and could improve occupancy.
Found while researching geometry passthrough and mesh shaders.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38729>
Required so that uint array formats don't incorrectly wrap to last array
slice instead of the first one. For some reason this only happens with
uint texture formats while int and float formats work as expected.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38713>
The order of pReferenceSlots is not well-defined by spec. Instead we
need to look at the RefPicList0/1 which provides slot indices.
Cc: mesa-stable
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38686>
Since we only support 1 L0/L1 ref, the default num refs in the PPS
should always be 0. With that there never any need to set the override
flag in the slice header (until more references are supported).
Also the ref pic list modifications should be clamped to the size of the
ref pic list.
This fixes an issue seen with dEQP-VK.video.encode.h264.i_p_b_13_*.
Cc: mesa-stable
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38686>
Weird that CTS did not catch that ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 11195eb8de ("vulkan: Add KHR_swapchain_maintenance1 promotions.")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38728>
The subslice IDs provided by the SR0.0 EU register are not adjusted to account
for fusing, so the upper bound max_scratch_ids can vary from device to device
depending on what specific slices were fused during manufacturing.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38689>
also preserve all metadata because it doesn't add/remove any instructions
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38599>
The last holdouts of the var options are gone so we can just emit the
system values. This is overall simpler as it confines all the sysval to
var logic to nir_lower_sysvals_to_varyings().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
This lets us request system values from lower_input_attachments and just
lower them ourselves instead of asking it to create variables.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
Since this is a downgrade path for drivers, it's useful to support both
forms of these common sysvals.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
We have nir_lower_sysvals_to_varyings() so we can just have that lower
it for the drivers who don't want a sysval. Most have to support the
sysval version anyway for various lowering so making them all have to
support both is pretty annoying.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
We almost always call both passes next to each other.
The code is copied from nir_io_add_const_offset_to_base. No changes.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38277>
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because tests don't link with ACO, so linking would fail, but we want
to move the nir_opt_vectorize callback here that uses it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603>
While it's true that we currently need to eventually fall back to
checking without the render-target binding, this should really be the
last resort. Because otherwise we might end up picking a format that
isn't possible to render to for a color-renderable internalformat.
In the long run, this code should be rewritten to check *properly* if
the internalformat is color-renderable or not *up front*, and not even
try to fall back. But we're currently missing proper helpers for this,
and reworking what we have is a fair bit of work.
So for now, let's just do what we currently do, but shuffle around the
order of testing things so we don't end up dropping unless we absolutely
have to.
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38673>
We still have to fall back to our pointers-as-indices for things like the
preamble's block or old streamout (I'm presuming here that pointers will
always be much greater than the NIR block count, which seems safe enough
for debug). This gives us nice printouts in debugoptimized builds, and
helps you correlate your ir3 back to NIR (which was helpful in the
hundreds of blocks in the shader I fixed in the previous commit).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38666>
To link together instructions in a rpt group we currently (ab)use
list_head. This is a bit of a hack because we don't actually have a
list_head that points to the first instruction without being embedded in
an instruction itself (the way list_head is supposed to be used).
Instead, the list_head embedded in the first instruction of a rpt group
also serves as the one pointing to the list. In order make a distinction
between the first and last instruction (for which the main list_head
would usually be used), we rely on the fact that (currently)
instructions in a rpt group are emitted in order which means that later
instructions have a larger serialno than earlier ones.
In order to make all this less hacky, and to lift the restriction of
needing instructions to be emitted in order, replace the list_head with
explicit rpt_next/rpt_prev pointers which link the instructions together
in a doubly but non-circular linked list.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38576>
There's no need to generate the separate memory accesses, when
nir_lower_mem_access_bit_sizes() has already done so. Also, the way the
64b address is handled changed in 2490ecf5fc ("ir3: ingest global
addresses as 64b values from NIR")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38688>
At the last round of the "remaining > 0" loop, we'd deref of the end of
binding layout in setting up pointers for the next loop. We don't need
these values that were getting updated at this point.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38684>
This is done in preparation of compression enablement to avoid allocating a VA
and then delete it later because compressed images use the memory object VA and
not the image plane VA. This guarantees we never put the image in an invalid
state and saves us an alloc and free in case of compressed images.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
This lays the groundwork for enabling compression by adding a way to pass in
whether the image will be compressed or not from NVK to NIL.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Previously, there was some mixing up of alignments between the alignment
provided by the caller, and the minimum alignment we have (4KiB). Additionally,
there was some redundant aligning being done to data already passed in aligned.
This didn't matter because we were always using 4K pages anyways due to kernel
limitations. However, this now needs fixing to allow for larger page support.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Previously, for imports we wouldn't carry over the PTE kind with the import,
which worked fine up till now. However, compression depends on the PTE kind
being correct otherwise there will be a mismatch between both sides.
The GEM info object we get from the kernel already has the PTE kind embedded in
the tile flags object, so all we have to do is retrieve it and store in the bo
object, and then the lower layers can retrieve the kind from the bo directly.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
This is so we can enable features needing kernel support based on whether the
detected kernel driver supports them or not by checking for the version in
nvkmd.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
In bccb9fe091 ("nvk/nvkmd: nouveau uses the OS page size"),
the alignment size was narrowed to the OS page size in
nvkmd_nouveau_alloc_tiled_mem. This makes the same change
for nvk_AllocateMemory.
This is being done in preparation for large page support, which will
be more picky about alignments.
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Currently, GPU state is reset immediately after each flush and during
context creation, even when the next command might be a simple BLT/RS
operation that doesn't require the full GPU rendering pipeline.
This patch introduces lazy GPU state reset by:
- Adding a needs_gpu_state_reset flag to track when reset is needed
- Setting the flag to true after flush instead of immediately resetting
- Only performing the actual reset in etna_draw_vbo() when rendering
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36565>
Move RS_SINGLE_BUFFER from global context initialization to individual
RS operations, enabling it before each operation and disabling it
immediately after. The same pattern is seen in traces from the binary
blob driver.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36565>
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38574>
It's will be used to replace SetEnvironmentVariableA,putenv on windows
and putenv,setenv on non-windows
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38640>
When laying out a sparse partially-resident image we need to align
rows of ordered blocks to a mapping granularity in bytes (i.e. the
page size) and array layers to a multiple of sparse block size.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
To implement sparse partially-resident images, we need to be able
to express mapping in terms of rectangles of texel blocks.
With row_align_B we can constrain the rows of ordered blocks to
start at mapping boundary (i.e. page size) and using array_align_B
we can ensure that each subresource starts at a multiple of
whatever sparse block size we decide to use.
Not setting each of these fields is the same as setting them to 1.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
While we're at it also add the SPDX header to panvk_sparse.c
because I forgor to do that when it was first being added.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
It's also used for testing helper invocations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3328dfa2f ("brw: only initialize sample mask flag if needed")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38699>
The staging buffer is persistent until the destruction of the pvr_pipeline
object, so we should set the allocation scope to PVR_ALLOC_SCOPE_OBJECT instead
of PVR_ALLOC_SCOPE_COMMAND.
Also did the same change in the function pvr_pds_coeff_program_create_and_upload
for the staging buffer, because that buffer is also destroyed at pipeline destruction.
Fixes dEQP-VK.api.object_management.single_alloc_callbacks.graphics_pipeline.
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38662>
It is sometimes useful to see the raw hex values of what instructions
are assembled to, similar to the output of shaders in cffdump. Add an
option for this to computerator.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
We want to add some disassembly options in the future. Add new
ir3_shader_disasm_options function that takes options from a new
ir3_disasm_options struct in which we can add options later. The
original ir3_shader_disasm becomes a wrapper for the new function to not
have to update all call sites now.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
Don't excludes stages coming from pipeline libraries. This caused valid
group indices referring to library stages to be dropped, leading to
mismatched stage_count.
Fixes: e05a9b77b6 ("vulkan/runtime: split rt shaders hashing from compile")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38669>
All things related to selecting the position when no sample is covered
isn't actually dependent on fragment shader loop iteration, in fact
it's not even dependent on the shader invocation, only the sample mask
(which is from jit context, not from shader key, otherwise could just
precalculate all of it). And certainly there's no need for all the extra
per-sample selects.
Just calculate it once at interpolation context init. LLVM should be able
to easily toss out (as with the previous version) all extra code done at
interpolation init if centroid interpolation isn't actually used.
(Although the code didn't turn out as simple as I hoped...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
D3D11 is pretty strict about how to do centroid interpolation.
In particular, llvmpipe didn't honor these rules when no sample was
covered for a pixel (relevant for helper pixels), in this case llvmpipe
selected the position of the sample with the highest index (just due to
initialization, not really by choice).
Given that helper pixels are only really used for derivative calculations,
and derivatives are generally sketchy with centroid interpolation, this
seems quite a lot of work, but I suppose it could be useful if the state
sample mask has only 1 sample set (since these d3d11 rules then guarantee
that even with centroid the derivatives are actually useful as the
interpolation will be done at the position defined by the sample specified
in the sample mask, regardless if that sample is covered by the primitive
or not).
Other APIs might technically not need this (they tend to not even define
at which position centroid interpolation is done, other than it must be
inside the primitive), but it shouldn't really hurt them neither.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
This fixes a real issue when ESO uses fbfetch output because this
was determined after instead of before.
This solution isn't the most elegant one but binding graphics shaders
earlier would require more work. Let's just handle this specific corner
case for now.
This fixes
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.shader_objects.fragment_region*
on some GPUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.
This optimization replaces the later fill with a move from the result of
the earlier fill.
v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.
v4: In intervening spill to the same location also invalidates the
value. 🤦
v5: Don't eliminate a fill if its destination partially overlaps the
preceeding fill destination. Fixes failures in cooperative matrix CTS.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0
total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14
total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)
Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same file while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.
The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x helpful.
v2: Require force_writemask_all to be the same for the spill and the fill.
v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v4: Use brw_transform_inst. Suggested by Caio. The allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.
v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.
v6: Move a calculation outside a loop. Suggested by Lionel.
v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.
v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.
shader-db:
Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0
total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0
total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0
total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2
total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)
Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)
Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
These opcodes are emitted during register allocation instead of the
scratch reads and writes that were previously emitted. These
instructions contain additional information (i.e., the instruction
encodes the scratch offset) that enable optimizations to be added
later.
The fill and spill opcodes are lowered to scratch reads and writes
shortly after register allocation. Eventually this lower may have some
optimizations (e.g., reuse previous address calculations for successive
spills).
v2: Add brw_scratch_inst::offset instead of storing it as a
source. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
This ensures that g0 is reserved for spilling since there is going to be
spilling.
Fixes: 8bca7e520c ("intel/brw: Only force g0's liveness to be the whole program if spilling")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
Don't set it before the render pass, that's unnecessary. In the future
we may want to move this to the FS state object, as the blob does, but
for now don't set it unnecessarily.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
I can't find any evidence that the blob ever did this on a740 or a750.
Doing this breaks subsequent sysmem render passes and would force an
otherwise-unnecessary WFI with custom_resolve.
Fixes: 062e90f19b ("freedreno: Move RB_CCU_DBG_ECO_CNTL to raw_magic_regs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
It's pretty spammy and since the whole purpose of queries is to report
support, why bother with errors?
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38661>
Support new vrend command that queries layout of a backing GBM buffer
for a giver vrend resource. Use it for querying stride/modifier of a
PIPE_SHARED resource, passing this info down to WSI for exported resources.
Now venus is able to import vrend resources, making gamescope work in KMS
mode on QEMU. Virgl doesn't use stride/modifier info of winsys when it
imports classic vrend resources, hence this change only affects venus
context when it imports virgl WSI buffers.
Based on initial version of resource-layout command from Daniel Stone.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Yiwei Zhang <zzyiwei@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
The .resource_create_with_modifiers() callback became required after
7d1a32fafd for venus to work in KMS mode. This fixes GBM buffer
allocation failure for vkmark-kms and fixes implicit modifier not
working on host when using Intel i915 driver for running Steam with
gamescope-kms on guest. Note that KMS support for venus on QEMU never
worked before, hence this is not regression fix.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
Sadly I don't see an obvious way to use it for int8 matrices, therefore
the code is a bit of a mess right now.
It allows us to vectorize load/stores more often as we can simply
transpose row/col major matrices when needed.
And the movm optimization is also only enabled for 16 bit types, even
though we _could_ do it for 32 bit. It's not clear yet if using it for 32
bit types is an overall advantage or not.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
Initial idea and code from Dave, but this is a complete rewrite of the
patch.
The Matrix layouts contain groups of values, for int8 we have vec4 groups,
for fp16, fp32 and int32 we have vec2s. With this we load and store them
as vectors getting rid of a bunch of address calculation.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
For umul24 we expose the operation as UMUL24_RTOP0 so we can identify
the difference between umul24 as part of a sequence generated from an
imul as "multop+umul24" and a simple umul24 where rtop will always be 0.
For umul24_rtop0 instructions we relax the scheduling restrictions,
so they don't need to be serialized like the multop+umul24 ops. But
we maintain the read dependency with the last_rtop.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38642>
Without this, we will report some image formats as unsupported
and the dedicated sparse binding queue won't work
when sparse support is enabled using RADV_PERFTEST=sparse
Fixes: dd90c76cea12 ("radv: Advertise sparse features pre Polaris with perftest flag")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38676>
These are failing in nightly builds. Seems we forgot to update these
xfails when fixing the problem.
Fixes: a6bf07e7c2 ("dri: avoid sending too many present reuqests when app start or pause"
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38675>
RDR2 VRAM memory management when resizable BAR is enabled seems
incorrect because it keeps allocating VRAM without freeing anything.
This introduces a drirc option to emulate a fake carveout of 256MiB to
workaround this game bug. This also adjust memory budgets by
distributing it between visible and invisible because AMDGPU reports
the same value for both when REBAR is enabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12091
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38627>
When running the Vulkan CTS test
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic ,
the driver sometimes crashes because of cleaning up sequences try to do
pvr_suballoc_bo_free() on bo's that is never initialized (thus old stale
value remains as pointer).
Fix the issues that lead to wild pointers access (a wrong cleanup
sequence and trying to free bo's that fails to be allocated).
The CTS test still fails here with "Allocations still remain, failed on
index 4274", but at least it does not crash now.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38506>
This is forgotten when advertising the corresponding extension, which
leads to inconsistency, thus fail of
dEQP-VK.api.info.vulkan1p2.feature_extensions_consistency CTS testcase.
Enable the corresponding feature too. I ran all CTS tests with
"mirror_clamp_to_edge" in name, which are all skipped with NotSupported
before (because of the feature being not advertised), and gain
3695/11140 Pass with the remaining ones still NotSupported (no Fail).
This also makes the feature extension consistency CTS testcase Pass too.
Fixes: 4d34c07b7a ("pvr: advertise VK_KHR_sampler_mirror_clamp_to_edge")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38653>
The usual pass modernization with the twist that I don't want new drivers
actually using it (-:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38533>
All drivers use the same callback and it is unlikely that new drivers will use
this pass since it has better replacements today (lower_mem_bit_sizes for
memory, and it never worked for I/O). This should discourage as much.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38533>
nir_lower_wrmasks as-is is broken for semantic I/O, since semantic I/O is slot
based and nir_lower_wrmasks is purely byte-based. No drivers use it as such, and
no drivers should. Remove the support so people don't think it works. This came
up in !38482.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38533>
M1 chips are more restrictive than M2 and above. We need to enforce memory
coherency when needed through "coherent" for buffer memory and
"memory_coherence_device" for textures. Without these the memory operations
are not visible to other threads.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38595>
All non-atomic allocations are on pretty slow paths where we only have a
single invocation running. This means they're technically thread-safe
(assuming only a single queue) but it also means the perf of a single
allocation doesn't matter much. However, as a bunch of things are
becoming helpers that may or may not be run in parallel for things like
multi-draw, it becomes harder to know when non-atomic is safe. We're
probably better off using atomic allocations everywhere.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The original asahi code assumed a subgroup size of 32 and a workgroup
size of 32 * 32 = 1024. This makes doing ctz(ballot(b)) across an
entire workgroup an almost trivial operation. On panfrost, we won't be
so blessed unless we choose a workgroup size of 16 * 16 = 256. It's
also not clear that we want to use workgroups at all and we may better
off sticking to just subgroup parallelism and cutting out memory
bandwidth by more than half. With the new code, the only requirement
should be that the subgroup size is a power of two (this is always true)
and that the workgroup size is an even multiple of the subgroup size.
Even though the new code looks way more complicated, thanks to the magic
of NIR constant folding, it should all fold down to the original code on
asahi and something even smaller if one opts to go for a single subgroup.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The interface here intentionally doesn't handle multi-draw. It's
intended that the caller will sort that out in whatever way they want to
handle multi-draw dispatches.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
On asahi, we can still specialize based on the shader key and get
everything folded. But this gives drivers the option to make it
dynamic if they wish.
Co-authored-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
There are a few sysvals which exist just so we can specialize them based
on shader keys or linking. In the case where we can't specialize them,
this provides a pass which loads them from the appropriate poly_*_param
struct.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The important bit here is that we move agx_update_vs() to before we
build up the poly_*_params so we have access to the final linked vertex
shader.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
It makes more sense here along with the output buffer. I think this
should be squashed with the previous commit (and not sure it works
without).
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
We have access to the poly_vertex_state from the GS so we might as well
use it. Asahi uses a single poly_vertex_state for VS and TCS and just
assumes the tessellator stalls before we update it for TCS. If a driver
wants to use two separate poly_vertex_state buffers, it will be the
driver's responsibility to make the system values return the right one.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
Instead of having the vertex output buffer be a system value and
something the driver needs to manage, put it in poly_vertex_param. We
already need to have it somewhere GPU-writable so we can write it from
indirect setup kernels. Instead of manually allocating 8B all over the
place just to hold this one pointer, stick it in poly_vertex_param.
This also lets us get rid of a NIR intrinsic.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
For vertex shaders, it comes from the preamble. For geometry and
tessellation shaders it comes straight from the root table.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
We're about to put more than just input assembly data in there so the
name will make a lot more sense. Also, add a comment to make it more
clear that this buffer applys to both VS and TES.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
Otherwise we get a lot of individual x/y/z stores to tesslevels when
we should really just be storing the whole thing at once.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
The newly rewritten remap_tess_levels_legacy will have already lowered
anything it cares about to URB intrinsics. So the generic remapping
pass won't see them, as it operates on generic input/output intrinsics.
This also drops some of the callback boilerplate we needed temporarily.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
This unifies the dynamic (SSO) and fixed (linked together) versions.
We emit piles of NIR as if we were doing the dynamic version, but
replace the tess config field access with constant values. It all
should optimize away back to something reasonable. We lower these
directly to URB read/write intrinsics.
It also rewrites the dynamic version to directly read/write the URB
rather than going through temporaries. The old version was broken
in that tessellation control shader invocations can technically use
the shared output area for cross-invocation data sharing with barriers,
although doing so using the built-in tesslevel patch outputs is very
unlikely.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We were using this for indirect loads of the shader input thread
payload, but there's no reason we can't use it for constant access
too. In this case we can just MOV from the ATTR file directly
without a special opcode that turns into MOV_INDIRECT later.
We also allow it to load multiple components now. This is useful
for say, returning vec4 pushed inputs. And, we allow it in more
stages than just the fragment stage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
These are a ton more convenient. When the TCS and TES were linked
together, the legacy layouts were a hassle, but didn't impose any
significant cost. With unlinked TCS and TES, the legacy layouts
involve significant runtime code for scrambling the data, whereas
the reversed layouts are substantially less overhead.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
st/nir lowers this for iris, and brw_link_shaders lowers this for anv,
but for unlinked tessellation control / evaluation shaders, the lowering
was not happening for TCS.
Just do it unconditionally when lowering TCS outputs and TES inputs.
This lets the remapping code just assume vectors all the time, rather
than getting single component stores with nir_intrinsic_component set
(which came from nir_lower_io lowering compact arrays).
This also requires changes to the dynamic unlinked TCS/TES lowering to
temporaries, which needs to use vectors rather than arrays with this
change. That code is going away in future patches anyway, but this
keeps it going for now to avoid interim breakage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We now call remap_tess_levels before remap_non_header_patch_urb_offsets.
The latter already excludes tess levels anyway, so the order shouldn't
matter.
This paves the way for remap_tess_levels to skip handling some header
values in certain cases, because with reversed layouts, many of them no
longer need any special handling and we can just let the generic pass
handle them.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Just have a single remap_tess_levels that does either the
statically-known-primitive or the dynamic (unlinked) mode.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Our current legacy patch header layout handling doesn't actually care
which is which slot, and remaps everything to its correct spot anyway.
For using the newer "reversed" patch header layouts, it will be more
convenient to have outer as slot 0, and inner as slot 1, as that just
works with no special remapping needed for both quads and triangles
(but unfortunately isolines are still a pain).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
These work the same regardless of stage.
v2 (Ken): Rebase, move from mesh to all stages, add reorderable load
variant, allow channel masks to be non-constant even on Xe2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Be consistent with lowering that happens after, so that it gets a full
vector register and can stride into it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We add several new intrinsics for accessing URB handles:
- load_urb_output_handle_intel
- load_urb_input_handle_intel
- load_urb_input_handle_intel_indexed
The latter is used by stages like TCS and GS where each input control
point has a unique handle. The index is which ICP to read from. The
others are for most stages, where all inputs or outputs are accessed
via a single handle.
Then we have URB load and store operations, split for Xe2+ (URB via LSC)
and earlier (HDC OWord messages):
- load_urb_vec4_intel
- load_urb_lsc_intel
- store_urb_vec4_intel
- store_urb_lsc_intel
The legacy vec4 variants take a handle and a 128-bit OWord offset as
sources. Additionally, stores take a set of channel enables to mask
off and avoid writing vec4 components. We don't use the WRITE_MASK
const-index as our channel enables are not required to be constant.
The Xe2+ LSC variants are simpler. Handles are byte offsets into the
URB memory region, and offsets are expressed in bytes. So we simply
add them into a single "address" source. We don't support writemasks
here, as they aren't really necessary with the better addressability.
(Plus, the store_cmask operations work significantly differently than
the previous HDC OWord messages). We will lower disjoint writemasks
to multiple stores.
Based on earlier code by Lionel Landwerlin.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Current extent can be 0xFFFFFFFF (-1) which indicates there is no current extent, and the swapchain will inherit that of which the application provides.
Check this before applying the hack.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31134>
only these case were not reporting anything
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38602>
Noticed renderdoc complaining about our size :
RDOC 692028: [18:08:18] vk_core.cpp(2272) - Warning -
VkPhysicalDeviceDescriptorBufferPropertiesEXT.imageCaptureReplayDescriptorDataSizeis too large at 32
(must be <= 16), can't support capture of VK_EXT_descriptor_buffer
Since we only need 2 pointers (main + private), we can shrink this to
16bytes. The 1/2 planes have a relative offset from the base.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38625>
If the threshold of the linear mipmap fallback for compressed
format is reached at a different mipmap level than the
size-compatible non-compressed formats the image can be viewed as,
then we have to disable the fallback. Otherwise, for some levels,
texels would be read from the wrong locations due to the tiling
mismatch.
NOTE: Prop driver falls back to LINEAR in this case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38655>
Starting from at least A6XX gen3 there is an option for compressed
formats to have linear fallback threshold in compressed blocks, instead
of threshold in raw texels. However, proprietary driver doesn't enable
it on A6XX, so to be safe we also enable it only on A7XX.
Additionaly, clarify what the linear fallback threshold is really about.
It's about tiling, not strictly about UBWC.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38655>
fdl6_layout_image() already has the fallback-to-linear check. And with
gen8 this is no longer a fixed constant. So lets just consolidate all
the logic in fdl layout helper.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38655>
Attachment preserve is not correctly handled due to multiple issues:
- No knowledge that we need to do that from render pass
- Setting VkEvent from command buffers break MTLRenderPass
- Pipeline barriers break MTLRenderPasses
- vkCmdBeginRendering and vkCmdEndRendering don't account for it
Need to handle all those above first before we can have efficient
MTLRenderPasses implemented.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38612>
vulkan_gfxstream.h contains custom protocols not found
in vk.xml (vk_gfxstream.xml).
gfxstream_vk_entrypoints.h is codegen by Mesa common code,
and it does not accept the custom XML.
So avoid generating implementations for them:
guest/vulkan_enc/gfxstream_guest_vk_autogen_impl/gen/func_table.cpp:5321:1:
note: declare 'static' if the function is not intended to be
used outside of this translation unit
5321 | void gfxstream_vk_CollectDescriptorPoolIdsGOOGLE(
| ^
| static
fatal error: too many errors emitted, stopping now [-ferror-limit=]
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38632>
Fixes errors like:
ResourceTracker.cpp:62:6: error: no previous prototype for function 'zx_handle_close'
[-Werror,-Wmissing-prototypes]
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38632>
Workaround:
src/gfxstream/guest/connection-manager/GfxStreamConnectionManager.cpp:82:51:
error: null passed to a callee that requires a non-null argument [-Werror,-Wnonnull]
82 | tss_set(gfxstream_connection_manager_tls_key, nullptr);
| ^~~~~~~
Ultimately, the Bionic headers look wrong. Passing NULL to tss_set
is completely legit.
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38632>
Allows us to better determine if we need Z/W payload delivery.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36392>
In updateable AS, we keep all nodes active even if they're
degenerate/NaN, because too many games ignore API rules about not
making inactive nodes active (and some vendor tips outright advise this
behavior). We also need to match this by keeping everything active in
the update side. The ALWAYS_ACTIVE macro has been long removed and
replaced by VK_BVH_BUILD_FLAG, too. Since updating only happens to
updateable AS, don't even check for the flag, just implement the
always-active handling.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38488>
If we're using the singleton, we need to add to it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38638>
Not having the uses_printf will drop the printf info in serialization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38638>
Allow for Mesa to be built with VAAPI support, without having to install
libva headers first.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38271>
The next commit will optimize b2f(not(a)) and b2i(not(a)),
so handle those in other patterns to prevent regressions.
No Foz-DB changes on its own.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38530>
Kopper is not supported on Android, and attempting to use it breaks zink
on the platform.
Disable kopper automatically when running on Android, fixing zink without
`LIBGL_KOPPER_DISABLE`.
Fixes: 3294cad341 ("egl: Rename dri2_detect_swrast() and also detect kopper")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14331
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38626>
RADV_PERFTEST=sparse is a new option to enable experimental
support for sparse features when they aren't enabled by default:
- gfx6 supports sparse, albeit with a reduced feature set
- gfx7 supports 3D images (with non-standard block shape)
and unaligned mip sizes
- gfx8 supports the same feature set as gfx7
(Polaris behaves more stable than other gfx8, so we had
already enabled it by default on Polaris for a long time.)
We pass all dEQP-VK.*sparse* tests on gfx6-8 when running on
a single thread however it may cause hangs or failures
when executing the tests on multiple parallel jobs.
We plan to enable this by default when we deem it stable enough.
Until then, users can already test some games that use it.
Note, at the moment there are some unsolved problems in the
amdgpu kernel driver regarding sparse bindings on these GPUs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38553>
The following sparse features are not supported by all GPUs, so
keep track of their support individually:
has_sparse_image_3d
has_sparse_image_standard_3d
has_sparse_unaligned_mip_size
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38553>
Don't use ADDR_TM_PRT_2D_TILED_THIN1 because it is not supported
on CI/VI according to CiLib::HwlOverrideTileMode, and it is also
missing from SiLib::HwlOverrideTileMode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38553>
Change has_cp_dma_with_null_prt_bug to cp_dma_supports_sparse
to know when CP DMA supports sparse. CP DMA doesn't support
sparse on any gfx6-9 chip.
Sources:
- d2669628 already documented this on gfx6 in 2018
- e259f405 added a radeonsi workaround for gfx9 in 2023
- 235f70e4 added a radv workaround for Polaris in 2025
Now RADV will use compute copy and fill for sparse resources
on all gfx6-9 chips (previously only did on Polaris and newer).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38553>
anv sets device->uses_ex_bso on verx10 >= 125 and then sets the
compiler->extended_bindless_surface_offset to that.
iris was not setting anything. However, LSC_ADDR_SURFTYPE_SS used for
scratch on Gfx12.5 is bindless, and Xe2 uses ExBSO for all UGM access,
so we need to be setting this.
Just set it in the compiler so both drivers have it set.
Fixes piglit arb_tessellation_shader-tes-gs-max-output -small -scan 1 50
on iris.
Fixes: 80c89909f3 ("brw: fixup immediate bindless surface handling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38645>
Unlike GFX10.3, on GFX11+ VRS override is part of PA_SC_VRS_OVERRIDE_CNTL
which also controls whether the VRS surface is enabled or not. This
new dirty state will allow us to re-emit that state without re-emitting
the complete framebuffer for VRS flat shading.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38527>
In the WAIT_ALL case in spin_wait_for_sync_file(), we were returning the
moment we saw the first success. However, this isn't a wait-all, it's a
bad wait-any. We should instead just continue on to check the next sync
until we've ensured that every sync in the array has a sync file. The
only reason this wasn't blowing up in our face is because it only
affects non-timeline drivers (pretty rare these days) and because most
of the places where we use WAIT_PENDING on non-timeline drivers is to
guard a sync file export and those typically have only a single sync in
the array.
Cc: mesa-stable
Reviewed-by: Gurchetan Singh <gurchetansingh@google.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38635>
The a618-gles-asan job is not a turnip job, so turnip rules don't apply
to it. Change it to use GL-related set of rules.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38643>
Mark spec@egl_chromium_sync_control jobs as passing after the commit
a6bf07e7c2 ("dri: avoid sending too many present reuqests when app
start or pause").
Fixes: a6bf07e7c2 ("dri: avoid sending too many present reuqests when app start or pause")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38565>
When discarding a fragment in Metal, it will not be demoted to helper. At
least for Apple Silicon M1 and M2. Call nir_lower_is_helper_invocation to
work around this.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38590>
If the library_path is just a basename like `libvulkan_lvp.so`, then we
can share the same JSON manifest like `lvp_icd.json` between all of the
architectures, like we already do for Vulkan layers. The library will
be looked up in the dynamic linker's default search path in this case,
and in practice will be found in `${libdir}`. This is how the Mesa's
EGL driver and Vulkan layers work, how Mesa is packaged in Debian 13,
and also how the Nvidia proprietary driver works; it makes installation
simpler for distros, especially on multiarch systems like Debian and
the freedesktop.org SDK.
However, if we want a separate manifest per architecture in order to
be able to write the full path into it, we still need per-architecture
filename disambiguation like `lvp_icd.x86_64.json`.
We presumably still want a separate per architecture on Windows, because
the concept of a single monolithic `${libdir}` is less common there, and
it can also be helpful during development when setting `$VK_DRIVER_FILES`
to force the use of a specific driver installed in a non-default location.
Use the following parameter to passed to vk_icd_gen:
'--icd-lib-path', vulkan_icd_lib_path,
'--icd-filename', icd_file_name,
output : 'virtio_icd.' + vulkan_manifest_suffix,
and the output is passed by '--out', '@OUTPUT@',
so we can detect vulkan_manifest_per_architecture from the --out parameter in script.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13745
Signed-off-by: Simon McVittie <smcv@collabora.com>
Co-authored-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37314>
If the library_path is just a basename like `libvulkan_lvp.so`, then
we can share the same JSON manifest between all of the architectures,
like we already do for Vulkan layers. This is also how the Nvidia
proprietary driver works, and how Mesa is packaged in Debian 13.
However, this will only work if we don't mark the manifest as being
architecture-specific.
This partially reverts commit f7aa6ba9 "vulkan: Specify library_arch in
ICD files".
Signed-off-by: Simon McVittie <smcv@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37314>
They shouldn't have any effects because on GFX12 DCC is transparent
to the userspace driver, and they should improve performance for the
games listed below:
- DOOM (2016)
- Wolfenstein II
- Red Dead Redemption 2
- WWE 2k23
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38481>
In 80c89909f3 ("brw: fixup immediate bindless surface handling") I
forgot that we have a special usage for the only _SS surface (the
scratch surface).
Because it's only delivered in the 31:10 bits of R0 and because we
want to minimize the amount of shader instructions for scratch
messages, the surface offset in shifted right by the driver to align
things properly for the 31:6 extended descriptor format.
This is unfortunately incompatible with the full 32bit format of
ExBSO. So this surface type currently cannot be considered bindless.
We might revisit later if we start using _SS surfaces for other
things.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 80c89909f3 ("brw: fixup immediate bindless surface handling")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38618>
This adds several fixes so that kumquat can build.
TEST=meson setup -Dvulkan-drivers="gfxstream" -Dgallium-drivers="" \
-Dzlib=false -Dopengl=false -Degl=false \
-Dvirtgpu_kumquat=true
--cross-file ${CROSS_PATH}
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
A transitive dependency is a dependency of a dependency.
So if, for example, mesa3d_util does explicitly use
zerocopy-derive, it should not need to a depend on it.
The package that pulls in transitive should already
include it has a dependency. This provides clearer
methodologies in maintaining Rust code, similiar to
how Cargo.toml handles it.
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Useful for Kumquat use cases. Actually, useful for pretty much
anything Windows-related. This are foundational crates supported
officially by our friends at Microsoft.
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
The virtgpu_kumquat deps -- used by gfxstream -- don't work
with cross-compile. This is because they set "native: true" even
for host machine deps.
In addition, some proc-macro deps (which need "native: true") try
to link against host machine deps.
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
The shader-db functionality was interfering with the error
filters.
Two new options are added: R600_DEBUG=shaderdb and
R600_DEBUG=precompile. The option precompile is added
to maintain the compatibility with the shader-db repository.
This change fixes 22 of these tests:
deqp-gles31/functional/debug/error_filters/case_.*: warn pass
deqp-gles31/functional/debug/error_groups/case_.*: warn pass
Fixes: 28d6a5af25 ("r600: Add shader precompile and shader-db support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38485>
Without these fixes, H.265 streams using long-term references would
fail to decode correctly as the decoder wouldn't distinguish between
short-term and long-term reference frames.
Fixes: 896f95a37e ("vulkan/video: fix h265 decoding with LT enabled.")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38571>
An H.265 SPS can contain multiple short-term reference picture sets.
Fix the code to properly store and copy all sets instead of just one.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38571>
Moving `ci-tron:priority:` out of the variable because an empty value
will not be authorized, and this makes it obvious if that bug ever
happens (job will not be picked up and gitlab will complain that
`ci-tron:priority:` is not a tag registered by any runner), instead of
getting picked up by any runner that will then reject (fail) the job.
(This is caused by GitLab's API not allowing tags to be enforced when
picking up jobs, resulting in jobs with missing tags being picked up by
any runner, like the bug we had with the generic fd.o runners a few
months ago.)
v2 (Martin Roukala):
* use the priority tags in all amdgpu jobs
* add missing tags in etnaviv jobs
* add missing tags in broadcom jobs
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37897>
Fixes this test on Xe2+:
INTEL_DEBUG=no32 ./deqp-vk -n dEQP-VK.spirv_assembly.instruction.maint9_vectorization.bit_field_u_extract.result_v16i-base_v16i-offset_s64u-count_s16i
Generate invalid code for that platform:
and(16) g37<1>UW g65<16,4,4>UW 0x000fUW { align1 1H I@5 };
ERROR: Invalid register region for source 0. See special restrictions section.
Several helpers like has_subdword_integer_region_restriction() do not
see the final type of the source, so compute it early.
Maybe new_src could be used in more cases. Being conservative for now.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38548>
The libvulkan_intel does not need the decoder when buildtype=release
where the debugging is disabled.
However, the decoder implementation is decided by the dep_expat
which may be turned on by like -Dtools=intel and the binary size
of libvulkan_intel increase unexpectedly.
This change creates the stub dependency and decide the exact
decoder dependency of libvulkan_intel by the buildtype.
Test: meson setup builddir -D build-tests=true -Dbuildtype=release --reconfigure && ninja -C builddir && cd builddir && meson test
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andy Hsu <hwandy@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38569>
On Linux, drm-shim is the replacement.
On Windows, the project to support a compile-only device has been
abandonned since a while, so it's fine to not allow creating NULL
devices for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38544>
We already supported 1.4 which has VK_KHR_global_priority and the
globalPriorityQuery feature.
tested with:
dEQP-VK.api.device_init.create_device_global_priority*
dEQP-VK.synchronization.global_priority_transition.*
Signed-off-by: spencer-lunarg <spencer@lunarg.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38340>
When probing on generic Linux platforms, the loading of d3d12 and
the first init of could fail, but the error returned causes a
loader warning to be printed.
Use the correct error return to stop this.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38611>
Enable more CI-Tron jobs to be run in parallel with baremetal ones.
Disable the baremetal ones for those who CI-Tron jobs has already been
tested enough.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38610>
The option's description is:
> Whether to use LLVM for the Gallium draw module, if LLVM is included.
Let's disable it right away if LLVM is disabled, to avoid some
configurations from failing.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38558>
Prior to commit b8b38d38b1 ("meson: reinstate LLVM requirement for r300
and enforce it for i915 too") it was possible to build and use r300 for
architectures that do not have LLVM (e.g., alpha).
The only SWTCL chips are integrated graphics in x86 systems, and are not
available in discrete cards.
Fixes: b8b38d38b1 ("meson: reinstate LLVM requirement for r300 and enforce it for i915 too")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38580>
Swizzle can include PIPE_SWIZZLE_0/_1 (4 and 5) which result in indexing
beyond the channel array.
Reported-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Fixes: 76e350671f ("freedreno/a6xx: Sysmem clear fixes")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38593>
This allows us to fully define crashdec in all situations, even when
its essential dependencies are not found. The result is that if
libarchive or lua is not found, the tests become disabled automatically.
This avoids the situation where `-Dbuild-tests=true`, but at least one
of libarchive and lua are missing, in which `crashdec` is used
undefined.
Acked-by: Rob Clark <rob.clark@oss.qualcomm.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38579>
There are cases where the freedreno `crashdec` program will not be
built, but will still be used. By making dep_lua a disabler, we move
closer to being able to have those tests automatically disabled when
crashdec isn't built.
Acked-by: Rob Clark <rob.clark@oss.qualcomm.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38579>
This is the last level layer of emission, we want the tracking to be
added above that, so that when flushing of previously accumulated
reasons happens, another pointless reason isn't added.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38542>
Removes encoder's use resource utilities. All memory allocations
are now tracked in the VkDevice level residency set. This is
accomplished by tracking buffer objects at create/destroy.
Also removes all descriptor set residency tracking since it is
no longer needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38505>
Mesa builds using -Dperfetto=true generate a pps-producer binary.
Include it in the artifacts for use in runtime performance tracing
tests.
Signed-off-by: Laura Nao <laura.nao@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38517>
Enable Perfetto tracing support in Mesa's x86_64/arm64 builds for Linux
and Android. This enables GPU performance counter collection via pps and
sets up the environment for runtime GL tests with support for CPU, GPU
and system-wide tracing. Information captured by Perfetto will provide
driver developers insight into the test environment and help identify
factors affecting performance.
Signed-off-by: Laura Nao <laura.nao@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38517>
This should be ignored on non-compute stages but AGX changes 3D shaders
to compute without setting the workgroup size and blows up if it claims
variable workgroups. The safest thing is to only set it from
spirv_to_nir for stages that actually have workgroups.
Fixes: 6d9f563960 ("spirv: Assume variable workgroup size unless it's set")
LoLed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38555>
This is strictly a GL thing. iris can manage it in its own program keys
without polluting the compiler with stuff nobody else cares about.
We can also drop a lot of padding that was introduced in commit
a18835a9ca which doesn't appear to be
necessary.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38556>
We're storing iris keys here, not brw keys. This worked because brw
keys are larger so you could fit any iris key in the memory.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38556>
The versioning scheme changed in v45.0 (the previous version was
3.48.0). As such, this version check would wrongly accept e.g. 48.0.
Fixes: e9341568fa ("meson: require sysprof-capture-4 >= 4.49.0")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38557>
We don't have to enter the lower-IO-to-temps block for TCS at all.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38470>
Previously legacy_gs_info calculated based on
gs_info->legacy_gs_info.esgs_itemsize which is calculated based on gs
input varyings.
However, when using ESO vs/tes can have outputs not read by gs, which
leads to underestimating LDS usage.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38514>
One might think that the `64B` bit might be affecting just the
memory size, but that's not the case as it affects the register
count too. Setting `64B` with `CNT` of `1` actually copies 2
registers, and not 1.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38535>
This makes it possible for these to be used in static_asserts or if
we want to get a register offset (gen agnostic) and have it be
marked as constexpr. E.g.
```
constexpr uint32_t src_reg = __TPL1_A2D_SRC_TEXTURE_BASE<CHIP>({}).reg;
```
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38549>
Attempting to separate things out by gen is a bit arbitrary. And gets
increasingly awkward as we introduce gen8 support, which builds on gen7
but changes some a6xx values.
Just move it all into a single 'props' struct.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38515>
These two regs only exist in a6xx. And only need a static value. So
move them to raw_magic_regs and drop the fd_dev_info field and
corresponding driver code.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38515>
Sadly this probably won't change anything in terms of perf as the CCS
engine has a bunch of other restrictions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 243c01c703 ("anv/iris: implement Wa_18040903259")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38484>
When there are no color outputs in the rendering state, but color write
enable/write aren't masked out (which seems legal with
VK_EXT_dynamic_rendering_unused_attachments), the driver must emit
CB_DISABLE to disable CB rendering completely.
Otherwise, if there is also a depth/stencil attachment in the rendering
state, CB0 is always set to 32_R for RB+. That means, the pixel shader
would still export fragments but to the previously bound color
attachment.
VKCTS is missing coverage.
Fixes: 4580293ab2 ("radv: implement RB+ depth-only rendering for better perf")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14319
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38509>
The old one was abandoned without so much as a README note.
This will also allow using newer releases than 47; the current one being
53, but this MR doesn't address that, as it aims to be a simple no-op
change.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38525>
Previously, we update the sfb dst slot upon vn_SignalSemaphore so that
vn_GetSemaphoreCounterValue can poll just the feedback slot itself.
However, that can race with pending sfb cmds that are going to update
the slot value, ending up with stuck sync progression.
This change fixes it by disallowing vn_SignalSemaphore to touch the sfb
dst slot. To ensure counter query being monotonic, vn_GetSemaphoreCounterValue
now takes the greater of signaled counter and the sfb counter read.
Test with dEQP-VK.synchronization* group:
- w/o this: stuck shows up within 2 min with 8 parallel deqp runs
- with this: no stuck for multiple full runs of the same
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14304
Fixes: 5c7e60362c ("venus: enable timeline semaphore feedback")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38516>
The logic here is a bit scattered around and is about to get more
complicated. This adds a helper which better documents the interactions
as well as an info field to make the driver's life easier.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38504>
This potentially results in better code because we don't add def uses where
undef is allowed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38468>
Saves unncessary PC and stall during encode phase.
Thanks to Felix for pointing out that CCS always needs a CS stall once
we add a pipe control, that will kill the performance for BVH
construction.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38513>
Up to 4 reasons can be saved and displayed. Previously, we were
only displaying one reason for Perfetto.
Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38500>
--create-unlinked also creates entrypoints for the functions, and
obviates the need to create a dummy entrypoint. This is one step closer
to removing glsl2spirv and aligns us with other users of glslang.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38088>
This change forces image->buffer->image copy path for pretty much
all the cases now.
Metal's image to image copy only allows same format and sample
count. Previously we were only taking the image->buffer->image
path for compressed formats. This just seemed to work, but we may
run into issues in the future. Metal does not report any
validation layer error.
Acked-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38459>
This fixes an issue a bunch of different components were all working
around themselves where sometimes we don't have a workgroup size but
workgroup_size_variable is false. This also fixes asahi_clc, which
didn't have the workaround and was assuming zero (but not variable!)
workgroup sizes everywhere.
LoLed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38538>
vkQueueBeginDebugUtilsLabelEXT and vkQueueEndDebugUtilsLabelEXT
require queue to be externally synchronized, which means these functions
require the lock. Unfortunately, there's no guarantee that the debug
markers will be matched in the multithreaded case, but I suppose this is
better than crashing.
Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
We currently only create one queue per queue family on the device. The
device can be shared between multiple zink_screens, so having one lock
per screen can still lead to multiple locks per queue. Fix this by
allocating queue_lock along with the device.
This fixes an issue that was causing crashes with nvk+zink and
QtWebEngine with QTWEBENGINE_FORCE_USE_GBM=1 This can be reproduced by
resizing the window in either:
* anki - https://apps.ankiweb.net/ or
* Qt's simplebrowser example
https://doc.qt.io/qt-6/qtwebengine-webenginewidgets-simplebrowser-example.html
which would then cause this dmesg error:
nouveau 0000:01:00.0: anki[92007]: Failed to find syncobj (-> in): handle=40
along with a context loss.
With VK_LOADER_LAYERS_ENABLE=VK_LAYER_KHRONOS_validation we would additionally
get warnings like:
Validation Error: [ UNASSIGNED-Threading-MultipleThreads-Write ] | MessageID = 0xa05b236e
vkQueueSubmit(): THREADING ERROR : object of type VkQueue is simultaneously used in current thread 139824449189568 and thread 139823901816512
Objects: 1
[0] VkQueue 0x557a666783e0
Fixes: 015eda4a41 ("zink: deduplicate VkDevice and VkInstance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38414>
For geometry shaders, we're going to need to compile various graphics
shaders down to compute shaders. This means that they'll look like
compute shaders to much of the compile pipeline but ultimately get
executed as graphics shaders. Most of the time, the compiler will just
happily take whatever offset you give and try to load the sysval from
there so you can load a graphics sysval from a compute shader just fine.
However, for the common ones, we switch on the shader stage and load
from a different offset for 3D vs. compute. This breaks the moment you
have a compute shader that's going to actually load from a 3D sysval
space.
The solution here is to ensure that any common sysvals (currently just
the push uniforms address and the printf buffer) are at exactly the same
offset in both. This is done by adding a panvk_common_sysvals struct,
some static asserts, and a bit of macro magic to keep things eurgonamic.
This also changes push uniform upload to just swap in the push uniform
address instead of writing it to the command buffer on every iteration.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38508>
Our backend compiler explains the limits as :
32 bytes for the patch header (tessellation factors)
480 bytes for per-patch varyings (a varying component is 4 bytes and
gl_MaxTessPatchComponents = 120)
16384 bytes for per-vertex varyings (a varying component is 4 bytes,
gl_MaxPatchVertices = 32 and
gl_MaxTessControlOutputComponents = 128)
In all that's :
* 32 patches * 128 components (counting tessellation factors)
* 32 vertices * 128 components
8192 total components.
I'm not sure why the limit was set so low, maybe leftover from older platforms?
Bump the limit to something like competition.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38523>
We were overflowing an array during bifrost disassembly. This was
only a problem if the user explicitly set an environment variable,
so unlikely to occur in casual use, and also only could be triggered
in very specific, dense code. But we still should get this right!
The specific CTS test that caused the assert is:
'dEQP-VK.graphicsfuzz.stable-quicksort-for-loop-with-injection'
with environment variable `BIFROST_MESA_DEBUG=shaders`. One of the
shaders has a clause with 6 constants (the maximum) and this overflowed
the array because we assume we always have an extra slot (used for
modifier processing).
Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38501>
Having file names and dates in the generated file affects
reproducibility. Build systems (like OE) error out on the gen_header.py
output, because it can contain full paths. Drop file list from the
generated file.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>
Having file names and dates in the generated file affects
reproducibility. Build systems (like OE) error out on the gen_header.py
output, because it can contain full paths. Drop file list from the
generated file.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38528>
Skip TU_CMD_FLAG_WAIT_FOR_BR wait whenever concurrent binning is disabled.
Without CB there is nothing to wait for, so the sync only adds overhead,
and in workloads with thousands of tiny renderpasses the cumulative overhead
becomes too big.
In one real-world workload I saw the following timings:
- 99.20 ms without disabling TU_CMD_FLAG_WAIT_FOR_BR
- 65.15 ms with TU_CMD_FLAG_WAIT_FOR_BR disabled
- 64.92 ms with TU_DEBUG=nocb
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
Disable concurrent binning by default so regular renderpasses have access
to all vertex fetch resources. When a renderpass can actually enable CB,
walk back to the CB barrier at submission time and re-enable CB for all
patchpoints between CB barrier and the renderpass.
Because we expect at most one or two renderpasses with CB per frame,
the number of patches stays small.
The reduced vertex fetch resources resulted in up to 10% performance loss
seen in targeted benchmark and in a few game captures.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
The sync emitted on TU_CMD_FLAG_WAIT_FOR_BR didn't disable CB
when CB was previously disabled for the renderpass, this resulted
in less resources vertex processing resources available for BR.
We can just not emit the sync instead, since next time CB is enabled
it will force the sync.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
We have to disable CB when lrz fast-clear is disabled, but if there
is no depth image at all, we can keep it enabled. This means that
RP without depth won't effectively be a CB barrier.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
ALU instructions typically have a maximum of 3 operands, and even when combining
instructions, the peak count will not go above 4.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
Still a bit more aggresive than the classic is_used_once,
but it should still prevent most regressions for patterns
that use min/max/mul as outer instruction.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38150>
Trying to cut down apply_pipeline_layout a bit and also allowing some
reuse for a new extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38495>
This reduces some code-repetition, and makes it a bit easier to reason
about what this actually tests for.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 4.6, both core and compatibility.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 3.1 exposed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 3.2 exposed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
While OpenGL 3.1 does require texture buffer objects, the ARB spec for
this requires support for texture buffers with alpha, luminance,
luminance-alpha and intensity formats in addition to RGBA formats. The
version of texture buffer objects that ended up in the OpenGL spec (even
in the compatibility spec) does not require these formats.
But, we don't even need to check this, because this is already included
in the GLSL 1.40 requirement that's also checked. So this shouldn't make
us expose GL 3.1 in cases where it isn't supported in the first place.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
This code-path hasn't been solely about ARB_texture_buffer_object for a
long time, let's make the error message more generic to not confuse
people. While we're at it, remove the comment that brings the same
confusion.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
GL_EXT_texture_buffer_object requires support for alpha, luminance,
luminance-alpha and intensity formats. If we can't support those, we
can't enable the extension.
Fixes: 45ca7798dc ("glsl: handle interactions between EXT_gpu_shader4 and texture extensions")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Most of the time, we remember to check for both extensions. But in one
case, it seems we forgot the GLES extension. Whoops.
Let's switch to a helper here, so we don't have to repeat the logic over
and over again.
Fixes: b4c0c514b1 ("mesa: add OES_texture_buffer and EXT_texture_buffer support")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
CP DMA isn't coherent with L2 on GFX12, but {SRC,DST}_ADDR_TC_L2 means
MALL.
Only small buffers are using copy/fill CP DMA operations, so this
shouldn't have much effect.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38449>
There's a single underlying bo mapping shared by the initial alloc here
and the later import of the same. The mapping size has to be initialized
with the real size of the created blob resource, since the app can query
the exported native handle size for re-import. e.g. lseek dma-buf size
Similar to virtgpu_bo_create_from_device_memory, the app can do multiple
imports with different sizes for suballocation. So on the initial
import, the mapping size has to be initialized with the real size of the
backing blob resource.
Backport-to: 25.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38443>
If the allocation originates from the same instance, the tracker map
size follows the allocationSize. After export and re-import, mapping the
whole dma-buf can exceed the original map size. This change backs out
the offending changes.
Test: dEQP-VK.api.external.memory.*.suballocated.host_visible.*
Fixes: 442f242a49 ("venus: requests whole blob mem size for non-dedicated import")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38443>
Found when running glxgears with vblank enabled and modesetting DDX.
glxgears will send many present requests at the beginning, but most
of them get complete event with skip mode. This problem causes
glxgears report ~75fps on a 60Hz monitor at the first record.
This change reduces it to 60fps.
Vulkan side X11 WSI does not have this problem as it will wait first
present request's complete event before send second present request.
How the problem happens:
1. client send present request 1 with target msc = 1
2. server side current msc is 100, so it find request 1 is
outdated and queue it for vblank with target msc = 101
3. client send present request 2 with target msc = 2
4. server side current msc is still 100, so it find request 2
is outdated and queue it with target msc = 101, and find
request 1 will be overridden, so mark it as skipped and
send idle notify for it.
5. client get the idle notify for request 1, and reuse the
request 1 buffer for new back buffer to send present
request 3.
6. this keeps going until client send present request N, and
server finally process the vblank queue before 101 msc
arrive and send complete event for all these requests back
to client.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38178>
Rewrite ANV's encode.comp, the final intel-specific raytracing shader
used for bvh-build. Performance is greatly improved for this shader
by adding the following features:
1) Find children early. All threads speculative find their children
before they know if they are valid (not collapsed). This makes more
work overall but reduces latency for propagating valid nodes from
root to leaves. Nodes find out if they are valid faster if all nodes
know who their children are upfront.
2) Hoist code used for intra-thread communication. Communicate
to children as soon as possible, minimizing wait time for later
threads.
3) Multithread encoding. Still launching 1 simd lane per node, same
as before, but encoding of nodes and children are parallelized across
multiple lanes. This works well because most nodes are collapsed
without any encode work required.
4) Hash globalID. Reduce chance that the thread processing a node
will also need to process node's children, which was found to
degrade performance, particularly for root node processing.
Measured RT game speedups:
* Hitman3 +48%
* F1'22 +10%
* Indiana Jones +8%
* GravityMark +2.5%
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36937>
This is quite a lot of logic to duplicate verbatim just to deal with the
slightly different synchronization.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38496>
This simplifies cases where we need to match on all of the possible
iter_sb values, which occurs frequently.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38496>
This commit also checks for and issues errors for the following:
INVALID_OPERATION is generated if the application attempts enable pixel
local storage while the value of SAMPLE_BUFFERS is one.
INVALID_OPERATION is generated if the application attempts to enable pixel
local storage while the current draw framebuffer is a user-defined frame-
buffer object and has an image attached to any color attachment other than
color attachment zero.
INVALID_OPERATION is generated if the application attempts to enable pixel
local storage while the current draw framebuffer is a user-defined frame-
buffer and the draw buffer for any color output other than color
output zero is not NONE.
INVALID_FRAMEBUFFER_OPERATION is generated if the application attempts to
enable pixel local storage while the current draw framebuffer is
incomplete.
INVALID_OPERATION is generated if pixel local storage is disabled and the
application attempts to issue a rendering command while a program object
that accesses pixel local storage is bound.
INVALID_OPERATION is generated if pixel local storage is enabled and the
application attempts to bind a new draw framebuffer, delete the currently
bound draw framebuffer, change color buffer selection via DrawBuffers, or
modify any attachment of the currently bound draw framebuffer including
their underlying storage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Shaders might declare PLS vars as inout but might just use them as in
or out but not both. This pass detects those cases and adjusts the
variable/deref modes accordingly.
This pass should be called before nir_lower_io_vars_to_temporaries(),
otherwise the copy_derefs will be inserted, turning unused variables
into used ones.
This should ideally be called after DCE to make sure we don't leave
PLS inout variables behind.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Pixel local storage variables are like fragment shader outputs that
might be read, written or both. Teach nir_lower_io_vars_to_temporaries()
about these variables so they can be lowered along with the regular
fragment outputs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Rather than adding another boolean to optionally lower PLS vars, pass
the types we want to lowers through a nir_variable_mode bitmask.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
The pixel local storage load and store instructions keep track of the
format of the pixel local storage variables. This allows drivers to insert
the appropriate conversions on load/store.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
Property names no longer have a maximum length in Android 26+,
support longer names to fix truncated property names.
Signed-off-by: Allen Ballway <ballway@google.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13043
Test: vendor.mesa.custom.border.colors.without.format is untruncated
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38453>
Some GPU hangs witnessed in the wild on RDNA4 in Control and Arc Raiders
seem to point towards closest-hit shaders reading a stale value for the
SGPR pair containing the currently-executing shader's address.
This SGPR pair was read by VALU in the preceding traversal shader,
making it susceptible to VALUReadSGPRHazard. Inserting
VALUReadSGPRHazard mitigations before accessing the s_setpc target seems
to fix the hang. We don't have conclusive proof that this is hazardous,
but given that all signs point towards it and we have a reasonably
simple workaround, let's roll with this for now to mitigate the hangs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38290>
This basically remaps color attachment formats for the resolve
operation.
Co-Authored-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38442>
When no workgroup size is specified we try to run with the most optimal one
possible. However we didn't take into account that we shouldn't run a
workgroup of higher dimensionality than requested by the application.
Fixes: 376d1e6667 ("rusticl: implement cl_khr_suggested_local_work_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38375>
There were two issues:
1. The global_work_offset parameter is optional but we errored on NULL
2. We didn't return the reqd_work_group_size when set on the kernel.
Fixes: 376d1e6667 ("rusticl: implement cl_khr_suggested_local_work_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38375>
There is no disable_screen_content_tools in AV1 spec, instead this
should be seq_choose_screen_content_tools. But we don't need that either
as we keep the effective value in force_screen_content_tools.
Same for seq_choose_integer_mv and force_integer_mv.
Also stop overriding these values and instead fix frame header coding
to work with all combinations.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38260>
Before this commit, nested loops aren't counted correctly:
-------------
V |
-> A --> B --> C ->
^ |
-------
A is both predecessor and successor of B but A isn't in B's loop.
Instead a block B is in loop header H's block if H is the successor
of B and H dominates B.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38393>
Instead of inserting the spill instruction before the instruction that
caused the spill, instead insert it either right after the definition
or at the end of the block that contains the definition.
This helps reduce code size and also moves STOREs outside of loops on
average.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38238>
_math_matrix_is_dirty() should only be used to decide if we need to
run _math_matrix_analyse(). We already decided that we had a new
texture matrix when we called _mesa_update_texture_matrices() so
we need to set _TexMatEnabled correctly otherwise we might
incorrectly return _NEW_FF_VERT_PROGRAM | _NEW_FF_FRAG_PROGRAM in
the following if-statement.
Fixes: ec978e002f ("mesa: only update fixed-func programs on texture matrix enablement changes")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14286
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38473>
This add support to the lowering the reduction operations.
Thanks to Georg Lehmann for a lot of the ideas and optimising in
this.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This adds some new call operations to handle various parts of the
reductions.
cmat_reduce: is the initial toplevel operation from SPIR-V
this is used after lowering for row/col operation on single hw
supported matrix sizes. The spir-v operation is lowered into
multiple of these on flex dimensions, but also can be lowered into
others.
cmat_reduce_finish:
after multiple reduction operations on a flexible dimension matrix,
there is often subsequent operations on the output matrices to
finish the operation.
cmat_reduce_2x2:
this takes 4 input matrices, and 1 dst to do a 2x2 reduction op.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
With coopmat2 a bunch of functions need a lot of lowering passes
to happen before they can be lowered, so mark them as to be lowered
later.
Drivers needing these should call the nir_remove_non_cmat_call_entrypoints
where they remove entrypoints now, and call the original nir_remove_non_entrypoints
after lowering coopmat2.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This adds a new instruction type to handle cooperative matrix calls.
This clones the call instr, drops callee, and adds a single metadata
slot and a call operation (dummy only for now).
(Not NACKed by Alyssa)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
If the shader has flat varyings but API level flatshade is disabled, we
need to switch to flat shading in PA.CONFIG to ensure proper interpolation.
Passes dEQP-GLES3.functional.rasterization.flatshading.* on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Daniel Lang <dalang@gmx.at>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36404>
Build utilities to retrieve the build id are now exposed
through the new HAVE_BUILD_ID instead of HAVE_DL_ITERATE_PHDR
since this will allow adding support for platforms that do
not support HAVE_DL_ITERATE_PHDR
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38420>
MESA_KK_DISABLE_WORKAROUNDS provides a way to disable workarounds
we've had to apply to get Vulkan conformance. In hopes that Metal
bugs get fixed in upcoming macOS releases.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38426>
This extension seems to be supported on GC3000 (HALTI2) and later hardware.
While no explicit feature bit documents this capability, testing
confirms that the required vertex formats work correctly on these GPUs.
This patch adds the missing B10G10R10A2 vertex format variants
(UNORM, SNORM, USCALED, SSCALED), gates support behind the HALTI2
feature check, and updates features.txt to reflect the new capability.
All relevant piglit tests pass.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38446>
The VMA of VkDeviceMemory has to accomodate all the resources that can
be bound to it. For sparse images it's 64KiB alignment, for other
tiled images it's 4KiB. But we also have a workaround that requires a
64KiB alignment for Tile4 images.
The initial version of the slab allocator missed the 4KiB alignment.
This fix adds the workaround handling too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dabb012423 ("anv: Implement anv_slab_bo and enable memory pool")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38480>
This is useful for running multiple dependent precomp shaders in a row.
We could do this using PANLIB_BARRIER_CSF_WAIT and then immediately wait
on the syncobj, but it's a little silly to do that on the same subqueue.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37971>
In some cases, we don't need to signal the syncobj yet because we will
be issuing a second asynchronous instruction afterwards blocking on the
first.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37971>
SB_ID(LS) is currently equal to zero, so this is not a behavior change,
but worth setting it explicitly for clarity and in case the sb
assignments change.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 885805560f ("panvk/csf: fix case where vk_meta is used before PROVOKING_VERTEX_MODE_LAST")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38458>
We check fn_set_fbds_provoking_vertex_stride == 0 to determine whether a
previous function variant has already been allocated, so this value must
be initialized to zero before we start the loop. We could fix this by
explicitly initializing just that field, but I figure it's simpler and
safer to just zero-initialize the whole struct.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 885805560f ("panvk/csf: fix case where vk_meta is used before PROVOKING_VERTEX_MODE_LAST")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38458>
This helps for debug when wanting to check which pipeline mode the
driver has selected for a given section of a frame.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38317>
These vertex fetch flushes aren't required in gen9+ because the display
driver will take care of this invalidation on QueueSubmit. So let's remove it.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38317>
send.ugm (1|M0) r125 r0 null:0 0x0 0x0200651F {$9} // wr:1+0, rd:0; fence invalid flush type scoped to tile
When destination of Send(s) is not null, the response length must not be 0.
Should only affect DG2 products.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38478>
To be able to support multiple GPU architectures, we need to thread
carefully with HW defs. So let's limit the availability of the HW defs
to where it's needed. We do this by moving the HW def includes and
helpers to query them to end of the source-files.
In the long run, we probably want something a bit more formal to get
access to HW-dependent values based on the hw-info. But there's some
work in progress to change how that works, so let's kick the can down
the road a bit on that part.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
This makes us either use the rogue-define, or a single pixel. This
feature isn't available in any HW we support yet, but that is going to
change soon.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
This doesn't actually need fixing; there's no Rogue HW with this
feature. Instead, let's start populating this when we fill in new
architectures, which does support this.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
This is the only part of pvr_descriptor_set_create() that actually
depends on architecture specific details (in particular, the
write_sampler calls), so let's factor this out to a separate function.
This is going to be helpful when we're doing multi-arch support.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
Same story as the previous commit; this let's us store an architecture
specific structure inside an architecture agnostic structure.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
The border-table is architecture specific, but it's stored in the device
structure that isn't. We need to store a pointer to it instead, because
the size will vary from architecture to architecture.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
When we do multiarch, we want to be able to refer to the headers
separately from the sources here, so let's split this dependency in two.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
Right now all supported devices are Rogue devices. But in the future,
we're going to support multiple architectures. So let's add a way to
query this at run-time.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
RGX_FEATURE_USC_ITR_PARALLEL_INSTANCES is no longer a derived feature
for Volcanic, it is a core feature. This improvement is related to the
introduction of Volcanic device information.
Signed-off-by: leonperianu <leon.perianu@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
The flag mega_fetch should be set on rv770 for a
read scratch operation (as written in the r700
documentation p357). Without this flag, read scratch
does not work and a gpu hang could be triggered.
Here are the tests fixed:
shaders/glsl-predication-on-large-array: fail pass
spec/glsl-1.10/execution/temp-array-indexing/glsl-fs-giant-temp-array: fail pass
spec/glsl-1.10/execution/temp-array-indexing/glsl-vs-giant-temp-array: fail pass
spec/glsl-1.30/execution/fs-large-local-array: fail pass
spec/glsl-1.30/execution/fs-large-local-array-vec2: fail pass
spec/glsl-1.30/execution/fs-large-local-array-vec3: fail pass
spec/glsl-1.30/execution/fs-large-local-array-vec4: fail pass
spec/glsl-1.30/execution/fs-multiple-large-local-arrays: fail pass
Fixes: 9c48a139b0 ("r600g: Support emitting scratch ops")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38353>
This code was no longer needed after switching to os_read_file, but I
accidentally left it around, whoops!
Fixes: 49183bfb79 ("pan/bi: use os_read_file-helper")
CID: 1665295
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
If it is, util_last_bit() will return zero, and the subtraction that
follows will underflow. Make it obvious that this can't happen, by
adding an assert here.
CID: 1665297
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
Open-coding the size-parsing here is fragile. Let's use a common helper
for this instead.
CID: 1665346
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
The PANFROST_JM_CTX_PRIORITY values aren't bitmasks, but enum values.
But the kernel interface uses the BIT()-macro on them, so we need to do
the same. We don't have the macro, but it's trivial to do this with a
bitshift instead.
Fixes: f04dbf0bc0 ("pan/kmod: query and cache available context priorities from KMD")
CID: 1666511
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
We can already tell if we're writing the first variable by looking at
sig_offset. So let's drop the needless variable here.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
If only invalid surfaces are passed, we end up using an undefined array
as a string. And while this might not be possible, it is hard to reason
about, especially for new readers and tools. So let's initialize the
buffer as an empty string.
CID: 1666581
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37903>
The dummy dest will not be allocated, so we must not
count it.
In the disassambler write PV and PS if the ALU dest GPR is
only used via PS/PV.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37321>
This appears to be stable now, and running on multiple threads fixes the
the timeout problems we were hitting in lavapipe-vkd3d.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38476>
This was copied from radeonsi which expected seq_force_screen_content_tools = 2
and seq_force_integer_mv = 2.
Fixes: 37e71a5cb2 ("radv/video: add support for AV1 encoding")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38371>
Unmap bo in destroy_host_blob when hb->cpu_addr is not NULL.
This avoid memory leak caused by bo refcount is not 0 when
amdvgpu_bo_free is called.
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38440>
This is helpful to tell which path is taken:
1. explicit modifier: legacy_scanout=0, prime_blit=0
2. prime blit: legacy_scanout=0, prime_blit=1
3. legacy scanout: legacy_scanout=1, prime_blit=0
To be noted, venus doesn't advertise legacy scanout support, but we
implicitly support it for gamescope compatibility.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38457>
Adreno 200 is an old GPU implementing GL ES 2.0. Add nightly jobs to
test for regressions on this hardware. It is currently limited to GL CTS
tests, because Piglit gives hard time, mostly crashing the GPU.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38383>
Also add references to their conterparts in old PAL code.
This makes it easier to remember whether we mitigated the
same issues as PAL did.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
To improve consistency between the two drivers.
This excludes Hawaii from the workaround on RADV.
Also add the same to ac_null_device_create().
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
For consistency with RADV.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
Will be necessary for the subsequent commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
Disable sparse mappings on GFX7-8 due to GPU hangs in the VK CTS,
except Polaris where it happens to work "well enough" to pass
the VK CTS and run some games already.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
Also disable the sparse binding queue and other related features.
Using sparse on GFX6-8 can cause GPU hangs at the moment.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38304>
In Panfrost compute shader, store_scratch instructions with 64-bit const
address are generated by `nir_lower_vars_to_scratch`. This address didn't
pass `bi_emit_cached_split` before `bi_emit_store` and `bi_emit_load`,
which assumes the low and high parts of the 64-bit address are split.
A "missing bi_cache_collect()" abortion is then triggered at `bi_extract`.
This bug fix checks and splits 64-bit addresses in the `bi_emit_store` and
`bi_emit_load` instructions. Test on RK3588 with Mali-G610.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36969>
Enable official Khronos OpenGL ES 2.0 conformance test suite (glcts)
for all etnaviv GPU variants in CI. This runs the gles2-khr-main
mustpass list to verify specification compliance.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38456>
CI-tron currently requires a number of minutes to also be set, so let's
just pick some huge value that will never actually trigger, leaving the
OVERALL timeout as the only one that applies.
Fixes: ee5a95319d ("broadcom/ci: automatically reboot rpi3 when they fail to find the root device")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38461>
This calls nir_separate_merged_clip_cull_io in zink, which is better
than having to handle separate clip & cull arrays in all passes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38452>
Only needed by zink. This clip/cull distance separation pass is needed
to remove nir_io_separate_clip_cull_distance_arrays, so that all shared
GLSL code only uses merged clip+cull distance outputs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38452>
Use mesa_logi_v(..) in sparse_debug(..).
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38190>
This aborts if a pass would make any progress. It can be used to assert that:
- our minimalist pass invocation loops in drivers are sufficient and don't
leave any unoptimized code in the shader
- our lowering is sufficient and other passes don't add instructions that
would cause lowering having to be repeated
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38406>
Microsoft required the ability to preserve fp32 denorms via a shader
flag in shader model 6.2, but Adreno does not support this. Instead
Qualcomm's DX12 driver uses soft floats. Implement something similar to
expose the equivalent Vulkan feature for vkd3d-proton. In practice no
apps should actually use this but it lets us go from SM6.0 to SM6.6 with
vkd3d-proton.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37608>
Based on existing softfloat64 support and Berkeley SoftFloat. This is
targeted at drivers that can't preserve denorms, so operations where
denorm support is irrelevant like conversions to/from integers aren't
handled.
Because the existing mechanism used by Gallium for softfloat64 doesn't
support includes, we unfortunately can't extract common code into a
header. This can be done later if we switch Gallium to using glslang and
spirv-to-nir.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37608>
We have to invert the condition to select the non-NaN source, instead of
just swapping a and b. The equivalent float32 function was failing
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.denorm_nmax_nan_preserve_frag
without this fix.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37608>
When possible avoid making a vector if we can use a swizzle. The
swizzle will be lower by `bi_lower_swizzle` if not supported by the
instruction.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36637>
Support for this capability in llvmpipe expose
support for GL_EXT_depth_bounds_test, as well as supporting
the `depthBounds` device feature in lavapipe.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36487>
We need to make sure the data part returned by sampler messages is
always aligned to a physical register. Just like the residency data
lives in a single physical register after the data.
Lowering a vec3 16bits per components led to a half a physical
register allocation which then confused the descriptor lowering
(expecting physical register units).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 295734bf88 ("intel/fs: fix residency handling on Xe2")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12794
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34008>
For weird reasons, on SDMA4-5 color<->stencil only copies don't work
correctly. I compared NAVI21 (SDMA 5) vs NAVI31 (SDMA 6), everything
is bits-to-bits exact but the same test doesn't pass on NAVI21. So,
it's potentially a hardware bug on SDMA < 6.
Fixes dEQP-VK.api.ds_color_copy.*_tq on GFX9-GFX10.3.
Fixes: 0034f5a948 ("radv: allow ds<->color copies on compute/transfer queues")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38377>
To avoid incompatibility between the compiler implementations used by
the driver and the renderer, seq_cst ordering is picked here, which has
required a full mfence instruction. Then the renderer side acquire is
ensured to be ordered after the cache flush of ring cs updates.
Perf wise, there's no regression in headless vkmark runs. In theory,
the overhead introduced here weighs trivially as compared to the ring
cs encode/decode part. So we should go for better robustness.
Test: venus on windows guest works with renderer on Linux
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14277
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38435>
This is unused at the moment but the backend incorrectly assumes
immediate handles are for the binding table (therefore not bindless).
Some new CTS tests are using an immediate bindless handle which is
broken.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38359>
With VK_EXT_unused_attachments, we may have a case where the FS writes
to attachments 0 and 1, both have valid formats and are enabled, yet the
renderpass only has 1 color attachment. In this case we would set
RB_PS_MRT_CNTL to 2, but since we never emitted RB_MRT_BUF_INFO[1] and
so on, we would get garbage attachment info from the last render pass
and end up writing to an attachment that doesn't exist.
Fix this by disabling attachments that are unused. We can't move setting
RB_PS_MRT_CNTL to emitting when we emit color RT state, because then we
have the inverse problem of a FS that writes to attachments 0 and 1, a
renderpass that has 2 attachments, but a blend state that only includes
1 attachment (and therefore disables color writes for attachment 1). At
least one side (blending or RT emission) has to assume that the other
side may have more RTs enabled and disable the rest of the RTs up to
MAX_RTS.
Fixes: c2eb768eb2 ("tu: Expose VK_EXT_dynamic_rendering_unused_attachments")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38250>
Either we need to save this pointer or toss it.
==146166==ERROR: AddressSanitizer: heap-use-after-free on address 0x7bfe77013920 at pc 0x7b9e6fd5b978 bp 0x7ffc30ef18e0 sp 0x7ffc30ef18d8
READ of size 4 at 0x7bfe77013920 thread T0
#0 0x7b9e6fd5b977 in get_header ../src/util/ralloc.c:83
#1 0x7b9e6fd5b977 in ralloc_parent ../src/util/ralloc.c:382
#2 0x7b9e6fd5b977 in reralloc_size ../src/util/ralloc.c:198
#3 0x7b9e6fd5b977 in reralloc_array_size ../src/util/ralloc.c:241
#4 0x7b9e705f83c2 in range_minimum_query_table_resize ../src/util/range_minimum_query.c:21
#5 0x7b9e7018af1d in realloc_info ../src/compiler/nir/nir_dominance_lca.c:33
#6 0x7b9e7018af1d in nir_calc_dominance_lca_impl ../src/compiler/nir/nir_dominance_lca.c:126
#7 0x7b9e6ff9815c in nir_metadata_require ../src/compiler/nir/nir_metadata.c:42
#8 0x7b9e6ff998e4 in nir_metadata_require_most ../src/compiler/nir/nir_metadata.c:200
#9 0x7b9e6f8aab4d in st_finalize_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:735
#10 0x7b9e6f0afb14 in st_create_common_variant ../src/mesa/state_tracker/st_program.c:858
#11 0x7b9e6f0be2d3 in st_get_common_variant ../src/mesa/state_tracker/st_program.c:973
#12 0x7b9e6f0bf9cf in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1478
#13 0x7b9e6f0bf9cf in st_finalize_program ../src/mesa/state_tracker/st_program.c:1596
#14 0x7b9e6f8b0127 in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:633
#15 0x7b9e6f8b3611 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:816
#16 0x7b9e6f7bcf51 in link_program ../src/mesa/main/shaderapi.c:1412
#17 0x7b9e6f7bcf51 in link_program_error ../src/mesa/main/shaderapi.c:1474
#18 0x0000004020b0 in main._omp_fn.0 /home/alyssa/shader-db/run.c:872
#19 0x7f9e7893dd65 in GOMP_parallel (/lib64/libgomp.so.1+0xdd65) (BuildId: 9cc501fdca53b5d4ab094f709486781c98573bc9)
#20 0x000000400d6a in main /home/alyssa/shader-db/run.c:689
#21 0x7f9e78011574 in __libc_start_call_main (/lib64/libc.so.6+0x3574) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
#22 0x7f9e78011627 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3627) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
#23 0x000000401014 in _start (/home/alyssa/shader-db/run+0x401014) (BuildId: a83b8d830cc265be3f54ea3e7a21a0fb5156624b)
0x7bfe77013920 is located 0 bytes inside of 64-byte region [0x7bfe77013920,0x7bfe77013960)
freed by thread T0 here:
#0 0x7f9e782e5beb in free.part.0 (/usr/lib64/libasan.so.8+0xe5beb) (BuildId: cab80046dbc1c97c6e14490acc37d079701f8d9a)
#1 0x7b9e6fd5bc39 in unsafe_free ../src/util/ralloc.c:319
#2 0x7b9e6fd5bc39 in ralloc_free ../src/util/ralloc.c:264
#3 0x7b9e70063d81 in nir_sweep ../src/compiler/nir/nir_sweep.c:219
#4 0x7b9e6f0bf499 in st_finalize_program ../src/mesa/state_tracker/st_program.c:1585
#5 0x7b9e6f8b0127 in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:633
#6 0x7b9e6f8b3611 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:816
#7 0x7b9e6f7bcf51 in link_program ../src/mesa/main/shaderapi.c:1412
#8 0x7b9e6f7bcf51 in link_program_error ../src/mesa/main/shaderapi.c:1474
#9 0x0000004020b0 in main._omp_fn.0 /home/alyssa/shader-db/run.c:872
previously allocated by thread T0 here:
#0 0x7f9e782e5e4b in realloc.part.0 (/usr/lib64/libasan.so.8+0xe5e4b) (BuildId: cab80046dbc1c97c6e14490acc37d079701f8d9a)
#1 0x7b9e6fd5a883 in resize ../src/util/ralloc.c:167
#2 0x7b9e705f83c2 in range_minimum_query_table_resize ../src/util/range_minimum_query.c:21
#3 0x7b9e7018af1d in realloc_info ../src/compiler/nir/nir_dominance_lca.c:33
#4 0x7b9e7018af1d in nir_calc_dominance_lca_impl ../src/compiler/nir/nir_dominance_lca.c:126
#5 0x7b9e6ff9815c in nir_metadata_require ../src/compiler/nir/nir_metadata.c:42
#6 0x7b9e6ff998e4 in nir_metadata_require_most ../src/compiler/nir/nir_metadata.c:200
#7 0x7b9e6f8b0ede in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:550
#8 0x7b9e6f8b3611 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:816
#9 0x7b9e6f7bcf51 in link_program ../src/mesa/main/shaderapi.c:1412
#10 0x7b9e6f7bcf51 in link_program_error ../src/mesa/main/shaderapi.c:1474
#11 0x0000004020b0 in main._omp_fn.0 /home/alyssa/shader-db/run.c:872
Fixes: 17876a00af ("nir: Add a faster lowest common ancestor algorithm")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38412>
At present, this is the value mandated by the KMD's uAPI, or 4096 bytes.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38027>
Commit a4ffd2395f ("mesa: Implement label sharing from GL objects with
UM drivers") enabled GL clients to tag objects at a UM driver level. In
the case of Panfrost, and for both KMDs, maximum label size is set to
4096, but the Mesa limit is much lower.
Since glObjectLabel() allocates object labels dynamically, there's no
need to have this value chiseled in stone, so allow Gallium driver
implementers to set their own limit through a pipe screen capability.
Keep the same default maximum label length as before.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38027>
Add some code so that people don't get confused when debugging a
fault and the fault doesn't appear since they forgot to update the
generate-rd.
This should also help catch any build issues earlier on.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38081>
This was causing build issues when using the generate-rd.cc file,
due to redeclaration caused by the mixing of the generated header
file and the custom definitions.
Also adding some missing dependencies now introduced due to the
header file include.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38081>
VCN requires the luma/chroma VAs to be 256 aligned. On VCN5, the
collocated buffer was not 256 aligned which can cause these VAs to be
unaligned.
This fixes VVL PositiveVideoEncodeH264.Basic on VCN5.
Fixes: 37e71a5cb2 ("radv/video: add support for AV1 encoding")
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38408>
Netcat locks on to the first connection so if one tried to use
breadcrumbs again Netcat will appear as if it didn't receive
anything. Use `-k` so that it accepts another connection.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38419>
These test that nothing crashes for any possible input. With print=true,
it can also be used to compare the behaviour of two different
ac_nir_lower_mem_access_bit_sizes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37995>
CTA-861-G section 6.9.1 Static Metadata Type 1 declares that zero values
for different groups of HDR Metadata properties are allowed, including
zero nits values for max display mastering luminance, max content light
level, max frame-average light level and min display mastering luminance.
A zero value is meant to be treated by the video sink as "undefined" /
"unknown", and handled accordingly. This is common for dynamically
generated visual content.
The is_hdr_metadata_legal() function in the Vulkan/WSI/Wayland HDR backend
currently declares HDR light level metadata as invalid if the mastering
display min_luminance and max_luminance light levels are set to the legal
level of zero nits. This causes valid HDR metadata as set by the client
via vkSetHdrMetadata() to be not sent to the compositor.
Fix this by skipping checks that don't apply if min_luminance or
max_luminance are zero. If max_luminance is zero then we skip sending
of mastering display min/max luminance to Wayland, as sending a a
max_luminance <= min_luminance would trigger a protocol error. All
other valid data is still send, ie. color primaries, white-point,
content light levels.
Fixes: cb7726bb2c ("vulkan/wsi: validate HDR metadata to not cause protocol errors")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Co-authored-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Xaver Hugl <xaver.hugl@kde.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38326>
Setting up a new `iL` `lua_State` in which `r` is injected as
the equivalent to `rnn.init(<gpu>)` and a `priv` library allows
accessing extra functionality (not available to user provided
scripts) currently just allowing writing to registers.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38001>
Allow the driver to work without a display (card) node by removing
strict display controller checks.
Signed-off-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38082>
../src/freedreno/afuc/emu-ui.c:64:11: warning: ‘p’ may be used uninitialized [-Wmaybe-uninitialized]
64 | while (*p && isspace(*p))
| ^~
../src/freedreno/afuc/emu-ui.c: In function ‘emu_packet_prompt’:
../src/freedreno/afuc/emu-ui.c:459:10: note: ‘p’ was declared here
459 | char *p;
| ^
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38104>
Addresses:
../src/freedreno/ir3/ir3_shader.h: In function 'void ir3_link_add(ir3_shader_linkage*, uint8_t, uint8_t, uint8_t, uint8_t)':
../src/freedreno/ir3/ir3_shader.h:1326:16: error: comparison of integer expressions of different signedness: 'int' and 'long unsigned int' [-Werror=sign-compare]
1326 | assert(i < ARRAY_SIZE(l->var));
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38104>
Also ensure the printfs are read even if the device is lost or ran
into a fault.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38358>
To help figure out whether a CCS related corruption is tied to
modifier setup or internal driver state tracking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38382>
Fixes deathloop/01f8d58bf245663b with gfx1201.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 668259ef0b ("aco/scheduler: move clauses through RAR dependencies")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38402>
Some fragment shader may be per-primitive when mesh pipeline,
per-vertex when vertex pipeline. We sort these inputs always
after other per-vertex inputs in nir_recompute_io_bases, so
fragment shader code is same, just need to set different reg.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38044>
The #undef ALIGN is also not needed anymore, remove it.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>
This is done by grep ALIGN( to align(
docs,*.xml,blake3 is excluded
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>
As now all ALIGN usage is on 32bit integer
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>
When the input is size_t, cast the input to uint32_t, as the output is expect uint32_t not size_t
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
gcc has a a false positive here, silenced with the pragmas, use separate commit
for easily revert latter once gcc fixed it.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
../src/compiler/nir/nir.h:1152:11: attention: « buf_index » pourrait être utilisé sans être initialisé [-Wmaybe-uninitialized]
1152 | return src;
| ^~~
../src/panfrost/compiler/bifrost_compile.c: Dans la fonction « lower_texel_buffer_fetch »:
../src/panfrost/compiler/bifrost_compile.c:6234:13: note: « buf_index » a été déclaré ici
6234 | nir_def *buf_index;
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
gcc and clang do not have equal set of warnings, so when warning are specific to CLANG or GCC, using PRAGMA_DIAGNOSTIC_IGNORED_CLANG or PRAGMA_DIAGNOSTIC_GCC_IGNORED instead
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
otherwise &container_of(..)->foo won't work, need extra parens. gcc version is
fine.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
This adds from_wsi field to v3dv_image, so we can apply simulator stride
alignment only to WSI images.
Handling VK_STRUCTURE_TYPE_WSI_IMAGE_CREATE_INFO_MESA at
v3dv_GetPhysicalDeviceFormatProperties2 also removes debug warnings like:
MESA: debug: v3dv_GetPhysicalDeviceImageFormatProperties2: ignored VkStructureType Unknown VkStructureType value.(1000001002)
Fixes: 562bb8b62b ("v3dv: align width to 256 when using simulator")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38374>
Even though "shader" uniquely means something here and we don't need to
specify what stage, "cs" is still shorter, obvious, and matches what we
do all over the 3D code so it saves some cognitive load when bouncing
back and forth.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38403>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38403>
Previously the type info for nested values was copied from the source
operand, rather than propagating the new type from the destination
operand.
Fixes: 4c363acf94 ("vtn: Allow for OpCopyLogical with different but compatible types")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38248>
VGT_OUTPRIM_TYPE should be programmed correctly when PointMode is only
set in TCS with ESO.
Fixes dEQP-VK.shader_object.tessellation.hlsl.point_mode.
Fixes: c6d9b9b4e0 ("radv: support more tessellation parameters with TCS for ESO unlinked shaders"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38376>
Non-fast-clear path to clear LRZ is rather slow, plus without LRZ
fast-clear we cannot enable concurrent binning.
VK_EXT_fragment_density_map_offset makes us add the maximum possible
tile size to the depth image size, because we don't know the tile
size that will be selected later on for a framebuffer. In practice,
this caused LRZ fast-clear to be disabled in many cases due to
tile_max_w/tile_max_h being rather large.
Now, instead of the most pessimitic case we can do the following:
- Calculate the biggest possible tile size that could be added to
the depth image that won't disable LRZ fast-clear.
- When calculating tiling config use the info about maximum tile
size from the images, or in case of image-less framebuffer
recalculate maximum possible tile size and it.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38218>
this code is invalid after the refcounting rework
Fixes: b3133e250e - gallium: add pipe_context::resource_release to eliminate buffer refcounting
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38329>
This is what happens when you leave MR unreviewed for months.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d39e443ef8 ("anv: add infrastructure for common vk_pipeline")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38400>
These should be fixed now.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38364>
For GS streamout, we need the following LDS scratch space:
- Repacking streamout vertices takes 1 dword per 4 waves per stream
(max 16 bytes for Wave64, max 32 bytes for Wave32)
- 1 dword per stream for buffer info
(16 bytes)
- 1 dword per buffer for buffer info
(16 bytes)
Previously, the space used for buffer info aliased with the
space for repacking the output vertices in ngg_gs_finale(),
and there was no barrier in between, which caused a race
condition, resulting in random failure.
Fix this by allocating a few more LDS dwords so that aliasing
is not required, which also allows us to remove an extra
workgroup barrier.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12705
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38364>
Blob generates this with the glmark2:texture benchmark on STM32MP257.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38363>
LAVA jobs already have a global 1h timeout in GitLab. This exists because
GitLab jobs must start before we can determine whether a device is
available for testing.
Jobs themselves do not normally run that long, most of the delay comes
from waiting in the LAVA queue.
Dropping these overrides for pre-merge jobs fixes cases where the LAVA
job isn't picked up in time.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38395>
This is used by Steam Link VR (driver_vrlink) to avoid doing YUV conversion itself.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37500>
1. The prolog needs to have a null check. Libraries don't have prologs.
2. We only need to print the shaders actually included in this pipeline.
Libraries were already printed separately.
3. The traversal shader was wrongly omitted from the output.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38355>
Add the string length parameter to the set_name(),
set_value() function to remove the conversion from
char* to std::string which takes extra work like
calling strlen() to compute the string length.
From the callback sampling in the perfetto tracing,
the ratio of trace_payload_as_extra_intel_end_draw_indexed
to intel_ds_end_draw_indexed drops from 63.80% to 59.65%
with this change.
v2: Add the data of the callback sampling to the description.
Signed-off-by: Andy Hsu <hwandy@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38073>
Adds build instructions and workarounds documentation.
Workarounds documentation only has the biggest offenders and
there are probably way more in code that need yet to be
documented.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38232>
This is more robust than smashing the variable to mediump and then
asking for mediump to be lowered later. It's also faster because it
only involves one compiler pass, not two.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>
On Mali, we need not only clamp but also convert to float16 on Valhall+.
We could have a separate pass for this but it fits in nicely with the
rest of nir_lower_point_size() so we might as well put it there.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>
Per ARB_vertex_program spec result registers are 4-component and initially
undefined, and the FF fragment program expects its intputs to be
4-component too. So, if the client's vertex program does not write the
whole vector it will cause misrenderings unless the same client also
supplies fragment program that expects less than 4 componens.
This commit adds a workaround that initializes results to vec4(0, 0, 0, 1)
which seems to be an expected behavior for such clients.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38295>
NIR is actually pretty good at optimizing UBO, SSBO, and shared memory
access but in order to do so, we actually have to run the optimizations
before we lower it all. Same for I/O. By doing all our lowering in
panvk before we ever run the optimization loop, we risk hampering it
significantly.
Ignoring loop changes (several get unrolled now), fossil-db on Sascha
Willems demos and a few others looks lik
Instrs: 189054 -> 187802 (-0.66%); split: -0.67%, +0.01%
CodeSize: 1756160 -> 1747072 (-0.52%); split: -0.52%, +0.01%
Estimated normalized CVT cycles: 771.367106999997 -> 766.0311719999971 (-0.69%); split: -1.05%, +0.36%
Estimated normalized SFU cycles: 1407.21875 -> 1406.9375 (-0.02%); split: -0.03%, +0.01%
Estimated normalized Load/Store cycles: 17477.0 -> 16917.0 (-3.20%)
Maximum number of threads: 1257 -> 1213 (-3.50%); split: +0.08%, -3.58%
Number of hardware loops: 283 -> 278 (-1.77%)
Totals from 186 (19.81% of 939) affected shaders:
Instrs: 102588 -> 101336 (-1.22%); split: -1.23%, +0.01%
CodeSize: 834432 -> 825344 (-1.09%); split: -1.10%, +0.02%
Estimated normalized CVT cycles: 463.226562 -> 457.890627 (-1.15%); split: -1.74%, +0.59%
Estimated normalized SFU cycles: 1021.84375 -> 1021.5625 (-0.03%); split: -0.05%, +0.02%
Estimated normalized Load/Store cycles: 8425.0 -> 7865.0 (-6.65%)
Maximum number of threads: 334 -> 290 (-13.17%); split: +0.30%, -13.47%
Number of hardware loops: 63 -> 58 (-7.94%)
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
We need to lower outputs to get rid of output reads and so that we can
fix up layer writes on Bifrost. However, there's really no point in
lowering reads besides moving them to the top. Even then, NIR can
probably copy propagate the copies and we'll end up reading straight
from the input variable anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
Neither nir_lower_io() nor nir_lower_indirect_derefs() know what to do
with copy_deref so we need to get rid of those first. Also, there are
some NIR passes which can insert more copy_deref or propagate an
indirect load to the I/O variable so we want to lower those away right
before lowering I/O.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
These two passes are a prerequisite for basically anything that
optimizes on variables.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
GTK is missing a semaphore between QueueSubmit() and QueuePresent()
causing the WSI submit to be "unordered" and to immediately signal the
semaphores (because it's missing a wait semaphore in QueuePresent()).
The workaround is to disable unordered WSI submits until GTK fixes it
properly.
Cc: "25.3"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14087
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38351>
Xe2 uses byte offsets rather than OWord offsets. We've been storing the
per-slot offsets in bytes on Xe2 for a while, but kept the global offset
immediate in OWords for some reason, choosing to lower it during logical
send lowering.
This patch makes both offsets (global immediate, per-slot) in the same
units, so they could be added together if necessary without scaling.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Much clearer, especially since we're dealing with at least four
different kinds of intrinsics. These helpers were introduced years ago,
but probably didn't exist when we first wrote this code.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
I noticed that our backend was completely ignoring writemasks, despite
them appearing on many of the intrinsics we're implementing.
Rhys Perry pointed out that nir_lower_mem_access_bitsizes is removing
all non-trivial writemasking today, so ssbo/global/shared/scratch/etc.
stores should only ever see all components enabled. Which means what
we're doing is legitimate, if non-obvious. Add an assert to make it
obvious.
Thanks a lot to Rhys for helping me rediscover what made this work.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
The backend has been fully ignoring all writemasks for a long time,
so it really doesn't make sense to have them on our custom intrinsics.
I'm not sure they even make sense for some of the block intrinsics.
Also, the store_ssbo -> store_ssbo_intel pass was not setting writemask
at all, leaving it at the default value of 0 (aka write nothing, if it
had been respected...)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Just compare against the size that was declared.
This is probably overkill. I couldn't figure out what vulkan says wrt
OOB access of shared memory. D3D however (which is very strict about
these things) says that for TGSM writes the entire contents of the TGSM
becomes undefined, for reads the result is undefined. Hence, rather
than masking out such accesses, to avoid the segfaults it would be
enough to just clamp the offsets to valid values.
nir doesn't seem easily able to tell us if an access is guaranteed
in-bound (unlike for ssbo access), so assume always potentially OOB.
v2: fix rusticl - for cl we don't know the shared size at compilation
time, this is only provided at launch_grid() time, the nir shader info
shared_size might be zero. Hence pass through the size via cs jit
context, there already actually was a member in there which looks
like it was intended for that (interestingly enough, the cs jit context
was actually unused, since resources are passed elsewhere nowadays).
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38307>
Use tristate for the aligned setting, otherwise it is always
first disabled which contributes to the condition if we set the
new stride active.
v2: set ByteStride in dword units and take secondary cmdbuf
in to account (Lionel)
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38349>
More fallout from strict NIR validation but easy to fix. I hit this when
attempting to CTS changes for parent_instr.
Closes: #14245
Fixes: 2f6b4803ab ("nir/validate: expand IO intrinsic validation with nir_io_semantics")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38356>
Release builds are noisy about flush_type and scope being used
uninitialized, even though they are always set.
Initialize them to the final else values to make GCC happy.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38357>
In ray tracing dispatch, we have dispatch.threads set to 0 since we
calculate the local_size_x/y/z based on the launch sizes.
This change takes 0 threads into an account and returh the TG size 8 in
such scenarios. Before this change, we were setting TG size to 2.
Fixes: 0c4e1c9efc ("intel/common: Add helper for compute thread group dispatch size")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38229>
In some cases propagating through a bcsel may be harmful. If the bcsel
uses are unlikely to be eliminated in both branch of an if statement,
propagating through it may result in extra moves for phi instructions
and extended live ranges.
v2: Fix missing parameter in call. Noticed by Rhys. I fixed this on the
test machine, but I must have forgotten to propagate the change back to
my dev machine.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38321>
Rusticl reports `CL_DEVICE_VENDOR_ID` using the `vendor_id` property
defined in Panfrost. The value is not set so a `0` is reported
instead.
Initialise the value to `0x13B5`, which is Arm's PCI vendor ID.
Add the definition in `lib/pan_props.h` so it can be shared with
Gallium Lima, Panfrost and PanVK.
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38283>
With the current stack configuration the rv770 seems to be unable
to go beyond three with the "vs-output-array-float-index-wr-before-gs.shader_test"
test. Anyway, the value four seems to be sufficient for the other tests.
This issue was triggered on rv770, for instance, with:
"piglit/bin/shader_runner tests/spec/glsl-1.50/execution/variable-indexing/gs-output-array-float-index-wr.shader_test -auto -fbo"
"piglit/bin/shader_runner tests/spec/glsl-1.50/execution/variable-indexing/vs-output-array-float-index-wr-before-gs.shader_test -auto -fbo"
Fixes: 713edb5998 ("r600/sfn: handle the IF predicate in the scheduler")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38213>
Drivers already have to track this workaround, so remove the logic
from Blorp and let the driver manage this.
Also in Anv don't accumulate this workaround, emit it directly in
place right after COMPUTE_WALKER. Accumulating can be problematic when
you want to dispatch concurrent compute shaders that do not need any
cache flush interaction (typical example with the internal
simple_shader framework).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38306>
The Vivante GPUs have a hardware bug where trilinear filtering
(MIP=LINEAR) produces incorrect results when used with depth/stencil
textures that have shadow comparison enabled, leading to GPU hangs.
Work around this by forcing MIP=NEAREST for depth/stencil formats,
downgrading from trilinear to bilinear filtering as done by binary blob
too.
Fixes dEQP-GLES3.functional.texture.shadow.*.linear_mipmap_linear.*
except DEPTH32F ones on all GPUs I have access to.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38308>
../src/gfxstream/guest/platform/kumquat/vulkan-mapper/GfxStreamVulkanMapper.cpp: In static member function ‘static GfxStreamVulkanMapper* GfxStreamVulkanMapper::getInstance(std::optional<DeviceId>)’:
../src/gfxstream/guest/platform/kumquat/vulkan-mapper/GfxStreamVulkanMapper.cpp:208:30: error: ‘os_get_option’ was not declared in this scope
208 | const char* driver = os_get_option(VK_ICD_FILENAMES);
|
Fixes: 222b85328e ("mesa: replace most occurrences of getenv() with os_get_option()")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38331>
This description was incorrect in that it impiled we supported Hopper
and Blackwell A, which is not currently the case (see nvk_is_conformant
in nvk_physical_device.c).
Fixes: edd0cb6d56 ("docs/nvk: Update hardware support")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38320>
CmdWriteAccelerationStructuresPropertiesKHR writes the data with MI
commands, we no longer dispatch shaders to write the properties.
As a result, we don't need to flush untyped cache.
Fixes: f0e18c475b ("intel: remove GRL/intel-clc")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38291>
You can have a group with 0 shaders in it. See also febe90e109
("vulkan: remove incorrect assert"). Fixes assertion failure while
compiling fossils/q2rtx/q2rtx-rt-pipeline.976f4ab1c0fee975.1.foz on
Intel platforms.
Fixes: e05a9b77b6 ("vulkan/runtime: split rt shaders hashing from compile")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38318>
This fixes the VVL PositiveVideoDecodeAV1.* tests, which trigger error
concealment. These DPB addresses would not be normally used, but get
used by the error concealment path.
Fixes: d103b76ad6 ("radv/video: add VK_KHR_video_decode_av1 support.")
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38311>
Texel buffers are currently described by a TextureDescriptor,which leads
to restrictive limits on size and alignment.
These limits can be avoided by using a BufferDescriptor instead.
This requires first embedding a ConversionDescriptor into some of the
currently empty space of the BufferDescriptor, and modifying the
compiler so that instead of outputting TEX_FETCH, it will:
1. Load the ConversionDescriptor with LD_PKA
2. Get the buffer address with LEA_BUF[_IMM]
3. Use LD_CVT to get the value
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Expands the format table with a dedicated bit for texel buffer use. We
can fit this by setting the size of the hw-field to 21, which is fine as
we never encode more than 21 bits (see MALI_PACK_FMT).
This bit is set for all formats that support PAN_BIND_SAMPLER_VIEW and
PAN_BIND_STORAGE_IMAGE.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Adds a pass that lowers texel buffer accesses for textures/images to use
BufferDescriptors. This needs to be done late in case the resource
indices must be lowered first.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Add a field in BufferDescriptor to hold a ConversionDescriptor to
prepare for changing texel buffers to use BufferDescriptor instead of
TextureDescriptor
Also re-orders the descriptor based on word offset where appropriate.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
The Register Format-field in ConversionDescriptor is not used since v9
and should be left as zero.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Adds LD_CVT instruction for loading memory with conversion.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Gives the src for LEA_BUF_IMM a more descriptive name and specifies the
size of the register.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
This aligns with internal naming and removes confusion with
LEA_BUF[_IMM].
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
Add resource table and index check to instruction equality function.
This prevents CSE from mistakenly eliminating LEA_BUF_IMM instructions
that load from different resources, but with the same buffer offset.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
The RGBA4/BGRA4 formats had the PAN_BIND_STORAGE_IMAGE set, but we
cannot support that.
Fixes: d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
This was mapped to RG16F, while R16F should be correct.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
The base address used for bounds checking the entry was wrong. Directly
pass the end_of_entry address instead.
Fixes: db4bcd48d7 ("panvk: Fix IUB decode")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37007>
The size and stage parameters are left-overs from history. Originally,
the function acted on a list and so it needed an explicit stage and size
output. Now that it takes a NIR shader and a mode, we can just take the
stage from the shader and set num_(in|out)puts.
The one caller that actually used the explicit output parameter was
turnip. However, given that the helper sorts and re-numbers all the I/O
variables, it's not like changing num_(in|out)puts instead of writing it
to some other location is that big of a deal.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38297>
Add a new MESA_LOG_PREFIX environment variable that allows fine-grained
control over log prefixes in file logger output on Linux. The variable
accepts a comma-separated list of options:
- "tag": include the tag prefix (e.g., "MESA:")
- "level": include the level prefix (e.g., "info:")
By default, both tag and level are included. Users can customize the
prefix by setting MESA_LOG_PREFIX to any combination (e.g., "tag",
"level", "tag,level", or empty string for no prefix), making the output
more flexible and readable for different use cases.
Other loggers (syslog, Android logcat, Windows debugger) are unaffected
and continue to include tag and level information as appropriate for
their format.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38217>
Refactor the intermediate buffer copy path to use a generic callback
approach, making the code more maintainable and easier to extend with
new format conversions.
The core copy_intermediate() function is now format-agnostic, accepting
a conversion callback that handles the actual data transformation. This
moves format-specific logic (RGB<->RGBA conversion and ASTC
decompression) into dedicated callback functions, making the conversion
path explicit at each call site rather than hidden inside the copy
function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
Instead of allocating a buffer for the entire RGB->RGBA conversion
process. Just allocate a smaller buffer that is the size of a tile and
do the conversion one tile at a time.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
Move ASTC decompression code from mesa/main to src/util to make it
available for use by Vulkan drivers. This allows the Intel ANV driver
to use CPU-based ASTC decompression for host image copy operations
on hardware that doesn't natively support ASTC formats.
The _mesa_unpack_astc_2d_ldr() function signature is updated to use
pipe_format instead of mesa_format for better integration with the
util format system.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
The needs_temp_copy() function was incorrectly identifying some
depth/stencil formats as needing RGB<->RGBA conversion.
VK_FORMAT_D32_SFLOAT_S8_UINT maps to PIPE_FORMAT_Z32_FLOAT_S8X24_UINT,
which has 3 channels (F32 depth, UP8 stencil, X24 padding). The
component count check (== 3) was matching this as an RGB color format,
causing depth/stencil images to incorrectly use the RGB conversion path.
Add an explicit vk_format_is_depth_or_stencil() check before the
component count test to ensure depth/stencil formats always use the
direct copy path.
Fixes: f97b51186f ("anv: intermediate RGB <-> RGBX copy for HIC")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
At least some drivers need a full modeset to change the Colorspace
property or to en-/disable HDR mode. E.g., at least amdgpu-kms as
tested under Linux 6.8 on Polaris needs it. Otherwise the atomic
commit for disabling HDR in _wsi_display_cleanup_state() will fail,
and the connector stays stuck in HDR mode after vkDestroySwapchainKHR().
Fixes: 1ed78dd7ec ("wsi/display: Clean up DRM hdr/color state on swapchain destruction")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37880>
For a selected non-default imageColorSpace during swapchain creation,
make sure that proper HDR setup also works even if a client app does not
explicitly call vkSetHdrMetadataEXT() in time.
Assign the EDID provided metadata here, so the 1st atomic commit will
set Colorspace and HDR metadata properties on the connector, to make sure
HDR or other wide color gamut modes get enabled.
Without this, the chain->color_outcome_serial would stay at zero and
the properties would not ever get assigned during drm_atomic_commit(),
leaving HDR disabled on the display sink.
Fixes: 13137393f6 ("wsi/display: Expose HDR10 colorspace based on EDID")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37880>
CTA-861-G section 6.9.1 Static Metadata Type 1 declares that zero values
for different groups of HDR Metadata properties are allowed, including
zero nits values for max display mastering luminance, max content light
level, max frame-average light level and min display mastering luminance.
A zero value is meant to be treated by the video sink as "undefined" /
"unknown", and handled accordingly. This is common for dynamically
generated visual content.
Therefore don't assert on some minimum nits level > 0, but only check for
a non-negative level.
Fixes: b4176393a0 ("wsi/display: Implement VK_EXT_hdr_metadata on KHR_display swapchain")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37880>
The quality level may change every frame, but we can only enable
pre-encode on first frame because it changes context buffer layout
and currently we only allow the context buffer to grow (append recon
pics at the end).
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38257>
We've been building the Piglit replayer in the test-vk container/rootfs,
but all trace-replay testing in CI is actually done using the test-gl
rootfs.
Despite the naming, the "gl" and "vk" rootfs variants don't correspond to
the graphics API being tested - just the different sets of tools
bundled.
The required tools for trace replay are already included in the test-gl
rootfs, so there's no need to build or use the test-vk variant for this
purpose.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38282>
fixes O(N^2) memory usage and runtime around liveness/scheduling/spilling/RA,
and proves out the design for the common code sparse bitsets (I did need to make
an adjustment for this - worth the effort).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
Some shaders, especially RTPSO shaders that have parts of the PSO
inlined, can become absolutely huge. Using a sparse bitset avoids
quadratic complexity in memory consumption for the liveness information.
This reduces peak memory usage in worst-case tests (hammering
compilation of many huge RTPSOs on 32 threads concurrently) by ~60%,
from 43GB to 18GB.
CPU time (seconds) differences for a workload with mostly small shaders:
Difference at 95.0% confidence
-5.27 +/- 1.08963
-0.88811% +/- 0.183626%
(Student's t, pooled s = 0.629735)
Peak resident set usage for the mostly-small workload:
Difference at 95.0% confidence
30809 +/- 13394.3
1.59276% +/- 0.69246%
(Student's t, pooled s = 7741.09)
CPU time for the heavy workload did not show any difference.
Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
Useful for potentially huge bitsets that are expected to be mostly
filled with zeroes, reducing memory consumption by assuming bits being
zero by default (without wasting memory to store zeroes).
Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
The code in question multiplies `uint32_t`s together and assigns them to
a `uint64_t`. It seems rather unlikely at there would be an overflow,
but we might as well do the cast.
CID: 1649587
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38289>
Our exposed limits say we shouldn't be able to, but let's add an assert
in case something changes, and to help Coverity out.
CID: 1662103
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37583>
For AHB VkBuffer import, the allocationSize comes from the raw external
AHB props query and it can be larger than the underlying buffer memory
requirement. So we must respect the allocationSize for the actual mem
import to support mapping the whole AHB size, and the dedicated buffer
info has to be stripped to obey the spec.
Test: CtsNativeHardwareTestCases no longer crashes on debug build panvk
Fixes: 66bbd9eec8 ("panvk: implement AHB image deferred init and memory alloc")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38274>
I routinely don't update the docs because the build (hawkmoth in
particular) is a pain. The prior instructions almost worked, except that
on Debian you need to use a venv (which is a good idea to do for py3-clang
as well for reproducibility, anyway), and the versions that were listed
didn't actually run any more due to a revoked dependency.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38266>
We want to know whether or not we successfully initialized before
filling out physical device properties.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38284>
CTS regressed due to c2a6fb6419 not considering cmd stages for
signaling and waiting. Before this patch panvk did consider cmd stages,
but not semaphore stage masks.
With this fix we consider both which is what the spec requires
Fixes: c2a6fb6419 ("panvk: cull semaphores in unrelated subqueues")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38138>
Setting MESA_VK_VALIDATE_SHADER_BINARIES will cause the shader code to
round-trip every shader through [de]serialize and only ever use the
deserialized version. This catches bugs where the driver may drop
things in the [de]serialization process. It also deserializes the new
shader again and compares it against the original to ensure that
deserialize -> serialize is idempotent.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
This is just a wrapper around ops->compile() for now but we'll extend it
in the next commit to add some validation.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
To match the VK_MAX_PIPELINE_BINARY_KEY_SIZE_KHR of only 32B.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
We'll want to have only hashes in vk_pipeline_stage so the
vk_pipeline_stage_is_null() helper won't work.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
In order to implement VK_KHR_pipeline_binary, we need to be able to
build hash from pipeline creation structures without looking at the
cache.
The blake3 hash on precomp shaders prevents that as its loading from
cache and potentially apply transformation to NIR.
Let's stick to the hash generated by vk_pipeline_hash_shader_stage(),
it does not look at NIR (except for internal shaders) and already hash
the same information :
* shader code (SPIR-V, identifier, hash)
* robustness state
* specialization constants
* pipeline flags
* entry point name
* subgroup information
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
Following bc64ea2815 ("vulkan: fix shader linking with common
pipelines") we're always linking pre-rasterization shaders together
and the shader hashes are hashed together, so there is no point
hashing :
- a bitfield of active shaders
- merged tesselation information
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
We're doing the same in vk_pipeline_precomp_shader_create().
Also fixes valgrind warning due to uninitialized fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36647>
Panvk calls pan_preprocess_nir() from its preprocess hook that it hands
off to the Vulkan pipeline code. That hook gets called before we have
the opportunity to lower geometry shaders. This means that we get our
viewports lowered for the VS and then the geometry shader is trying to
work on lowered viewports, which is wrong. Instead, we want to lower
later and only apply the viewport transform in the shader that runs as
the hardware VS.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
This is more robust because it ensures that we only ever check the
location on something that we know is an outupt. Also, if it's an
output then we know (thanks, validation!) that it's a variable.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
While we're here, make the variable handling a little more robust by
checking the deref mode before assuming there's a reachable variable.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
The comment explicitly calls out pan_nir_lower_store_component(), which
is in a different function call so it's a bit weird to have it in the
caller. Also, we already do this in postprocess() on midgard so it
makes more sense to just move it into bifrost.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
Every caller of pan_shader_lower_texture() immediatly called
pan_shader_postprocess() and every caller of pan_shader_postprocess()
lowered textures except blend shaders and those don't texture anyway.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38265>
This started failing at some point between a35a12dd38 and 2f7b1e8453,
but the nightly CI was out for a long time in between, so it's hard to
tell what commit it to blame without bisecting. I'm going to leave that
up to someone else for now.
To complicate things further, it seems like this one only fails in the
nightly runs, but not in pre-merge. So we list it as a flake instead of
a consisten failure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38279>
These are helper instructions for the spill_preserved pass to insert
reloads for registers that are preserved by the ABI, yet
clobbered by the callee shader.
There is one p_reload_preserved instruction at the end of each block.
This allows us to insert reloads early, to alleviate the high latency of
scratch reloads.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37381>
For intra only we need to allocate internal dpb.
This was previously implemented for VP9 decode only, using a buffer.
Change it to image so that we don't have to handle it as special
case everywhere.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38158>
Always copy parameters that are not guarded by a flag, zero init
the structs if not provided by application.
Fixes vk_layer_validation_tests PositiveVideoEncode*.GetEncodedSessionParams
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38158>
The standard way to query options in mesa is `os_get_option()` which
abstracts platform-specific mechanisms to get config variables.
However in quite a few places `getenv()` is still used and this may
preclude controlling some options on some systems.
For instance it is not generally possible to use `MESA_DEBUG` on
Android.
So replace most `getenv()` occurrences with `os_get_option()` to
support configuration options more consistently across different
platforms.
Do the same with `secure_getenv()` replacing it with
`os_get_option_secure()`.
The bulk of the proposed changes are mechanically performed by the
following script:
-----------------------------------------------------------------------
#!/bin/sh
set -e
replace() {
# Don't replace in some files, for example where `os_get_option` is defined,
# or in external files
EXCLUDE_FILES_PATTERN='(src/util/os_misc.c|src/util/u_debug.h|src/gtest/include/gtest/internal/gtest-port.h)'
# Don't replace some "system" variables
EXCLUDE_VARS_PATTERN='("XDG|"DISPLAY|"HOME|"TMPDIR|"POSIXLY_CORRECT)'
git grep "[=!( ]$1(" -- src/ | cut -d ':' -f 1 | sort | uniq | \
grep -v -E "$EXCLUDE_FILES_PATTERN" | \
while read -r file;
do
# Don't replace usages of XDG_* variables or HOME
sed -E -e "/$EXCLUDE_VARS_PATTERN/!s/([=!\( ])$1\(/\1$2\(/g" -i "$file";
done
}
# Add const to os_get_option results, to avoid warning about discarded qualifier:
# warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
# but also errors in some cases:
# error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
add_const_results() {
git grep -l -P '(?<!const )char.*os_get_option' | \
while read -r file;
do
sed -e '/^\s*const/! s/\(char.*os_get_option\)/const \1/g' -i "$file"
done
}
replace 'secure_getenv' 'os_get_option_secure'
replace 'getenv' 'os_get_option'
add_const_results
-----------------------------------------------------------------------
After this, the `#include "util/os_misc.h"` is also added in files where
`os_get_option()` was not used before.
And since the replacements from the script above generated some new
`-Wdiscarded-qualifiers` warnings, those have been addressed as well,
generally by declaring `os_get_option()` results as `const char *` and
adjusting some function declarations.
Finally some replacements caused new errors like:
-----------------------------------------------------------------------
../src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:127:31: error: no matching function for call to 'strtok'
127 | for (n = 0, option = strtok(env_llc_options, " "); option; n++, option = strtok(NULL, " ")) {
| ^~~~~~
/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/../sysroot/usr/include/string.h:124:17: note: candidate function not viable: 1st argument ('const char *') would lose const qualifier
124 | char* _Nullable strtok(char* _Nullable __s, const char* _Nonnull __delimiter);
| ^ ~~~~~~~~~~~~~~~~~~~
-----------------------------------------------------------------------
Those have been addressed too, copying the const string returned by
`os_get_option()` so that it could be modified.
In particular, the error above has been fixed by copying the `const
char *env_llc_options` variable in
`src/gallium/auxiliary/gallivm/lp_bld_misc.cpp` to a `char *` which can
be tokenized using `strtok()`.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
Use os_get_option_dup in
src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
src/util/tests/process_test.c
because the string is going to be modified.
Use os_get_option_dup in device_select_layer.c are because the string is being assigned to a
struct member (protection for the future), and also because the consecutive usages (protection for the present).
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
consecutive calling to getenv or os_get_option is invalid usage, so convert to use os_get_option_dup instead
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
This is for take care of GetEnvironmentVariableA/getenv/secure_getenv/os_get_android_option
consistently across windows/posix/android
And also secure_getenv should not directly used in mesa, remove it from u_debug.h
os_get_option_secure is using getenv for windows before, that also pull the drawback of getenv, so convert it also
using GetEnvironmentVariableA on windows instead.
This is done the same as commit bed69133cd for os_get_option_secure
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38128>
When available, query and use explicit layout info otherwise fallback to
implicit layout with tiling query.
This fixes aligned layouts of multi-planar formats that were getting
misaligned when adding surfaces with implicit layouts.
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38039>
When EXT_descriptor_buffer is not enabled, we can assume we're in
legacy descriptor mode and not do any switching for secondaries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
We skip the stall emission for STATE_BASE_ADDRESS since this one can
be skipped on Gfx12.5+ and instead add a new sba tracepoint that has
valid timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
This binary that perfetto builds automatically launches traced and
traced_probes, doesn't need tmux installed, doesn't need you to know tmux
ui, and doesn't need the tmux helper script hacked to not call the wrong
arch's ninja if you cross compiled.
Also, it's silly to send people into an explanation and links to docs,
when we have the instructions they actually want below.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37826>
to complement BITSET_CALLOC for when you want a memctx in there.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38245>
The sm8350-hdk devices have been fixed in the lab with help from
Qualcomm.
Also adjust the parallelism of the job, as we're still within the time
limit with just 2 devices instead of 3.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38239>
Some new CTS tests have geometry shader looking like this :
void main()
{
gl_Position = gl_in[0].gl_Position;
EmitVertex();
EndPrimitive();
// <-- some storage buffer write
}
The generate shader has :
- a message to write the position
- a message to write to the storage buffer
- a final message to end the thread
This generates an empty EOT URB messages which is apparently not legal
(simulation complains, HW hangs) :
send(8) nullUD g126UD nullUD 0x04088007 0x00000000
urb MsgDesc: offset 0 SIMD8 write masked mlen 2 ex_mlen 0 rlen 0 { align1 1Q A@1 EOT };
Instead emit a write with actual data and the mask set at 0 to discard
the effect :
mov(8) g127<1>UD 0x00000000UD { align1 WE_all 1Q };
mov(8) g125<1>UD 0x00000000UD { align1 1Q };
send(8) nullUD g126UD g125UD 0x04088007 0x00000040
urb MsgDesc: offset 0 SIMD8 write masked mlen 2 ex_mlen 1 rlen 0 { align1 1Q A@1 EOT };
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38243>
This commit changes SSA based spilling of values in loops.
As described in the paper by Hack, W_entry should consider which values
are used inside of the loop since we would really like to avoid spilling
those because we need to do so every loop iteration.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38181>
The Vulkan spec change hasn't been released yet but the VKCTS test
is public, so let's merge the fix to make VKCTS green again locally.
Fixes dEQP-VK.shader_object.tessellation.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38209>
Moved helper functions for binary upload and dump
code from si_shader.c to new file si_shader_binary.c
Signed-off-by: Saroj Kumar <saroj.kumar@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38166>
This would have caught recent regressions.
Haswell covers crocus.
Broadwell covers iris's oldest platform.
Skylake and derivatives are widely deployed.
Meteorlake is reasonably representative of recent Intel hardware.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38231>
This is a framebuffer fetch for blend equation advanced lowering. We're
using a binding table index as the offset, which is not a slot.
Also, validate the shader after setup_binding_table so that we catch
errors here at the right place, rather than deeper in the compiler.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38231>
Now that lavapipe sparse resource support on external memory has been
fixed, expose sparse resource support from venus on lavapipe. Meanwhile,
make remaining failures explicit (failed in lavapipe as well).
This CL also drops an obsolete comment and updates expectations from
full nightly runs.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
Store a dup fd for OPAQUE type as required by sparse binding. Use
os_map_memory_fd_placed to handle binding for opaque fd backed memory,
while the unbind will still correctly share the same code path with
other cases. This adds sparse resource support with OPAQUE type in
addition to DMA_BUF, and venus no longer has to hide sparse resource on
lavapipe.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
Since it early returns, refactor our with a dedicated helper for
cleaness and to prepare for further refactors.
No behavior change involved.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
This is mainly to ensure internal anonymous file based memory allocs do
not collide with opaque fd type. Since we are here, add INVALID type and
assign to non-Linux internal alloc.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
This completes the opaque fd api coverage, and is needed by lavapipe for
sparse binding support with opaque fd external memory.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
This change:
1. add new helpers to handle dmabuf alloc/import/free
2. use FREE to match with alloc macro
3. simplify opaque fd code path
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
There's already the type, and there's no concurrent usage of fd and
dmabuf fd, so just drop it. This fixes sparse resource to work with
dmabuf external memory for lavapipe. In addition, this makes sparse
resource from venus on lavapipe to work with dmabuf backing.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
This change:
1. Move size validation within sparse binding, but not escape to
non-sparse code path.
2. Error out if sparse is requested on unsupported platforms.
Fixes: d747c4a874 ("lavapipe: Implement sparse buffers and images")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38074>
Not the most optimal solution but 64-bit vertex attributes are rarely
used. Could still revisit if we find a real use case that matters.
This fixes recent VKCTS coverage:
dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38237>
Most of the infrastructure was already in place for 8 bit (we just
had to add the AFBC mode to use). We also needed to add support for
the 10 bit format (X6R10X6G10_X6R10X6B10_UNORM). Note that this
10 bit format is only supported for AFBC, for linear we have to fall
back to the old multi-plane way of handling it.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
When importing an image, check that the specific combination of
format plus modifier is supported, rather than just checking the
format. This will allow drivers to support some YUV formats in
special cases (specific modifiers, like AFBC for panfrost) while
also allowing us to fall back to generic multi-plane formats when
those modifier+format combinations are not supported by
hardware.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
Including the 10 bit variant X6R10X6G10_X6R10X6B10_UNORM. Only the
RG_RB variants seem to have fourccs, so those are the only ones being
added for now, although they would, obviously, be easy to add).
These are used for Y210, Y212, and Y216 fourccs. In particular Y210
is interesting for panfrost, as it is the fourcc used to indicate a
10 bit single plane 4:2:2 encoded as AFBC (similar to how YUYV is
the canonical AFBC for 10 bit 4:2:0).
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
We had assumed AFBC superblocks were always tiled in an 8x8 pattern,
but this is true only for 32bpp and lower formats; for larger formats
the pattern is 4x4. This isn't an issue yet, but will be when we
support R16G16B16A16 in AFBC.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
This is required because otherwise dri2_get_modifier_num_planes will be
confused: dri2_get_mapping_by_fourcc gives the emulated 2 plane YUYV,
but panfrost itself supports a hardware single plane YUYV. This is
arguably a problem with dri2, but other drivers have implemented
dri2_get_modifier_num_planes so we may as well. It also gives us a
hook for supporting more exotic planar configurations, should that
be necessary in the future.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
panvk_meta_get_unorm_format_for_blk_size() requires a block size of 4
or less, but we didn't actually check for that before calling it. Fix
that, and also rename the function because what we actually care about
isn't whether it is a unorm format, but a blendable format.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35771>
pps initializes perf counter multiple times, once from
GpuDataSource::register_data_source and once from
GpuDataSource::OnSetup. This is fine, except we should replace
failing assert with skip on second call.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38224>
nir_parallel_copy_instr can be emulated using an intrinsic for each
entry and an array of arrays that is used by the pass to remember which
copies belong together.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36483>
The shaders in question use:
(memory_load + (gl_SubgroupSize - 1)) & ~(gl_SubgroupSize - 1)
My guess is that this is supposed to be the subgroup size of whatever
produced the value, not the subgroup size in this shader.
And because in the consumer the workgroup size is 32, we use wave32.
Fixes: a2d3cbac2a ("radv: determine subgroup/wave size early")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14187
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38214>
Align with gallium side. When fixed-function blending is not available,
the internal blend shader is used. This is handled by a single ST_TILE
in the blend shader with the current sample ID, which requires sample
shading enablement.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38129>
We have the invariant that zero-initializing is legal but it's not
obvious in the source code, so add a sentinel value for it to make code
that uses it crystal clear.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38189>
Some implementations can emit tracepoints when copying u_trace
buffers. It's important to reserve the slots we want to copy into
before emitting the copies so that both processes don't clash with one
another.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38199>
If renderpass has D/S attachment, but pipeline has D/S as UNDEFINED,
D/S should be properly disabled for the pipeline. The easiest way is to
ensure that D/S state is valid when pipeline's D/S format is UNDEFINED.
So we always create VkPipelineDepthStencilStateCreateInfo.
CC: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37550>
Note that currently autostrip is disabled globally with
Wa_14021490052 for some gfx versions and steppings.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37975>
Note that currently autostrip is disabled globally with
Wa_14021490052 for some gfx versions and steppings.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37975>
These registers need to be whitelisted by kernel so that we can use
it to disable autostrip at will. This is about Wa_14024997852.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37975>
This prevents mesa from segfaulting if GLX is working on some screens, but not all screens.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: stefan11111 <stefan11111@shitposting.expert>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38201>
The handling in dedup_srcs was incorrect because it would apply the
modifier from srcs[i] to the LUT without removing the modifier from the
instruction. We can fix and simplify this code by removing all modifiers
before the dedup_srcs() call, which we were doing immediately after the
call anyway.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13966
Fixes: 66c9c40f68 ("nak: Handle modifiers in dedup_srcs() in opt_lop()")
Reviewed-by: Seán de Búrca <sdeburca@fastmail.net>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38220>
Coverity notices that in this case part of the decision to go down the
locked path invovles reading a flag, which is turn set inside the
protected code. Additional threads can decide they need to go down the
locked path, and then wait on the lock, even though the first thread
willse the device_registered to true, which would have otherwise
prevented them from going down the locked path.
What Coverity doesn't know, is that it is a violation of the Vulkan API
contract to call this function from two different threads, so in
practice that cann't happen. We'll move the setting of the
image_device_registered out of the locked area to see if that pacifies
Coverity, and if not then we'll just ignore it.
CID: 1662067
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37582>
The value depends on the tgsi_interpolate_loc which is not constant for
the loop. llvm should be able to cse in cases where they are the same.
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38197>
The size queries for images do not use function pointers so we need to
be careful that width, height and depth are 0.
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38197>
The code was checking for the define to exists, which it always
does, and is set to 0, where it should instead have checked its
value.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38210>
write_memory is used after encoding every frame to mark the feedback
buffer as ready. Only use it when write_memory can work without PCIe
atomics support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38184>
Dumping disassembly when not explicitly requested to do so was probably
helpful during early CSF bringup. But the CSF code is relatively robust
now, so let's move this to only happen if PAN_DBG_TRACE is set, similar
to what we do in the JM backend.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38182>
This is already set in `.panfrost-test`.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38182>
Fixes import of planar formats like NV12 in gtk4. Allows
`gst-launch-1.0 v4l2src ! gtk4paintablesink` to use vulkan instead of
falling back to OpenGL.
Closes: #14217
Cc: mesa-stable
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38200>
This reduces indirect indexing of 1-element arrays to indexing with 0.
Without this, we fail an assertion later.
Discovered when writing a test.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>
The code for lowering get_ssbo_size will be different in future chips,
so do it in common code to reduce duplication in the future.
Lower get_ssbo_size to ssbo_descriptor_amd + nir_channel.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>
We need to handle plane offsets everywhere. I noticed this broken before but
didn't realize it was a GL driver issue. Fix is easy, wrote this on my sofa
while waking up in the morning.
Fixes gst-launch-1.0 v4l2src ! glimagesink
Note that cheese & snapshot both still hang for some reason due to
libgstpipewire, but the Mesa side should be fine now.
Closes: #14217
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38160>
No shader-db changes on any Intel platform.
fossil-db:
Skylake
Intel(R) HD Graphics 530 (SKL GT2)
Totals:
Cycle count: 57669758527 -> 57669757913 (-0.00%); split: -0.00%, +0.00%
Totals from 10 (0.00% of 1736875) affected shaders:
Cycle count: 274949 -> 274335 (-0.22%); split: -0.36%, +0.14%
This change is likely due to subtle differences of different registers
being allocated.
In addition, fossils/google-meet-clvk/BgBlur.1f58fdf742c27594.1.foz and
fossils/google-meet-clvk/Relight.1f58fdf742c27594.1.foz stopped failing
EU validation on Gfx9 platforms.
Closes: #14171
Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38122>
The llvm::orc::ThreadSafeContext object wraps an llvm::Context and keeps
its reference.
As we are no longer able to squeeze out Context from ThreadSafeContext
in LLVM 21, do not let ThreadSafeContext create Context implicitly for
LLVM 21, instead explicitly create Context and then remember it.
This also eliminates the code creating a Context that is never disposed.
Fixes: cd129dbf8a ("gallivm: support LLVM 21")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37684>
This composition comes up a bunch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38169>
in extensive testing, using no atomics at all is a bit too loose for
basic/common cases like sharing a vertex bufer between contexts, readily
leading to unintended behavior
keeping the atomics internal ensures that they remain in the same ccx,
which avoids impacting performance
it does require a little trickery to avoid an extra atomic in the buffer
decrement case, however
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38176>
This helps us to more accurately count the number of registers that
need to be spilled to keep us below the maximum.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37188>
If the AR is loaded from a register changing that register in a loop was
resulting in a scheduling failure because the AR load was made dependend
on a later instruction. Fix the dependencies by only using dependencies on
older instruuctions in the same block.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14114
Fixes: d21054b4bc ("r600/sfn: Add pass to split addess and index register loads")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38056>
This fixes mesh shader performance of RADV for GravityMark by stopping
the lowering of ClipDistance[64][4] indirect access for mesh shader outputs.
The perf improvement is 14% on Navi48.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38155>
Blob generates such norm_mul for glmark2:shadow benchmark on STM32MP257.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38172>
I'm not really sure why Coverity doesn't tag the `delete[]` as a
potential leak since it also happens after ASSERT macros, like it did
with the call to `fclose()`.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Coverity points out that if the asserts fail, then the file won't be
closed, and therefore wont be deleted. We'd like to avoid littering the
temp directory with useless files.
This uses GTEST's `TEST_F` feature with a custom class to manager the
creation and destruction of the tmpfile.
CID: 1666502
CID: 1666525
CID: 1666579
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37744>
Slight differences due to different optimization order.
Totals from 135 (0.17% of 79839) affected shaders: (Navi48)
Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03%
CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04%
Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12%
InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84%
VClause: 4092 -> 4084 (-0.20%)
SClause: 7462 -> 7478 (+0.21%)
Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21%
Branches: 6395 -> 6386 (-0.14%)
PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07%
PreVGPRs: 6375 -> 6382 (+0.11%)
VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02%
SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12%
VMEM: 6704 -> 6696 (-0.12%)
SMEM: 12099 -> 12129 (+0.25%)
Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV:
baseline:
nir_opt_algebraic was called 12917 times from radv_optimize_nir()
nir_opt_cse was called 15204 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 31.48%
total instruction fetch cost: 28,642,638,021
with nir/algebraic: ad-hoc constant-fold ALU instructions
nir_opt_algebraic was called 12797 times from radv_optimize_nir()
nir_opt_cse was called 12963 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 30.63%
total instruction fetch cost: 28,284,386,123
=> ~1.27% improvement in total compile times
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
This prevents bugged CTS tests from tripping over with the following commits.
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.generated_args.denorm_sstep_denorm_flush_to_zero_*
These tests exhibit undefined values where the result depends on the ordering
of nir_opt_algebraic and nir_opt_constant_folding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
In Gfx9+ the destination should be set to ARF null in all those cases, the
use of IP was a requirement of old versions only. The already zeroed
bits will encode ARF null, so no need to set.
Skipping the helper avoids setting unwanted bits (like hstride), which
in Gfx12+ are MBZ.
This patch adjust the expectations of the asm tests to remove the dst
type and dst stride fields -- will expect them all zeroed.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36454>
This is better than using the generic helper since will not set unwanted
bits (e.g. hstride) and it is already handling their case for Gfx12+
anyway.
There's an extra helper now for the case where src1 is not used. In
Gfx9-11 it needs to be set to ARF but with a matching type of src0.
Assembler was updated to follow the same approach.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36454>
Even though some platforms support int64 they don't support indirect
movs with 64-bit values. Effectively this is only supported for non-LP
Gfx9.
This fixes various tests in dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.*.push_constant.*64*
on BMG.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38125>
Use same approach as the other code checking for this vstride. Argument
could be made we want to reuse the same enum value for both the encoded
and decoded version, but for now follow the existing practice.
This will cause
dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.vulkan_memory_model.type_punning.load.push_constant.int64_to_uint64
and similar tests to fail validation on BMG. Later patch will fix that.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38125>
If you somehow have MESA_LOADER_DRIVER_OVERRIDE= in your environment,
you certainly weren't trying to force load the driver named "".
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38165>
New host feature allows usage of external_memory_host extension
for external memory support, mainly for software renderers which
don't implement platform specific extensions.
Also moves enablement of queue_family_foreign outside of android,
to allow it also on linux guests (ref: github PR#74), and removes
the check for VK_MVK_moltenvk extension in favor of metal mode.
Test: -gpu lavapipe -feature VulkanNativeSwapchain on windows
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38153>
The root cause is "Add initial Vulkan Base support", which refactors
Vulkan versions into smaller features (base + graphics + compute).
Not sure if this is the cleanest solution, but someone needs to
refactor gfxstream codegen anyways.
There's also can issue with VkRenderingArea.
Fixes: 61c71733c8 ("vulkan: update spec to 1.4.330")
TEST=libgfxstream_vulkan.so + gfxstream_backend.so (host) commpiles
with new changes
Reviewed-by: David Gilhooley <djgilhooley.gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38153>
The logic here before was wrong. In the case where the set is the same,
it would avoid the flush but then re-initialize anyway, loosing the
dirty information and causing us not to actually flush out all the
descriptors.
Fixes: 1f0fda22f7 ("nvk: Flush descriptor set maps")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38163>
Add additional error logging which can be enabled with
FD_MESA_DEBUG=layout to make it easier to debug handle
import failures.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37144>
Current code clears register writes after a register read is
encountered, this handles the first WaR but hides the write from the
reads that will succeed the first one. Ignoring subtle WaRaR hazards.
To fix this, we don't clear writes when a register read is encountered.
Thanks to Karol Herbst for finding and distilling this issue.
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37108>
Currently each block schedules instruction independently from other
blocks. Instructions in the block must then be scheduled conservatively
to remove possible hazards that can occur in previous blocks.
Replace the algorithm with an optimistic data-flow pass that takes into
account all the following blocks. Gains a minor performance improvement
across every shader (1-2%) and should never have any runtime performance
degradation.
Benchmarks:
furmark 34932 -> 35597 (+2%)
pixmark-piano 9027 -> 9113 (+1%)
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37108>
Replace direct FILE* operations (fputs/fprintf to stdout) with the
mesa_log_stream API for pipe control debug output.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38157>
The Vulkan spec says:
"VUID-vkCmdDraw-maxFragmentDualSrcAttachments-09239
If blending is enabled for any attachment where either the source
or destination blend factors for that attachment use the secondary
color input, the maximum value of Location for any output attachment
statically used in the Fragment Execution Model executed by this
command must be less than maxFragmentDualSrcAttachments"
Which means it must be disabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14190
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38107>
The hardware requires FLOAT type in NFE.GENERIC_ATTRIB.CONFIG0 when
using 4-component 32-bit integer vertex attributes.
Passes dEQP-GLES3.functional.default_vertex_attrib.*
Signed-off-by: Daniel Lang <dalang@gmx.at>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38136>
Struct virgl_renderer_capset_drm has a varying size depending on whether
AMDGPU driver is enabled or not. This breaks offset of struct vdrm_device
members for non-AMD drivers when Mesa is built with multiple native context
drivers including the AMD driver. Place varying capsets in the end struct
vdrm_device to mitigate the issue.
Fixes: 5736280730 ("virtio/vdrm: add ENABLE_DRM_AMDGPU for c_args")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38096>
This change adds the minimum support for VK_EXT_device_memory_report,
which only reports device memory events at this point. We can make it
more useful later (like what's done in ANV) if desired by some tools.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37987>
v2: - Correctly test in multi-slot split whether the group has kill if
we want to add a multi-slot op.
- update group_has_predicate if an according vector op was added
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38112>
NIR+ACO is the best SSA-based shader compiler for AMD GPUs that exists.
There are many reasons why NIR+ACO is better than LLVM, and I have a long
list that I've collected over the years, but the major ones are better GPU
performance (faster GPU memory access thanks to better clauses and
scheduling, a lot less SGPR/VGPR spilling, better loop support, slightly
smaller shader binaries), 8x lower shader compile times, and smaller memory
footprint of the IR.
It also shows that NIR is a mature SSA-based shader compiler that helps
drivers generate optimized code very quickly.
And most importantly, radeonsi has slightly better Viewperf performance
with NIR+ACO than LLVM, and that's difficult to ignore.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38070>
tu_GetQueryPoolResults() currently asserts that the passed-in data size is
larger than the multiplication result of the specified stride and query
count. Such assert isn't useful when retrieving results for a single query
since the specified stride isn't important and can be any value.
The assert is removed, incorrect data sizing should be easily detectable by
existing validation tools.
Fixes dEQP-VK.query_pool.occlusion_query.stride_max in VKCTS 1.4.4.0.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38028>
radeon_surf stops being an input to ac_compute_surface. It's only an output
now.
This makes it clear which fields affect ac_compute_surface.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38093>
Use DI_SRC_SEL_AUTO_XFB as the source select mode for
CmdDrawIndirectByteCountEXT. Previously DI_SRC_SEL_AUTO_INDEX was used to
match the proprietary driver, but this mode doesn't correctly utilize the
counter offset value.
Fixes: dEQP-VK.transform_feedback.simple*.*_counter_offset_*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38108>
During streamout setup, emit PC_DGEN_SO_CNTL for any shader type if
supported, not just tess eval. This avoids excluding degenerate
primitives from stream output.
Fixes: dEQP-VK.transform_feedback.primitive_restart.*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38108>
Otherwise these primitives will be discarded.
This fixes `spec@!opengl 1.1@point-line-no-cull`.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38105>
This warn-by-default lint introduced in Rust 1.89.0 causes the following
warning:
warning: hiding a lifetime that's elided elsewhere is confusing
--> ../../src/gallium/frontends/rusticl/core/semaphore.rs:276:14
|
276 | fn state(&self) -> MutexGuard<SemaphoreState> {
| ^^^^^ -------------------------- the same lifetime is hidden here
| |
| the lifetime is elided here
|
= help: the same lifetime is referred to in inconsistent ways, making the signature confusing
= note: `#[warn(mismatched_lifetime_syntaxes)]` on by default
help: use `'_` for type paths
|
276 | fn state(&self) -> MutexGuard<'_, SemaphoreState> {
| +++
Follow the compiler's suggestion to fix this.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38132>
Update python-gitlab from 4.x to 5.x to fix AttributeError with
__annotations__ when running on Python 3.14. The 4.x series is
incompatible with Python 3.14 due to changes in how class annotations
are handled.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38123>
Remove the finally block with a return statement in GitlabGQL.query()
method. This pattern causes a SyntaxWarning in Python 3.14 as the return
in finally will override any return value or exception from the try/except
blocks.
Move the return statement to the end of the except block where it belongs,
maintaining the same error recovery behavior while fixing the warning.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38123>
The compressed and uncompressed Tile4 modifiers are supported on Xe2+.
The uncompressed TileY and Tile4 modifiers are easily supported on older
platforms.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38095>
Intel & AMD Direct3D drivers modify their rounding behaviour for texturing to
match Direct3D expectations. Such behaviour is not conformant in Vulkan, and
Intel hardware lacks a reasonable way to get NVIDIA's behaviour (which uniquely
works for Vulkan & Direct3D). The second best choice is to use
Direct3D-compatible behaviour for Proton (via driconf) and our current
Vulkan-conformant behaviour everywhere else. Given the APIs diverge and there is
no Vulkan extension to control the behaviour explicitly, driconf'ing on the
engineName is the reasonable solution.
anv already has a anv_force_filter_addr_rounding driconf option to force
Direct3D behaviour for certain Direct3D titles. Here we simply apply it to all
D3D10+ titles, aligning us with the Windows driver.
Note that D3D9 does not have this behaviour. We therefore use standard Vulkan
behaviour for D3D9 to avoid breaking D3D9 titles, even though the engineName is
the same as D3D10+.
This is the same solution radv uses, they call it radv_disable_trunc_coord. We
could unify the driconf entries later.
See https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38098#note_3166306
for a more detailed analysis, as well as the linked references:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27337https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25911https://github.com/HansKristian-Work/vkd3d-proton/pull/1884
This fixes misrendering in piles of Direct3D games run on anv via Proton,
including Assassin's Creed Valhalla.
Cc: mesa-stable
Closes: #13886
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Calder Young <cgiacun@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38114>
Remove two unused fields, and move a lonely boolean a bit up to plug the
remaining hole.
Because I was looking around and it bothered me.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38116>
The mechanism implemented in the hardware to synchronize against Write
after Read hazards with the visibility stream for concurrent binning is
for BV and BR to keep track of the number of render passes they have
finished and BV waits until
BR_count >= BV_count - vis stream count. For example, if
there are two visibility streams and the user submits three
renderpasses, before starting renderpass #3 BV will wait for BR to
finish renderpass #1. It's assumed that renderpass #3 and #2 use
different visibility streams, so it's safe to start working on #3 once
#2 is done.
This mechanism is assumed to work across renderpasses and even submits,
and the only way to reset the BR/BV counts is via
CP_RESET_CONTEXT_STATE which is only done by the kernel when
switching contexts. This vastly complicates things for Vulkan,
where we have no idea what order command buffers will be submitted. This
means that we have to defer emitting the actual pointers until
submission time and create patchpoints instead. This gets unfortunately
very complicated with SIMULTANEOUS_USE_BIT where we have to update the
patchpoints on the GPU.
I've taken the liberty of also deferring the allocation of the
visibility stream until submit time. This will help us later move to
per-queue visibility streams, which will be necessary for supporting
multiple simultaneous queues.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
Start introducing commands to setup BV. We need to run the initial
register setup with both BR and BV enabled, and similarly we need to
setup a bin preamble for BV. A few magic registers are BR-only and
should be skipped when initializing BV. The VPC attribute carveout
registers are a bit special because they must be initialized in BR and
BV bin preambles, so they are pulled into a separate function.
This commit also switches the "default" thread control from BR with
concurrent binning disabled to BR with concurrent binning enabled. GMEM
renderpasses now explicitly disable concurrent binning. This is
necessary because switching from CB enabled to disabled and vice versa
imposes a synchronization, and we want BV to be able to skip over
compute dispatches, sysmem renderpasses, etc. to find the next binning
pass. GMEM renderpasses re-enable concurrent binning at the end to keep
the "default" thread control and avoid having to sprinkle
THREAD_CONTROL(BR) around all the other Vulkan commands that can run
outside of a renderpass (compute dispatch, blits, query pool operations,
etc.).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
With concurrent binning, reading directly from the VSC_STATE registers
for conditional loads/stores doesn't work because BR and BV have their
own copy of VSC registers and the BV is running ahead of BR. Instead,
the blob's solution is to allocate extra space in memory for the
contents of VSC_STATE and upload it there. It has to be allocated
together with draw stream/prim stream because it is also written by BV
and read by BR and therefore it also needs to be duplicated to avoid
hazards. The memory is then read in BR via CP_MEM_TO_SCRATCH_MEM, which
is purpose-made for this.
Make turnip use this strategy on a7xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
The BV mempool is even further cut down compared to the "small mem pool"
layout which seems to be used by a610. It also shrinks the block size to
4 chunks instead of 8. This layout happens to be shared by a702, so
abstract out the layout into a "mempool size" enum.
While we're here, fix a bug with how the mempool offset for chunks is
printed. This accounts for the test diff.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
This was already broken for BINDLESS_BASE, but we didn't notice it
because we weren't using the builders. We have to cast fields that we OR
in, and we need to return uint64_t from the legacy field functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
The arr::in_bounds field was set unconditionally for every deref created
for a chain. For struct derefs, which don't have this field, this would
write to an unused memory location, which is probably why this never
caused issues.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: f19cbe98e3 ("nir,spirv: Preserve inbounds access information")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>
memmove was slow on large number of descriptor when destroying some of
them.
util_vma_heap is perfect for the task, ANV and RADV already use it.
It also simplifies the code.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38053>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Some GPUs, like AMD, has specific stride align requirements in order to
display the content correctly.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
As the BOs created in GPU needs to be accessible from the simulator,
create them in GTT memory, with CPU access.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Some GPUs, like AMD, require specific stride alignment in order to
display the content correctly.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
As the BOs created in GPU needs to be accessible from the simulator,
create them in GTT memory, with CPU access.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Disable the max-fails feature in deqp-runner for this job, since it
aborts the run due to failures that don't occur otherwise.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38051>
This job runs on MT8196 Rauru Chromebooks.
Also remove the old G725 expectation files, as G725 is a smaller variant
of G925.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38051>
VCN requires 64x16 alignment for HEVC. When the app requests non-aligned
resolutions, make up for it with conformance window cropping.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38061>
doorbell does not require va address mapping from userspace and hence
amdgpu_bo_va_op_common() function is not called and therefore doorbell
bo->vm_timeline_point is not updated. Currently to wait for all mappings
to be ready doorbell vm_timeline_point is used which is incorrect.
This patch updates vm_timeline_point to wait for all bos. The bos
can be real bo or slab bo. slab bo can be from old buffer and hence
there is a check to update vm_timeline_point to wait only if it is
new.
Reported-by: Zhang, ShanYi (Ken) <ShanYi.Zhang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38059>
We don't actually advertise compute-only or depth-only queues right now
but nothing in the spec says you have to advertise the queues in order
to advertise the bits. Setting them now ensures we don't forget them
when compute-only or transfer-only queues get added.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38094>
This contains resolve modes which override the format-based defaults as
well as resolve flags to allow disabling sRGB conversion.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38094>
Add documentation describing Android system property usage in Mesa. For
example, how environment varible names are translated by
os_get_option(), how to get/set values, and corresponding example
commands.
A new section is added to doc/envvars.rst which points to the full
details within the new "Android System Properties" section in
docs/android.rst.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37942>
In the dynamic rendering case, the images are allocated statically out
of the command buffer and may be reused, so we have to make they are
zeroed to match the normal tu_CreateImage() path. Otherwise we may get
garbage from previous usages of the image.
Fixes: f6c7f16322 ("tu: Implement VK_EXT_multisampled_render_to_single_sampled")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38032>
Subpasses can have different view masks, although this isn't often used.
So we can't use the view mask of the last subpass when deciding what to
store, instead we have to use the same used_views field that's used by
loads and clears.
Noticed by upcoming tests for VK_QCOM_multiview_per_view_render_areas.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38064>
These extensions are implemented in shared Vulkan/WSI code and
not driver specific. A Vulkan driver just needs to support
VK_KHR_timeline_semaphore, which Honeykrisp already supports
since its inclusion into Mesa.
Successfully tested on Apple MacBookAir 2020 with M1 SoC on
top of KDE KWin 6.4 and GNOME mutter 48.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38069>
Not writing to color attachments in the same as blending from LRZ
viewpoint.
Though there is one case when we can avoid disabling LRZ writes,
when a renderpass starts with depth-only draw calls, or consists entirely
of them, but also has color attachments.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38034>
Same case as a drawcall not writing to some color attachments, but not
trying to make LRZ work in cases where we can prove that LRZ can work.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38034>
While accounting for an input register's merge set when resetting the
file start after the preamble, we implicitly assume that the allocated
register is the preferred one by asserting that the register's merge set
offset is not smaller than its physreg (to prevent an underflow).
However, inputs are not guaranteed to have their preferred register
allocated which causes the assert to get triggered.
Fix this by only taking the whole merge set into account for inputs that
actually got their preferred register allocated.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 9d4ba885bb ("ir3/ra: make main shader reg select independent of preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37905>
The virgl-traces job is timing out randomly in different trace jobs with
the 6.17 kernel. Until we have a fix, mark these tests as Linux 6.16
only
Signed-off-by: Ritesh Raj Sarraf <ritesh.sarraf@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37935>
RADEON_FLAG_VM_UPDATE_WAIT can be passed to wait for VM updates at
allocation time instead of delaying them at submit time. There is no
reason to delay the waiting when the memory is bound to images/buffers
because in DX12 ressources are allocated and bound immediately.
This flag will be used to workaround an use-before-alloc in FH5
(game bug) which causes GPU hangs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38031>
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>
Now sfb cmd can be VK_NULL_HANDLE on incompatible queues. Avoid passing
that to set cmd, and explicitly assert below.
Test: dEQP-VK.synchronization2.basic.timeline_semaphore.multi_queue can
properly assert on the driver side when assert is enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38049>
Previously, we either have to filter out incompatible queue or disable
fence feedback. Now we track whether the queue can do feedback, and will
mark the fence feedback not pollable if the fence is submitted on an
incompatible queue.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38049>
When XFB was disabled, we were incrementing primitives_generated but not
primitives_emitted, which caused the overflow query to return true, but
it should have returned false because XFB was disabled.
This disables counting primitives_generated when there is no
primitives_generated query. When both primitives_generated and the overflow
query are enabled simultaneously and XFB is disabled, it will be incorrect
again, but that had been equally incorrect with the non-NGG codepath too,
just not discovered because of the lack of tests.
This commit just changes NGG streamout queries to behave the same as legacy.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37849>
We may require a bigger more than 16KiB to handle the image copy.
We now always allocate a buffer to handle it properly fixing the
remaining failures on VKCTS 1.4.4.0 for HIC.
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38060>
We were assuming that every formats used for HIC had a block widgh and
height of 1x1.
This is wrong for compressed formats like BC5, ASTC, ect.
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38060>
Without this warning, if a driver try to use the
nir_lower_fp64_full_software option, would get a crash without any
message pointing that the problem is that doesn't fulfill the
requirements of current mesa fp64 sw implementation.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Ol¨ák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38025>
Most of the time, we can infer the type to append in
util_dynarray_append using __typeof__, which is standardized in C23 and
support in Jesse's MSMSVCV. This patch drops the type argument most of
the time, making util_dynarray a little more ergonomic to use.
This is done in four steps.
First, rename util_dynarray_append -> util_dynarray_append_typed
bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;"
Then, add a new append that infers the type. This is much more ergonomic
for what you want most of the time.
Next, use type-inferred append as much as possible, via Coccinelle
patch (plus manual fixup):
@@
expression dynarray, element;
type type;
@@
-util_dynarray_append_typed(dynarray, type, element);
+util_dynarray_append(dynarray, element);
Finally, hand fixup cases that Coccinelle missed or incorrectly
translated, of which there were several because we can't used the
untyped append with a literal (since the sizeof won't do what you want).
All four steps are squashed to produce a single patch changing every
util_dynarray_append call site in tree to either drop a type parameter
(if possible) or insert a _typed suffix (if we can't infer). As such,
the final patch is best reviewed by hand even though it was
tool-assisted.
No Long Linguine Meals were involved in the making of this patch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
It is in C23 and Jesse reports that his MSMSVCV has it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com> [over IRC]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
For platforms that don't have native float64 support, skip the
arithmetic and clustered ops. While they would work, the lowering
for float64 for those increase significantly the shader for some
of those operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35844>
Instead of doing selectively and with different supporting passes, just
run the complete set (special algebraic before and cleanup optimizations
after) at the end of brw_postprocess_nir_opts().
No changes to fossil-db on ICL, TGL, ACM and BMG.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35844>
We don't support it, everyone dropped support for that, let's not expose it.
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38054>
Coverity notices that `nir_get_io_index_src_number` could return -1, and
that we use it to index an array. It cannot understand that -1 only
happens for unhandled enum values, but all of these are handled. Add an
assert to help it out.
CID: 1667234
Fixes: 37a9c5411f ("brw: serialize messages on Gfx12.x if required")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38007>
Coverity notices that if `util_last_bit()` returns 0, and we subtract 1,
then the unsigned will overflow before being converted. We could cast to
eliminate that error, but the entire optimization function would do
nothing if tex->required_params == 0 (the way that we would get here),
so let's just not do work if we know we don't need to *and* avoid this
overflow.
CID: 1667241
Fixes: efcba73b49 ("brw: switch to new sampler payload description scheme")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38009>
Assuming disk I/O isn't painfully slow compared to running the whole
compiler (OS file system caching should help), this should massively
reduce the number of shaders compiled.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38019>
This way we get the same cache UUIDs for identical sources and build
environments, even if they're built at different times.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38019>
This one just hangs out and prevents unnecessary duplicate compilation.
We set .weak_ref = true so it doesn't actually hold references forever.
It just hangs onto any shaders that exist in some VkPipeline or VkShader
somewhere. But if an app compiles a pipeline it already has, it will
let us avoid a duplicate compile, even if the app doesn't provide a
pipeline cache.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38019>
The primary_ref_frame is used to load CDF data and has to match between
VCN and the header. Unless the default CDF tables are used, VCN
currently always assumes the CDF data will be loaded from
reference_frame_index (for VCN4) or lsm_reference_frame_index[0] (for
VCN5). This may cause conflict with the application provided
primary_ref_frame.
Check to make sure the primary_ref_frame is the same frame as the used
reference, and override it if necessary.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38030>
Rather than calling REQ_RES during subqueue_init, only call it during
submit the first time we see a command buffer that requires the use of
specific resources.
This ensures queues processing compute-only workloads (i.e. not actually
requiring tiler/fragment resources) don't preempt vertex/fragment work
on other queues due to resource congestion.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37890>
The previous implementation was horribly slow with a larger number
of descriptor sets.
The new approach uses util_vma_heap (like ANV) which is a perfect fit.
This fixes stuttering in Indiana Jones because that games seems to use
a huge number of descriptor sets which can also be freed.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13901
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37976>
VK_KHR_maintenance9 requires that vertex attributes in shaders which map
to vertex attributes that aren't bound at the API return a consistent
value. In order to do this, we need toemit SET_VERTEX_ATTRIBUTE_A, even
for unused attributes. The RGBA32F format was chosen to ensure we
return (0, 0, 0, 0) from unbound attributes.
Fixes: 7692d3c0e1 ("nvk: Advertise VK_KHR_maintenance9")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38040>
vs may be null when mesh shader enabled. mesh shader and
vertex shader may share the GS user sgpr, so need to call
si_shader_change_notify to mark shader pointers dirty.
Also remove some init code which will be done anyway when
vs bind first shader in si_shader_change_notify now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37932>
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.
There were no shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
Fixes the following building error happening with clang:
../src/util/os_file.c:291:29: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct epoll_event evt = {};
^
1 error generated.
Fixes: 17e28652 ("util: mimic KCMP_FILE via epoll when KCMP is missing")
Cc: "25.3"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37944>
The index of each RT is the remapped color attachment index, so we have
to use the remapped indices when telling the HW the number of RTs.
This fixes KHR-GLES3.framebuffer_blit.scissor_blit on ANGLE once we
enabled VK_EXT_multisampled_render_to_single_sampled, which switched
ANGLE to using dynamic rendering with
VK_KHR_dynamic_rendering_local_read.
Fixes: d50eef5b06 ("tu: Support color attachment remapping")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37990>
Hardware doesn't support programmable sample locations with 1x MSAA, so
VK_SAMPLE_COUNT_1_BIT shouldn't be included in the
VkPhysicalDeviceSampleLocationsPropertiesEXT::sampleLocationSampleCounts
bitmask. With this sample count MSAA will also be disabled through relevant
control registers, effectively forcing all samples to the center position.
Fixes failures in VK_SAMPLE_COUNT_1_BIT tests under
dEQP-VK.pipeline.*.*.sample_locations_ext.verify_interpolation.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38026>
brw_prog_tcs_data::instances can be divided by vertices per threads on
earlier generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38036>
Geometry, Color and Depth pipelines count are needed for collecting some
metrics from perfetto.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37072>
Adds end of pipe to end of pipe timestamp parsing. Does not require
inserting stall between events to get accurate values but events
will sometimes be running in parallel.
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37983>
It's not really clear whether or not it should use gamma 2.2 or the piece-wise
transfer function, or how clients would use it for wider gamut in general.
Currently no compositors I know of support ext_srgb, so this shouldn't affect
applications in practice.
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Fixes: 4b663d56 ("vulkan/wsi: implement support for VK_EXT_hdr_metadata on Wayland")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36444>
The gs_rta_support feature is currently bugged and may cause the
driver to assert; disabling it will instead use a fallback method
which is functional.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38024>
With this commit we do two things:
* Remove pan_shader_info.fs.outputs_written because it is not used,
as there is already pan_shader_info.outputs_written
* For pan_shader_info.fs.outputs_read we direcly copy the nir
info.outputs_read, without a shift, for consistency and also
because it is not really required.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38022>
When NIR_DEBUG=serialize or NIR_DEBUG=clone is used, NIR_PASS recreates
nir_function_impl and nir_variable objects, causing use-after-free since
insert_rt_case() keeps pointers to those in local variables and var_remap.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
This doesn't work with NIR_DEBUG=serialize or NIR_DEBUG=clone.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
When NIR_DEBUG=serialize or NIR_DEBUG=clone is used, NIR_PASS recreates
nir_function_impl and nir_variable objects, causing use-after-free since
radv_nir_lower_rt_abi() keeps pointers to those in local variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
When NIR_DEBUG=serialize or NIR_DEBUG=clone is used, NIR_PASS recreates
nir_function_impl and nir_variable objects, causing use-after-free since
ac_nir_lower_ngg_nogs() keeps pointers to those in local variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13946
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37573>
Feedback requires transfer capability, so we must skip feedback cmd
pool initialization on incompatible queue families. Meanwhile, use
pool_handle for all validity check needed.
- fence and semaphore feedback: skip feedback cmd alloc and record when
pool_handle is VK_NULL_HANDLE
- event feedback: not affected as we patch in-place upon recording
- query feedback: assert the feedback cmd alloc is on supported queue
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38016>
The framebuffer dimension exposed to apps is still 16k but since the
driver allows 32k image on GFX12+, meta operations might perform
operations (like a copy) using graphics.
While we are at it, use the correct bitfield for setting BR_X/BR_Y on
GFX12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37974>
enums2names.py is only uses in one place. I propose to remove the -I
argument that is not strictly necessary as we can already get the
header name from the `-H` argument.
That modification is motivated by the need to help ninja-to-soong to
generate proper rule for the Android build system.
ninja-to-soong can't differenciate output file location and a string
matching the output file name.
Ref #14072
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37785>
Apparently various tessellation parameters come specified from
TESS_EVAL stage in GLSL while they come from the TESS_CTRL stage in
HLSL.
We switch to store the tesselation params more like shader_info with 0
values for unspecified fields. That let's us merge it with a simple OR
with values from from tcs/tes and the resulting merge can be used for
state programming.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Fixes: 50fd669294 ("anv: prep work for separate tessellation shaders")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37979>
It must not be larger than maxInlineUniformBlockSize.
Fixes VKCTS 1.4.4.0's
dEQP-VK.api.maintenance3_check.support_count_inline_uniform_block*.
Cc: mesa-stable
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38002>
Based on RADV.
The Vulkan spec says:
"If bindingCount is zero or if this structure is not included in
the pNext chain, the VkDescriptorBindingFlags for each descriptor
set layout binding is considered to be zero. Otherwise, the
descriptor set layout binding at
VkDescriptorSetLayoutCreateInfo::pBindings[i] uses the flags in
pBindingFlags[i]."
Fixes dEQP-VK.api.maintenance3_check.* in VKCTS 1.4.4.0.
Cc: mesa-stable
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38002>
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
The subgroup lowering may generate new fp64 vector operations, so
ensure that those are lowered before calling nir_lower_doubles().
Issue spotted by Georg Lehmann.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38003>
hk_heap is called during command buffer recording, which may be
concurrent, so writing dev->heap without synchronization is a data race.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37973>
lowering bitsize before lowering idiv is silly, since then it forces us
down the software int32 division path instead of the much faster
int8/int16 lowered path. Relevant CTS tests:
dEQP-VK.spirv_assembly.type.scalar.i16.div_comp,
dEQP-VK.spirv_assembly.type.scalar.i8.rem_comp,
Go from:
SIMD8 shader: 46 instructions. 1 loops. 4716 cycles. 0:0 spills:fills
SIMD8 shader: 1008 instructions. 0 loops. 3600 cycles. 0:0 spills:fills, 8 sends
to:
SIMD8 shader: 17 instructions. 1 loops. 2556 cycles. 0:0 spills:fills
SIMD8 shader: 464 instructions. 0 loops. 1394 cycles. 0:0 spills:fills, 8 sends
No stats change on fossil-db (which has very little int8/int16 and even
less integer division, apparently).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37966>
Skip waiting/signaling on semaphores with stages not related
to a given subqueue
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37810>
- rename vk_stage_to_subqueue_mask -> vk_stages_to_subqueue_mask
- handle stage masks instead of single stages.
- Add which sync scope it is reading to better reason with the mask semantics.
- Handle ALL_COMMANDS as well as TOP/BOTTOM (using sync scopes)
- add timestamp utility vk_stage_to_timestamp_subqueue_mask
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37810>
Until now, the driver has been using a single set for internal state
while leaving maxBoundDescriptorSets to the remaining 15.
This gives us no room for optimizations of driver sets, which might
become an issue in the future.
To remedy this, we therefore reduce maxBoundDescriptorSets to 7. This
aligns with the proprietary driver and gives us the space to optimize
the driver sets.
We might increase this in the future if we see that we don't need all
the driver sets we now reserve.
Reviewed-by: John Anthony <john.anthony@arm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37978>
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
Turing and newer Nvidia cards can work with up to 8 discard rectangles
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33476>
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.
This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
The actual chances of this happening seem dubious, but the cleaned up
code seems nice. printf returns a value >= 0 on success, which is the
number of characters it writes a return < 0 means that an error
occurred, and then errno is set. Which negative value doesn't seem to be
specified, but it also seems unlikely that any implementation would
return `-MAX_INT`...
Anyway, this is fixed by converting the generic `print_repeated` to a
`print_separator` that avoids the need to do arithmetic at all by just
stopping the loop at 1 instead of 0, and then printing a newline.
CID: 1666497
CID: 1666256
CID: 1666531
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37746>
Coverity is pointing out that we should check this, and in reality if
this isn't what we expect the rest of the test is probably invalid
anyway.
CID: 1666504
CID: 1666544
CID: 1666552
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37750>
This was missed in the original maintenance9 MR.
Fixes the flakes in test
dEQP-VK.synchronization2.op.single_queue.event.write_ssbo_compute_read_ssbo_compute.buffer_16384_maintenance9
Fixes: 7692d3c0 ("nvk: Advertise VK_KHR_maintenance9")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37964>
With untyped pointer we might get a deref_cast with a 0 ptr_stride. But we
were supposed to ignore the stride information on the pointer anyway, so
let's do that properly now.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37941>
The normal encode pass writes batches to a section in build scratch
memory. Those batches contain information about the internal node and
the primitive nodes. The encoder is split to avoid the register
pressure of the compressor and maximize occupancy.
The compressor works in two passes because one pass can not guarantee
that every primitive node (except) has at least two triangles. This
guarantee is used to advertise a smaller acceleration structure size to
the application.
During compression, every invocation processes at most two triangles.
Groups of 8 invocations are used to support the maximum triangle count
of 16 that the hardware supports.
The first step of compression is loading the triangle(s). Shared
vertices are deduplicated early to avoid doing it in the compression
loop. The compression loop tries to add triangles to a list of triangles
until the computed node size needed for storing the triangles reaches
the hardware node size. For this, each invocation first deduplicates
vertices with the triangles that have already been picked. It then
computes the node size of the picked triangles plus the candidate
triangles of the current invocation. The invocation that computed the
smallest size is added to the list.
Because it may not be possible to fit every triangle into the same node,
there can be multiple hardware nodes which are written in parallel for
optimal performance. If there are no nodes with only one triangle, all
nodes are written. If there is, compression of the batch is aborted and
the index of the batch is written to build scratch memory. The second
compression pass will repeat the steps above but only for those aborted
batches. The nodes with only one triangle can and are now merged.
It can not be determined during box node encode which triangles will be
compressed together so the encoder also has to fix up the parent box
node's child infos.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
Upon acquiring an external image from external/foreign queue family,
skip AFBC metadata invalidation if the app has explicitly requested
acquireUnmodifiedMemory. This also applies to CRC which may or may not
get hooked up later.
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37972>
Add a patch for dEQP: for non-APK builds running on Android, duplicate
dEQP output to both stdout and Android's logcat, making test output
visible in logcat.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37933>
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>
I didn't test "nvk: Fix maxVariableDescriptorCount with iub" as
thoroughly as I should have and it regressed
dEQP-VK.api.maintenance3_check.descriptor_set because we were then
violating the requirement that maxPerSetDescriptors describes a limit
that's guaranteed to be supported (and reported as supported in
GetDescriptorSetLayoutSupport).
That commit was also based on a misreading of nvk_nir_lower_descriptors.c
where I thought that the end offset of an inline uniform block needed to
be less than the size of a UBO. That is not the case - on closer
inspection that code gracefully falls back to placing IUBs in globablmem
if necessary. So, we can afford to be less strict about our IUB sizing
and only require that IUBs follow the existing limit imposed by
maxInlineUniformBlockSize.
Fixes: ff7f785f09 ("nvk: Fix maxVariableDescriptorCount with iub")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37922>
It assumes you don't have dead writes to variables in the last block, and
will copy-propagate consts from the first write it finds.
Without this, the upcoming nir_opt_copy_prop_vars() change to have more
restricted write masks caused less nir_opt_dead_write_vars() (since it
doesn't trim write masks for dead writes, only removes fully-dead writes),
and then zero-initialization of variables at the top of a shader got
propagated, rather than the final store of the used channels of the
variable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>
Usage of implicit input file in valhall_parse_isa makes it very
complicated for tools like ninja-to-soong to generate the Android
equivalent build file.
Instead use an explicit argument.
It also deduplicate the location of the input file name to have it
only in 'meson.build'.
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37742>
The extension is emulated on top of traditional transient attachments,
with the driver creating extra internal attachments for each subpass
where the MSRTSS sample count doesn't match the color or depth/stencil
attachment's sample count and insert unresolves/resolves around them,
except for the cases where the original attachment is unused
before/after the subpass respectively. An important case is if the
original attachment would be cleared at the beginning of the subpass, in
which case we rewrite the clear to apply to the driver-internal
multisample attachment. This requires redirecting the clear colors in
CmdBeginRenderPass2.
We create images, image views, and backing memory for these attachments
that are part of the framebuffer with classic renderpasses or allocated
per-render-pass with dynamic rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
These ops replicate the single-sampled source attachment to the
multi-sampled destination attachment before the start of a subpass. This
is the new hardware feature for
VK_EXT_multisample_render_to_single_sampled, and the actual
implementation of the extension emulates everything on top of these.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
This is an old leftover from the skeleton stage of the driver, and we
have never needed anything other than the image view. Having this in
the way made it impossible to write generic code that reads the
attachments in the !image_framebuffer and dynamic rendering cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
This will be necessary so that the 3d path can handle "unresolves",
where we select the single-sampled shader for the single-sampled source
but enable multisampling *without* per-sample shading to replicate it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
We need to always use the FMT6_Z24S8_AS_R8G8B8A8 format for GMEM even if
UBWC is disabled, as already done for the 2d store path. Because we
use the pre-baked RB_MRT_BUF_INFO register value, this means we have to
override it.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
This was already present in the 2d paths but not in the 3d path,
probably because the flag was moved there only on a7xx and it was
missed. Prevents page faults from bad flag buffer accesses.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
This can happen if we resolve to a resolve attachment and then use that
resolve attachment as an input attachment in a later subpass. We don't
need to put it in GMEM, but it's still considered "written" because
input attachment reads need a dependency after the resolve.
MSRTSS input attachment tests effectively created such a scenario after
lowering to transient multisample attachments and inserting resolves.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
If the first use of an attachment is as an input attachment, but it has
LOAD_OP_CLEAR, then we have to clear the attachment in GMEM and patch
the input attachment to refer to GMEM. Noticed by inspection.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37919>
Trivial but still better than cs_add32 workaround on platforms
supporting MOVE_REG32.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37951>
This change:
1. Use a different loop var for outer multidraw repeat
2. Make sure fau total_count matches the filled faus per loop
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37951>
Fix a regression from an unfortunate typo.
Fixes: 48e8d6d207 ("panfrost, panvk: The size of resource tables needs to be a multiple of 4.")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37951>
Should only set once outside the multidraw loop so that per draw can
patch its own own desc attribs when needed.
Fixes: a5a0dd3ccc ("panvk: Implement multiDrawIndirect for v10+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37951>
bi_load_tl's dst arg was named src, this was confusing.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37955>
IADD_IMM and uniform loads should not be spilled. Make them
rematerializable. Restrict to loads that write at most one register
for simplicity for now.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37955>
Mesh shader workgroups always have the same amount of subgroups.
When the API workgroup size is the same as the real workgroup
size, this is a small optimization (using a constant instead of
a shader arg).
When the API workgroup size is smaller than the real workgroup
size (eg. when the number of output vertices or primitves is
greater than the API workgroup size on RDNA 2), this fixes a
potential bug because num_subgroups would return the "real"
workgroup size instead of the API one.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37947>
The object buf is referenced at the beginning of the
r600_draw_rectangle() function and should be freed
at the end. This issue was introduced with cbb6e0277f.
Fixes: cbb6e0277f ("r600: stop using util_set_vertex_buffers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37617>
We don't request feedback for create and destroy commands, so it's
not needed to allocate feedback buffer.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37884>
We don't request feedback for create and destroy commands, so it's
not needed to allocate feedback buffer.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37884>
Until we have a stable kernel which contains these fixes.
This applies to all jobs except NAVI21/NAVI31 which still use 6.6 and
POLARIS10 which is stucked to 6.15.9 for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37934>
Currently when LCRA spills it also fills on every single usage of the
spilled value. This leads to a lot of cases where a spill is immediately
followed by a fill.
This patch reduces fills by LCRA by simply not filling until reaching
either the index that caused LCRA to fail allocating registers, or the
end of a block.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37299>
The info was moved to radeon_info, but it was only set for the amdgpu
kernel driver. It was uninitialized for radeon.
Fixes: d82eda72a1 - ac/gpu_info: move HS info into radeon_info
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37910>
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.
However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
This is already supported by the compiler and all the relevant conformance
tests pass.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37913>
If a life range of one register starts in the same instruction where the
life range of another register ends, then
the two ranges don't overlap.
v2: Fix test
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37847>
This hint tells KMD and firmware to turn into low latency but high
power usage mode.
i915 already had it now it was implemented in Xe KMD.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214>
Lets query if this feature is supported only once, also in the next
patches support for this feature will be added to Xe KMD.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214>
This moves most of the code to a new home: src/poly.
Most precomp kernels logic that could be moved are provided by poly now.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
Introduce new NIR intrinsics to handle getting a "sink" read-only
address and another intrinsic to handle conversion of address to
read-write (allowing implementation to replace the "sink" read-only with
another address like required for Asahi)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
We run agx_preprocess_nir as the last step of each new compute shaders
in agx_nir_lower_gs but we could move this out of the pass and makes it
the driver responsability to call it.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
Only a small detail but git will not go too crazy when I move
everything around at least.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
This is used by the geometry lowering that we are going to move to
common code.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
No need to actually generate NIR bindings for anything we don't need
to.
We are going to copy most of this in Panfrost and that will be required
there...
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
This was a hack to allow things to build but still could break in the
future, let comply by passing scratch as an argument instead.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
This is an alternative Curro proposed to counting the number of
serialized messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37816>
The intermediate buffer between the 2 images is linear, its stride
should be a function of the tile's logical width.
Normally this should map to the values reported by ISL except for
TileW where for some reason it was decided to report 128 for TileW
instead of the actual 64 size (see isl_tiling_get_info() ISL_TILING_W
case)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37902>
It's valid for the tiler desc to be 0 when the tiler isn't being
used. Currently, we set the descriptor based on an offset over the
pointer in the gfx state, and if this is 0, we end up setting it to
just the offset when there are more than 8 layers on a target.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37837>
Adds opt-in support for AFBC WSI swapchain image creation by
adding the supported modifiers to the lists expected by mesa WSI.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
External images given to us by WSI use this tiling mode instead
of optimal, and we want to allow this.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
This seems too restrictive with our current set of supported
modifiers, as we frequently end up skipping through the entire list
of AFBC modifiers.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
External importers of AFBC dmabufs expect these to be in terms
of bytesizes instead of direct superblock counts. This makes these
calculations aligned with panfrost and fixes WSI imports.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
If there isn't any modifier info coming from the compositor, we go
through our internal image path and pick the best modifier that
supports the image. This causes problems on X11, as it actually
expects the image to be in a linear layout.
Explicitly set the modifier to linear for legacy scanout images,
which specifically indicates that the image doesn't have an
explicit DRM modifier and we should do the safe thing by using
linear.
The naming becomes confusing for scanout with this change,
so the flag is now split into two separate flags, one for controlling
the AFBC optimalness called wsi, the other more directly called
legacy_scanout, which is used for enforcing the linear mod.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
Instead of always adding storage usage on pre_mod_adjustments
and preventing AFBC for all images with usage TRANSFER_DST,
only do this when the image doesn't use AFBC, by adding a
new post_mod_adjustments pass.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
Since we don't have multiplanar AFBC support, this causes issues
when we try to alias an image with a single plane to a plane of
a multiplanar image.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
On panvk, we can use the render area to set fbd bbox extents
instead of setting them based on image sizes. Doing this improves
partial updates (eg. loadOp:load with renderArea < image_size) of
AFBC render targets.
This commit introduces a new structure for this setting and uses
it on both panvk and panfrost. We can't reuse the existing extent
here as that is based on viewport+scissor, which can change within
a renderpass/batch, which causes issues on panfrost.
No functional changes for panfrost, as it doesn't have an equivalent
to renderpass::renderArea so we can't do the same thing there, it
still uses the entire framebuffer extent.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
This represents what this bounding box is being used for better,
as it can be easily confused with the framebuffer bounding box
otherwise.
Also fixes the comment about inclusiveness, as these are being
used as exclusive on both panfrost and panvk.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37771>
Instead of having abstracted opcodes, we target directly the HW format
at the NIR translation.
The payload description gives us the order of the payload sources (we
can use that for pretty printing) and we don't have to have a
complicated scheme in the logical send lowering for the ordering. All
we have to do is build the header if needed as well as the descriptors.
PTL Fossil-db stats:
Totals from 66759 (13.54% of 492917) affected shaders:
Instrs: 44289221 -> 43957404 (-0.75%); split: -0.81%, +0.06%
Send messages: 2050378 -> 2042607 (-0.38%)
Cycle count: 3878874713 -> 3712848434 (-4.28%); split: -4.44%, +0.16%
Max live registers: 8773179 -> 8770104 (-0.04%); split: -0.06%, +0.03%
Max dispatch width: 1677408 -> 1707952 (+1.82%); split: +1.85%, -0.03%
Non SSA regs after NIR: 11407821 -> 11421041 (+0.12%); split: -0.03%, +0.15%
GRF registers: 5686983 -> 5838785 (+2.67%); split: -0.24%, +2.91%
LNL Fossil-db stats:
Totals from 57911 (15.72% of 368381) affected shaders:
Instrs: 39448036 -> 38923650 (-1.33%); split: -1.41%, +0.08%
Subgroup size: 1241360 -> 1241392 (+0.00%)
Send messages: 1846696 -> 1845137 (-0.08%)
Cycle count: 3834818910 -> 3784003027 (-1.33%); split: -2.33%, +1.00%
Spill count: 21866 -> 22168 (+1.38%); split: -0.07%, +1.45%
Fill count: 59324 -> 60339 (+1.71%); split: -0.00%, +1.71%
Scratch Memory Size: 1479680 -> 1483776 (+0.28%)
Max live registers: 7521376 -> 7447841 (-0.98%); split: -1.04%, +0.06%
Non SSA regs after NIR: 9744605 -> 10113728 (+3.79%); split: -0.01%, +3.80%
Only 2 titles negatively impacted (spilling) :
- Shadow of the Tomb Raider
- Red Dead Redemption 2
All impacted shaders were already spilling.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
We start by assigning a backend opcode to all tex instructions, use
that to figure out if we have packed sources and apply the lowering
accordingly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
Centralize all the information in one place and also make the mapping
decision from nir_tex_instr -> HW opcode much earlier.
This will help knowning exactly what the payload looks like early in
the backend IR and when it needs to lowered to a smaller SIMD size due
to HW limits. It will also allow NIR lowering to know when to combine
parameters into a single packed component.
Finally, this also reduces the amount of LOAD_PAYLOAD we need to carry
in the backend IR, because we don't have to generate VEC()
LOAD_PAYLOAD() for coordinates etc... Those are useless if there is
any other parameter in the payload and we need need to add one more
LOAD_PAYLOAD() when doing the logical send lowering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
After a recent change, `piglit-traces.sh` automatically sets the caching
proxy, so update the docs to reflect this.
Also update the name of the variable from `FDO_HTTP_CACHE_URI` to
`LAVA_HTTP_CACHE_URI`.
Fixes: fa74e939bf ("ci/piglit: automatically use LAVA proxy")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37862>
This reverts commit a219308867.
It's failing most of the tests on Anv :
$ ./deqp-vk -n dEQP-VK.wsi.xlib.maintenance1.scaling.*
Test run totals:
Passed: 88/2422 (3.6%)
Failed: 576/2422 (23.8%)
Not supported: 1758/2422 (72.6%)
Warnings: 0/2422 (0.0%)
Waived: 0/2422 (0.0%)
The only passing tests seem to be with this pattern :
dEQP-VK.wsi.xlib.maintenance1.scaling.*.same_size_and_aspect
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37904>
- ``MESA_KK_DEBUG``: Set to ``msl`` to log all generated Metal Shading Language (MSL) shaders.
- ``MESA_KK_GPU_CAPTURE``: Starts Metal capture at device create and ends it at device destroy. Set to ``1`` to activate.
- ``MESA_KK_GPU_CAPTURE_DIRECTORY``: Metal capture will be saved to the specified directory. Defaults to Xcode if no path is provided.
- ``MESA_KK_DISABLE_WORKAROUNDS``: Provide ``all`` to disable all workarounds. Otherwise, provide a comma separated list to disable wanted workarounds e.g. ``1,3,4`` to disable workaround 1, 3 and 4.
Metal workarounds
*****************
Different workarounds are applied throughout the project to avoid issues such
.._The Nouveau wiki's CodeNames page: https://nouveau.freedesktop.org/CodeNames.html
.._Matching CUDA arch and CUDA gencode for various NVIDIA architectures: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
error_message:'VA state tracker requires at least one of the following gallium drivers: r600, radeonsi, nouveau, d3d12 (with option gallium-d3d12-video), virgl.')