Suggested by @gurchetansingh.
Android's Soong build system treats several compiler warnings as errors
by default: https://android.googlesource.com/platform/build/soong/+/27f57506/cc/config/global.go/#218
To catch these issues in Mesa, introduce `soong_compat_c_args`
and `soong_compat_cpp_args` with the following flags treated as errors:
-D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
-Werror=date-time
-Werror=gnu-alignof-expression
-Werror=ignored-qualifiers
-Werror=implicit-fallthrough
-Werror=int-conversion
-Werror=missing-prototypes
-Werror=pragma-pack
-Werror=pragma-pack-suspicious-include
-Werror=sizeof-array-div
-Werror=string-plus-int
-Werror=unreachable-code-loop-increment
These compatibility flags are added to the meson configurations
for ANV, Gfxstream, Lavapipe, PanVK, Turnip, and Venus.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
Based on the approach in e0eea5ea4e.
When a file is too large, -Wmisleading-indentantion will give the warning
below, that we can't prevent from a #pragma:
../src/freedreno/vulkan/tu_perfetto.cc: In function 'void setup_incremental_state(MesaRenderpassDataSource<TuRenderpassDataSource, TuRenderpassTraits>::TraceContext&, tu_device*)':
../src/freedreno/vulkan/tu_perfetto.cc:162: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
162 | if (!state->was_cleared)
../src/freedreno/vulkan/tu_perfetto.cc:162: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89549 for details.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
Stops allocating events in chunks. u_trace_event is allocated using a
linear allocator which has minimal overhead. Buffers for timestamps are
allocated using a custom allocator.
As a sideeffect, it is possible to deduplicate consecutive tracepoints.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41271>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
Fixes:
src/freedreno/vulkan/tu_knl_kgsl.cc:1455:16:
error: 'alignof' applied to an expression is a GNU extension [-Werror,-Wgnu-alignof-expression]
alignof(*objs), VK_SYSTEM_ALLOCATION_SCOPE_COMMAND);
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41518>
Fixes:
src/freedreno/vulkan/tu_cmd_buffer.cc:2162:10:
error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
The other option is the [[fallthrough]] annotation.
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41518>
UBO and sampler descriptors are smaller than texture descriptors, but
the nature of Vulkan descriptor sets means we need to make them just as
big. Zero out the remaining dwords so that we don't get garbage that
trips up asserts in gfxrecon-replay.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41608>
Apparently we emitted uninitialized values for VSC in tu6_emit_tile_select
when HW binning wasn't used, which Adreno 630 doesn't like and hangs.
Instead of adding even more conditions, just always init VSC state,
the rare cases where initializing it can be skipped - not worth
the complexity.
Fixes: 49191f46e6 ("tu/a6xx: Emit VSC addresses for each bin to restore after preemption")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41604>
Unfortunately we have to disable concurrent binning by default
because it hurts performance in a number of desktop games without
any case where we know it helps.
There are less vertex fetch resource available in BV compared to BR,
so when binning runs in BV, there are many vertices, and vertices are
attribute heavy - BV has much worse performance than BR, sometimes more
than 50% worse.
Even with worse performance it won't be bad if concurrent binning
actually overlapped with other workload in those cases, but in case of
desktop games - there is almost never a chance for overlap.
However it's impossible to statically find out if binning on BV would
be much slower than on BR, and we also cannot statically predict if
there is enough overlap (if any) to cover for the performance penalty.
Given the above, I don't see a way out but to make concurrent binning
opt in via `tu_allow_concurrent_binning` driconf toggle.
Still allow concurrent binning in CI to catch issues early.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41394>
Apparently, this is a major footgun since it is not uncommon for apps to
enable all the features exposed by a driver. Having UBWC disabled for
D24S8 can result in a major performance loss, and the reason can be hard
for devs to spot. This footgun is already known to have happened a few
times. Furthermore, disabling UBWC depending on a Vulkan feature being
requested broke D24S8 sharing via external memory when only one device
was created with customBorderColorWithoutFormat.
Fortunately, there is the depthStencilSwizzleOneSupport feature, which
was added after the above hardware deficiency was found and, when false,
forbids the problematic state combination.
To prevent the footgun described above, we now set
depthStencilSwizzleOneSupport to false by default. This allows UBWC to be
enabled for D24S8 in all cases while remaining conformant. We also have
the tu_enable_d24s8_border_color_workaround driconf option, which enables
the previous workaround for apps that don't know about
depthStencilSwizzleOneSupport, which is currently only the ANGLE
translation layer.
One caveat is that we cannot use the fast border color HW feature for
D24S8+USAGE_SAMPLED+VK_FORMAT_UNDEFINED, so a new driconf toggle is
added. enable_fast_border_color_for_undefined_formats is set for DXVK and
vkd3d-proton since they are known not to use border colors with D24S8.
Lacking fast border colors is a much smaller penalty than not having UBWC
for D24S8.
For some context also see: https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4346
This partially reverts 36916949.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41514>
Caused hangs in the following CTS test with CB enabled:
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.random.seed1_geometry_tessellation_multiview
Fixes: 50cc9c723c ("tu/u_trace: Prevent cloning stale RB_DONE_TS results")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41490>
pre_chain.rp_trace usage relied on a bunch of bad assumptions
and together with u_trace_move didn't cause issues until
u_trace is started to be refactored. Fixing those bad assumptions
and correctly initializing and freeing pre_chain.rp_trace
also requires fixing u_trace_move at the same time.
u_trace_move fixes:
- If dst had trace chunks in it - we may have leaked them.
- The correct list move pattern is "list_replace -> list_inithead"
not "list_replace -> list_delinit"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41390>
When uncached memory type is used under emulation then most games have a
significant performance penalty due to accessing the buffer atomically.
Instead when this option is set, it will override uncached buffer
allocations to instead be cached+coherent if the host supports it. This
allows the atomic accesses to still be done but not have abysmal
performance.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41323>
Instead of leaving timestamp_copy_data half-initialized in
copy_timestamp_cs_pool - always have it fully initialized and valid
state there.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41438>
Before a7xx, ldg/stg.a use an offset in units of their type size while
on a7xx and later, the offset is always in bytes. Currently,
@load/store_global_ir3 take their offset in dwords (32-bits). This has a
few downsides: offsets need an extra shl during codegen on a7xx and
addressing sub-dword-aligned addresses is only possible by doing 64-bit
math on the base address.
Improve the situation by always using a byte offset for
@load/store_global_ir3 and adding the offset_shift index to support type
units pre-a7xx. While we're at it, add the base index as well to support
all ldg/stg.g features in @load/store_global_ir3.
Supporting these renewed intrinsics consists of two parts:
- ir3_nir_lower_io_offsets legalizes the offset_shift on a6xx: for
ldg.a/stg.a, the offset has to be in units of the type size so extra
shifts are inserted to accomplish this if necessary. On a7xx, offsets
are always in bytes so nothing needs to be done.
- The intrinsics are emitted as ldg/stg if the offset is a small enough
constant and as ldg.a/stg.a otherwise. a6xx supports an extra shift
for ldg.a/stg.a that only applies to the GPR offset (not the immediate
base); NIR is pattern matched at this point to extract this if
possible.
All users of @load/store_global_ir3 are updated to generate the offset
in units of bytes. ir3_nir_analyze_ubo_ranges is updated to take the new
offset_shift into account.
Totals from 2029 (1.15% of 176266) affected shaders:
MaxWaves: 26728 -> 26660 (-0.25%); split: +0.01%, -0.26%
Instrs: 1314089 -> 1278603 (-2.70%); split: -2.72%, +0.02%
CodeSize: 2739108 -> 2633236 (-3.87%); split: -3.87%, +0.01%
NOPs: 197537 -> 200843 (+1.67%); split: -1.62%, +3.30%
MOVs: 43771 -> 44025 (+0.58%); split: -1.11%, +1.69%
Full: 31849 -> 31948 (+0.31%); split: -0.03%, +0.34%
(ss): 37965 -> 42027 (+10.70%); split: -3.47%, +14.17%
(sy): 13752 -> 13566 (-1.35%); split: -4.04%, +2.68%
(ss)-stall: 154238 -> 170353 (+10.45%); split: -1.72%, +12.16%
(sy)-stall: 804442 -> 806518 (+0.26%); split: -4.65%, +4.91%
Preamble Instrs: 326728 -> 293488 (-10.17%)
Cat0: 217926 -> 220947 (+1.39%); split: -1.58%, +2.96%
Cat1: 50182 -> 50446 (+0.53%); split: -0.97%, +1.49%
Cat2: 460987 -> 452101 (-1.93%); split: -2.26%, +0.33%
Cat3: 390696 -> 361271 (-7.53%)
Cat7: 39148 -> 38688 (-1.18%); split: -1.24%, +0.06%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41342>
LRZ and FDM have a few major performance pitfalls, if they are not
clearly surfaced when doing perfetto trace - they are easy to miss.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40935>
We rely on tu_lrz_flush_valid_at_suspending_rp_boundary() to make sure
that subsequent resuming renderpasses get the correct LRZ state. However
this doesn't work on early a6xx GPUs without tracking support. Disable
LRZ in this case, similar to secondaries.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40935>
This matches the behavior of radv for these two.
Fixes:
dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.capture_replay.sparse_buffer_descriptor_data_consistency
dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.capture_replay.sparse_buffer_descriptor_data_consistency_and_usage
Fixes: 8feed47fce ("tu: Initial support for sparse binding")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38148>
Maintenance9 will require us to make unbound vertex inputs (that is,
attributes in the VS without a corresponding binding at the same
location) be defined. In order for this to work, VFD_FETCH_INSTR_INSTR
must be defined for all attributes used in the shader. Imagine we do
something like:
CmdSetVertexInputs(only 1 input at location 0)
CmdBindPipeline(VS reads only location 0)
CmdDraw()
CmdBindPipeline(VS reads locations 0 and 1)
CmdDraw()
For the first draw we only need to emit VFD_FETCH_INSTR_INSTR[0], for the
second draw we need to emit VFD_FETCH_INSTR_INSTR[1] as well in the VI
draw state. This unfortunately means we have to do draw-time validation
for vertex input state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
The kernel already does this for us by zeroing new BOs. It's also
unnecessary, unless the newly-introduced
VK_QUERY_POOL_CREATE_RESET_BIT_KHR flags is used. If we ever start
suballocating query pools, we may have to zero based on that flag, but
for now we don't have to do anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
Before we were falling back to always emitting a pipeline barrier, which
effectively kills any point of having the event. But with sync2 and the
guarantee that src/dst dependency infos match, we can instead emit the
flushes before writing the event and actually use the event as intended.
As a bonus, this also allows the BV to run ahead of the BR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40552>
Otherwise we'd get tracepoints out of logical order, which doesn't
matter for perfetto at the moment, but would matter with future
perf warnings.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41102>
Apparently CP_CCHE_INVALIDATE is just a plain register write underneath,
so it needs WFI before it, in order to invalidate at the right point.
```
CP_CCHE_INVALIDATE:
mov $addr, 0x9881
mov $data, 0x1
waitin
mov $01, $data
```
Fixes misrendering in Doom Eternal on A750.
Fixes: fb1c3f7f5d ("tu: Implement CCHE invalidation")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41266>
Different HW units (R2D blit engine, 3D pipeline, etc.) apply subtly
different F32->UNORM16 rounding to the same float clear value, causing
cleared pixels to fail subsequent depth comparisons.
Pre-quantize D16 clear values to exact UNORM16 precision before passing
to any HW path. The GMEM path is unaffected as it already converts to
integer in pack_blit_event_clear_value().
Fixes dEQP-EGL.functional.image.modify.renderbuffer_depth16_renderbuffer_clear_depth
with zink and ANGLE when sysmem is used.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41196>
This moves from deprecated stage_id/hw_queue_id to per-context
stage_iid/hw_queue_iid, which leads to separate timelines per app.
There are several benefits to this:
- Different driver versions could be used by different apps and perfetto
won't confuse tracepoints.
- Tracepoints from different apps may not align perfectly, so previously
we got a fair amount of weird vertical ordering of tracepoints.
The downside is that info is spread across several timelines multiplied
by queues, but I think that's better since it is easier to understand
which tracepoints correspond to which app.
The changes are mostly copied from radeon/intel perfetto integration.
This also fixes app_event emission along the way, previously
debug_marker_stage was called _before_ SEQ_INCREMENTAL_STATE_CLEARED.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41105>
Based on 075d78115e ("panvk: implement deferred image creation"),
8aa2f1a94f ("panvk: add panvk_android_get_wsi_memory for AHB spec v8+"),
and 66bbd9eec8 ("panvk: implement AHB image deferred init and memory alloc").
Defer image initialization for both ANB alias images (gralloc v8+)
and AHB-backed images using vk_android_init_deferred_image() to
deep-copy the VkImageCreateInfo at vkCreateImage time.
For ANB alias images, tu_image_init() and tu_image_update_layout()
run at vkBindImageMemory2 time via tu_android_get_wsi_memory() when
the native buffer arrives.
For AHB images, tu_image_init() and tu_image_update_layout() run at
vkAllocateMemory time when the AHardwareBuffer handle is available
via dedicated allocation.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
Fixes two bugs in the WAIT_FENCE polling loop:
1. Break on timeout returned VK_SUCCESS because ret was read too late.
2. UINT64_MAX timeout_ns overflowed end_time, causing immediate exit.
Fix by reading rsp->ret before the timeout check and using
OS_TIMEOUT_INFINITE (like virtio_pipe_wait in freedreno) to avoid
overflow.
This prevents premature BO teardown during host-side fault recovery.
Fixes: f17c5297d7 ("tu: Add virtgpu support")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41108>
With SPV_KHR_constant_data, it's allowed to specialize array of
constants.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>