I did my stress testing mostly outside of north america work hours, but it
turns out once the runners have 60-70% background CPU usage, these ones
intermittently time out.
Reported-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41163>
This reverts commit b83a931cb1 as it causes
regressions with dirty rects enabled on some HW platforms that signal
out of order completion and require individual fence objects per slice
Fixes: b83a931cb1 ("d3d12: Video sliced encode: Use same ID3D12Fence/different per slice values as optimization")
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41160>
A reshape operation just changes the dimensions of a tensor, but doesn't
change the data at all. So we just point the OFM to the IFM data and
we're done.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The ethosu_lower_add() function can handle other element wise operations
such as multiply, minimum, and maximum, so rename it in preparation to
add those operations.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Add support for fully-connected convolution. FC convolution lowering is
nearly the same, so refactor the existing convolution code to support
both.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
For axis 1 concatenation, the OFM strides need to match the IFM strides.
Presumably axis -3 can also be supported, but there haven't been any
models with -3. Not sure what axis 2 would need either.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Some of the tensor info is needed at various points during lowering.
Instead of storing the tensor index and looking it up every time, store
a point to the tensor struct instead.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The struct ethosu_operation structure has the same initialization in
multiple ops. More ops with the same duplication are about to be added.
Move this out to a common initializer function.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The vela compiler defines shift as signed and some upcoming LUT code
allows for negative shifts, so make shift signed everywhere.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Just like the fix for nvk, just drop this in the GL driver as well.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41143>
Four linked D3D12 pipeline-validation problems with GLSL TCS on DXIL:
1) dxil_nir_kill_unused_outputs killed TCS outputs read back by the
patch-constant function after a barrier, zeroing the tess factors.
Keep shader_out locations with any intra-shader load_deref live
regardless of next_stage_read_mask.
2) is_dead_in_variable dropped TES padding placeholders (no local
uses) in nir_remove_dead_variables. Also honor
prev_stage_written_mask so padded TES inputs stay alive.
3) Preserving (1) leaves HS with outputs the DS doesn't declare,
breaking pipeline validation (e.g. piglit's barrier.shader_test).
Add dxil_nir_pad_tes_input_signature, called from both link paths,
to synthesize matching TES inputs (reusing each TCS output's type
so sig shape and stride match byte-for-byte) plus the tess-level
inputs -- subsuming the tess-level-only block previously in
dxil_spirv_nir_link. Scope the per-variable padding to TCS
outputs that TCS itself reads back via load_deref: outputs that
neither TES nor TCS consumes get killed from the HS signature,
so padding them into DS would make the DS input signature longer
than HS output and break validation for SSO pipelines whose TCS
declares unused per-patch writes (arb_separate_shader_objects/
mix-and-match-tcs-tes).
4) remove_hs_intrinsics rewrote load_output but not
load_per_vertex_output in HS main. With (1) keeping outputs alive,
GLSL reads of outputs in main whose result survives DCE (UAV
atomics, non-tess per-vertex output writes) left
LoadOutputControlPoint in the control-point function, which dxil.dll
rejects outside the PCF (CreatePipelineState then fails with
E_INVALIDARG). Treat load_per_vertex_output like load_output.
Validated on piglit arb_tessellation_shader/execution (WARP + DXC
1.8.2403): barrier now passes; the previously-crashing
tcs-output-unmatched and variable-indexing/tcs-output-array-* fail
gracefully matching baseline; isoline/isoline-no-tcs remain flakes
(pre-existing canary corruption, unrelated).
d3d12-quick_shader.txt drops barrier; d3d12-flakes.txt adds
isoline-no-tcs alongside isoline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>
get_tessellator_output_primitive used to unconditionally invert CW<->CCW
on the assumption the input was GL-origin (lower-left). That was wrong
for any upper-left caller — including spirv_to_dxil, whose SPIR-V sources
(DXC, glslang) already align with D3D winding.
Make nir_to_dxil copy info.tess.ccw through and expect upper-left. The
d3d12 gallium driver (GL) flips before the conversion to preserve its
output. spirv_to_dxil and dozen (Vulkan, UPPER_LEFT default) are unchanged.
Assisted-by: Claude Opus 4.7 <noreply@anthropic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>
Enable protected context capability for Android
when TMZ support is available. This is needed for Widevine L1 secure
video playback on Android, which requires a protected context.
Signed-off-by: jinmiliu <jinming.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40980>
V_581B_PFP and V_581B_ME is for pws acquire_mem. Current code
does not cause any problem because we won't pass engine arg
directly to acqure_mem packet. But use a native V_581A_* arg
for better coding.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41069>
New IB print will assert reserved packet field to be zero.
Fixes: 1c75cd958f ("ac: enable the new auto-generated CP packet parser")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41069>
By default crocus precompiles shaders, to avoid stuttering at screens,
caused by compiling shaders at the drawing phase.
Unfortunately at intel Gen 6 and higher the precompiled version of the
fragment shaders is not used and every fragment shader is compiled twice.
These double fragment shaders also are added to the memory cache
and disk cache.
This is caused by setting wrong values to variables at the key during
precompiling at routine crocus_create_fs_state() at src/gallium/drivers/crocus/crocus_program.c,
which differ from values at crocus_populate_fs_key() at src/gallium/drivers/crocus/crocus_state.c.
This commit solves 3 problems:
it adjusts the predicted value 'input_slots_valid' at Gen 6
it adjusts the predicted value 'ignore_sample_mask_out' at Gen 6 and higher
it predicts the value 'multisample_fbo' , which helps if samplemask is used
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35605>
The new tool has much better image diffing presentation (thanks to
Danilo's work on turnip's private trace CI), better performance, flake
checking within a single run, parallelized downloads along with replays,
and ability to cache downloaded files to improve runtime, and system
monitoring (for debugging OOM-related slowdowns).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>
The new tool has much better image diffing presentation (thanks to
Danilo's work on turnip's private trace CI), better performance, flake
checking within a single run, parallelized downloads along with replays,
and ability to cache downloaded files to improve runtime, and system
monitoring (for debugging OOM-related slowdowns).
./bin/update_traces_checksum.sh still updates based on the output of a CI
run, but you can also apply a patch file that the tool generates, if you
do offline runs using your traces.toml.
New traces being replayed, in less overall runtime (2 minutes instead of 3):
- minetest/minetest-high-v3.trace (new version, not the old flaky one)
- neverball/neverball-v2.trace
- ror/ror-default.trace
- supertuxkart/supertuxkart-mansion-egl-gles-v2.b.trace
- valve/counterstrike-v2.trace
- valve/portal-2-v2.trace
- xonotic/xonotic-keybench-high-v2.trace
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>
With SPV_KHR_constant_data, it's allowed to specialize array of
constants.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
The polynomial used for asin_expr() was suboptimal (and its source was
not documented).
A better approximation is found in the _Handbook_of_Mathematical_Functions_
by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However,
while this approximation gives a good absolute error bound, its relative
error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page
from the spirv implementation of asin(), we implement a piecewise
approximation where a Taylor series is used for small values of |x|.
This patch also harmonizes the GLSL and Vulkan implementations by moving
the implementation to common code (nir_builder).
Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0,
the original asin() at 32 bits has:
```
glsl spirv
RMSE: 1.756451e-04 1.609091e-04
worst abs error: 3.904104e-04 at 0.937001 3.904104e-04 at 0.937001
worst ulp error: 11800 at 6.2499e-05 3826 at 0.841331
```
whereas the new implementation has for both:
```
RMSE: 2.528056e-05
worst abs error: 4.962087e-05 at 0.451149
worst ulp error: 2379 at 0.215106
```
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>