Some of the tensor info is needed at various points during lowering.
Instead of storing the tensor index and looking it up every time, store
a pointer to the tensor struct instead.
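A minimal sketch of the idea, assuming a simplified tensor and subgraph layout. The names below (tensor_sketch, subgraph_sketch, find_tensor, op_sketch) are hypothetical and are not the ethosu driver's actual types: the lookup runs once when the operation is created, and later lowering steps dereference the stored pointer directly.

```c
#include <stddef.h>

/* Hypothetical, reduced tensor and subgraph types for illustration only. */
struct tensor_sketch {
   unsigned index;
   int zero_point;
};

struct subgraph_sketch {
   struct tensor_sketch *tensors;
   unsigned count;
};

/* The lookup that previously had to be repeated wherever the info was needed. */
static struct tensor_sketch *find_tensor(struct subgraph_sketch *sg, unsigned index)
{
   for (unsigned i = 0; i < sg->count; i++) {
      if (sg->tensors[i].index == index)
         return &sg->tensors[i];
   }
   return NULL;
}

/* The operation keeps a pointer resolved once, instead of a tensor index. */
struct op_sketch {
   struct tensor_sketch *input;
};
```

One consequence of this design: the stored pointer is only valid as long as the tensor array is not reallocated, which presumably holds once a subgraph's tensors are fixed.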
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The ethosu_operation structure has the same initialization in
multiple ops. More ops with the same duplication are about to be added.
Move this out to a common initializer function.
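A sketch of what such a common initializer can look like. The struct layout, field names and defaults here (operation_sketch, tensor_ref, a 1x1 kernel with stride 1) are assumptions for illustration, not the real struct ethosu_operation: the point is that every op starts from the same zeroed state with shared defaults and then only sets what differs.

```c
#include <string.h>

/* Hypothetical stand-ins for the real driver types. */
struct tensor_ref {
   int zero_point;
};

struct operation_sketch {
   int type;
   const struct tensor_ref *ifm;   /* input feature map */
   const struct tensor_ref *ofm;   /* output feature map */
   unsigned kernel_w, kernel_h;
   unsigned stride_x, stride_y;
};

/* Shared setup that would otherwise be duplicated in every op. */
static void operation_init(struct operation_sketch *op, int type,
                           const struct tensor_ref *ifm,
                           const struct tensor_ref *ofm)
{
   memset(op, 0, sizeof(*op));
   op->type = type;
   op->ifm = ifm;
   op->ofm = ofm;
   op->kernel_w = op->kernel_h = 1;   /* assumed defaults */
   op->stride_x = op->stride_y = 1;
}
```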
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The vela compiler defines shift as signed and some upcoming LUT code
allows for negative shifts, so make shift signed everywhere.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
As with the fix for nvk, drop this in the GL driver as well.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41143>
Four linked D3D12 pipeline-validation problems with GLSL TCS on DXIL:
1) dxil_nir_kill_unused_outputs killed TCS outputs read back by the
patch-constant function after a barrier, zeroing the tess factors.
Keep shader_out locations with any intra-shader load_deref live
regardless of next_stage_read_mask.
2) is_dead_in_variable dropped TES padding placeholders (no local
uses) in nir_remove_dead_variables. Also honor
prev_stage_written_mask so padded TES inputs stay alive.
3) Preserving (1) leaves HS with outputs the DS doesn't declare,
breaking pipeline validation (e.g. piglit's barrier.shader_test).
Add dxil_nir_pad_tes_input_signature, called from both link paths,
to synthesize matching TES inputs (reusing each TCS output's type
so sig shape and stride match byte-for-byte) plus the tess-level
inputs -- subsuming the tess-level-only block previously in
dxil_spirv_nir_link. Scope the per-variable padding to TCS
outputs that TCS itself reads back via load_deref: outputs that
neither TES nor TCS consumes get killed from the HS signature,
so padding them into DS would make the DS input signature longer
than HS output and break validation for SSO pipelines whose TCS
declares unused per-patch writes (arb_separate_shader_objects/
mix-and-match-tcs-tes).
4) remove_hs_intrinsics rewrote load_output but not
load_per_vertex_output in HS main. With (1) keeping outputs alive,
GLSL reads of outputs in main whose result survives DCE (UAV
atomics, non-tess per-vertex output writes) left
LoadOutputControlPoint in the control-point function, which dxil.dll
rejects outside the PCF (CreatePipelineState then fails with
E_INVALIDARG). Treat load_per_vertex_output like load_output.
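The liveness and padding rules from items 1 and 3 can be expressed as bitmask arithmetic. This is a sketch assuming one bit per output location; the function names are hypothetical and do not correspond to the actual dxil_nir helpers. An HS output is kept if the TES reads it or the TCS reads it back, and only kept outputs that the TES does not read need a padding input in the DS signature, so outputs nobody consumes are killed rather than padded.

```c
#include <stdint.h>

/* Outputs that survive in the HS signature: read by TES, or read back by
 * the TCS itself (e.g. by the patch-constant function after a barrier). */
static uint64_t hs_kept_outputs(uint64_t tcs_written, uint64_t tes_read,
                                uint64_t tcs_read_back)
{
   return tcs_written & (tes_read | tcs_read_back);
}

/* TES inputs that must be synthesized so the DS input signature matches
 * the HS output signature: kept outputs the TES does not declare. */
static uint64_t tes_pad_inputs(uint64_t tcs_written, uint64_t tes_read,
                               uint64_t tcs_read_back)
{
   return hs_kept_outputs(tcs_written, tes_read, tcs_read_back) & ~tes_read;
}
```

For example, with outputs 0-3 written, the TES reading 0-1 and the TCS reading back 2, output 3 is killed and only output 2 is padded.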
Validated on piglit arb_tessellation_shader/execution (WARP + DXC
1.8.2403): barrier now passes; the previously-crashing
tcs-output-unmatched and variable-indexing/tcs-output-array-* fail
gracefully matching baseline; isoline/isoline-no-tcs remain flakes
(pre-existing canary corruption, unrelated).
d3d12-quick_shader.txt drops barrier; d3d12-flakes.txt adds
isoline-no-tcs alongside isoline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>
get_tessellator_output_primitive used to unconditionally invert CW<->CCW
on the assumption the input was GL-origin (lower-left). That was wrong
for any upper-left caller — including spirv_to_dxil, whose SPIR-V sources
(DXC, glslang) already align with D3D winding.
Make nir_to_dxil copy info.tess.ccw through and expect upper-left. The
d3d12 gallium driver (GL) flips before the conversion to preserve its
output. spirv_to_dxil and dozen (Vulkan, UPPER_LEFT default) are unchanged.
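A minimal sketch of the new contract, with hypothetical names: callers whose winding is expressed relative to a lower-left (GL) domain origin flip it once before conversion, while upper-left callers pass it through unchanged, so the converter itself never inverts anything.

```c
#include <stdbool.h>

/* Hypothetical enum for the tessellation domain origin a caller uses. */
enum tess_origin {
   TESS_ORIGIN_UPPER_LEFT,
   TESS_ORIGIN_LOWER_LEFT,
};

/* Returns the ccw flag expressed relative to an upper-left origin,
 * which is what the converter now expects. */
static bool ccw_for_upper_left(bool ccw, enum tess_origin origin)
{
   return origin == TESS_ORIGIN_LOWER_LEFT ? !ccw : ccw;
}
```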
Assisted-by: Claude Opus 4.7 <noreply@anthropic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>
Enable protected context capability for Android
when TMZ support is available. This is needed for Widevine L1 secure
video playback on Android, which requires a protected context.
Signed-off-by: jinmiliu <jinming.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40980>
V_581B_PFP and V_581B_ME are for pws acquire_mem. The current code
does not cause any problem because we never pass the engine arg
directly to the acquire_mem packet, but use the native V_581A_*
values for clarity.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41069>
The new IB printer will assert that reserved packet fields are zero.
Fixes: 1c75cd958f ("ac: enable the new auto-generated CP packet parser")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41069>
By default, crocus precompiles shaders to avoid on-screen stuttering
caused by compiling shaders at draw time.
Unfortunately, on Intel Gen 6 and higher the precompiled version of the
fragment shaders is not used, so every fragment shader is compiled twice.
These duplicate fragment shaders are also added to both the memory cache
and the disk cache.
This happens because, during precompilation, crocus_create_fs_state() in src/gallium/drivers/crocus/crocus_program.c
sets key fields to values that differ from those set by crocus_populate_fs_key() in src/gallium/drivers/crocus/crocus_state.c.
This commit fixes 3 problems:
- it adjusts the predicted value of 'input_slots_valid' on Gen 6
- it adjusts the predicted value of 'ignore_sample_mask_out' on Gen 6 and higher
- it predicts the value of 'multisample_fbo', which helps when the sample mask is used
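Why a mispredicted key causes a second compile can be sketched with a toy key cache. The key struct below is a hypothetical, heavily reduced version of the real fragment shader key, and the linear-search cache stands in for the real program cache: if the precompile key differs in any field from the draw-time key, the lookup misses and the shader is compiled and cached again.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced fragment-shader key. */
struct fs_key_sketch {
   unsigned input_slots_valid;
   bool ignore_sample_mask_out;
   bool multisample_fbo;
};

static bool keys_equal(const struct fs_key_sketch *a,
                       const struct fs_key_sketch *b)
{
   return a->input_slots_valid == b->input_slots_valid &&
          a->ignore_sample_mask_out == b->ignore_sample_mask_out &&
          a->multisample_fbo == b->multisample_fbo;
}

#define SKETCH_CACHE_SLOTS 8
static struct fs_key_sketch cached_keys[SKETCH_CACHE_SLOTS];
static unsigned cached_count;
static unsigned compile_count;

/* Returns true on a cache hit; on a miss, "compiles" and caches the key. */
static bool lookup_or_compile(const struct fs_key_sketch *key)
{
   for (unsigned i = 0; i < cached_count; i++) {
      if (keys_equal(&cached_keys[i], key))
         return true;
   }
   compile_count++;
   if (cached_count < SKETCH_CACHE_SLOTS)
      cached_keys[cached_count++] = *key;
   return false;
}
```

A single differing field is enough to make the precompiled variant unreachable, which is why each predicted field has to match what the draw-time key population produces.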
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35605>
The new tool has much better image diffing presentation (thanks to
Danilo's work on turnip's private trace CI), better performance, flake
checking within a single run, parallelized downloads alongside replays,
the ability to cache downloaded files to improve runtime, and system
monitoring (for debugging OOM-related slowdowns).
./bin/update_traces_checksum.sh still updates based on the output of a CI
run, but you can also apply a patch file that the tool generates, if you
do offline runs using your traces.toml.
New traces being replayed, in less overall runtime (2 minutes instead of 3):
- minetest/minetest-high-v3.trace (new version, not the old flaky one)
- neverball/neverball-v2.trace
- ror/ror-default.trace
- supertuxkart/supertuxkart-mansion-egl-gles-v2.b.trace
- valve/counterstrike-v2.trace
- valve/portal-2-v2.trace
- xonotic/xonotic-keybench-high-v2.trace
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>
With SPV_KHR_constant_data, specializing arrays of constants is
allowed.
RustiCL changes are from Karol Herbst <kherbst@redhat.com>.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
The polynomial used for asin_expr() was suboptimal (and its source was
not documented).
A better approximation is found in the _Handbook_of_Mathematical_Functions_
by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However,
while this approximation gives a good absolute error bound, its relative
error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page
from the spirv implementation of asin(), we implement a piecewise
approximation where a Taylor series is used for small values of |x|.
This patch also harmonizes the GLSL and Vulkan implementations by moving
the implementation to common code (nir_builder).
Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0,
the original asin() at 32 bits has:
```
                 glsl                      spirv
RMSE:            1.756451e-04              1.609091e-04
worst abs error: 3.904104e-04 at 0.937001  3.904104e-04 at 0.937001
worst ulp error: 11800 at 6.2499e-05       3826 at 0.841331
```
whereas the new implementation has for both:
```
RMSE: 2.528056e-05
worst abs error: 4.962087e-05 at 0.451149
worst ulp error: 2379 at 0.215106
```
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>
Latest addrlib supports SIMD (AVX2) and it's definitely fast enough to
be used in production now.
GFX10 is still not enabled by default because of some regressions from the
addrlib bump, and AVX support is still missing for some formats.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40996>
LLVMpipe is the only driver that has actually supported the instructions
this cap reports. But TGSI is a dying IR, and this helps very
little; the compiler back-end will optimize this away anyway.
So let's drop it to reduce complexity.
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40993>
These instructions were never supported on Iris, as it never supported
TGSI. This didn't lead to any issues because tgsi_to_nir normalized the
result. This mistake got carried forward when creating crocus as well.
Let's just stop reporting it; it doesn't do anything useful.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40993>
This hasn't been supported since the TGSI envvar was ripped out. When
converted to NIR, we don't see these instructions at all.
Fixes: c3cbe610df ("nouveau: Delete the NV50_PROG_USE_TGSI env var.")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40993>
In order to properly check against maxResourceSize on Vulkan, we
need to be able to report the size even for resources that exceed that
limit. Otherwise we fail to find a usable modifier instead of
properly reporting the problem to the application. This means we need to move
the check out of the mod-handler.
There's no need to validate the slice-stride. The reason is a little bit
complicated, but we have two possible cases:
1. V10 and before: the image-size and the slice-stride are both limited
to UINT32_MAX. Since the image-size is always at least as large as the
slice-stride, it's enough to check the image-size.
2. V11 and later: 37 bits is large enough to store any valid
slice-stride. The only way we could blow this one up would be to
pass out-of-range width or height, which is already either validated
by higher-level logic (gallium) or UB (vulkan). This is important,
because we don't have another mandate to reject large resources on
Vulkan; we can only reject due to maxResourceSize, not an individual
plane.
So let's move this out to the call-site. We don't need to do anything
for PanVK, because it already checks against maxResourceSize.
To keep the Gallium and Vulkan drivers as similar as reasonably possible,
check against the whole resource even in Gallium, where we could have
gotten away with checking one plane at a time instead.
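A sketch of a call-site check along these lines, under stated assumptions: the plane_sketch struct, the resource_size_ok() name and the arch parameter are hypothetical, not the panfrost API. Each plane's image-size is limited to UINT32_MAX on v10 and earlier, the summed size of all planes is compared against the limit for the whole resource, and the slice stride is not checked, per the reasoning above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-plane layout info. */
struct plane_sketch {
   uint64_t image_size;
};

static bool resource_size_ok(const struct plane_sketch *planes,
                             unsigned plane_count, unsigned arch,
                             uint64_t max_resource_size)
{
   uint64_t total = 0;

   for (unsigned i = 0; i < plane_count; i++) {
      /* v10 and earlier store the image size in 32 bits. */
      if (arch <= 10 && planes[i].image_size > UINT32_MAX)
         return false;
      /* total <= max_resource_size holds here, so this cannot underflow. */
      if (planes[i].image_size > max_resource_size - total)
         return false;
      total += planes[i].image_size;
   }
   return true;
}
```

Checking the running total against the remaining budget, rather than summing first, avoids any 64-bit overflow when individual plane sizes are huge.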
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
This change was tested on palm and cayman. Here are the tests fixed:
spec/arb_gl_spirv/execution/uniform/atomic-uint-aoa-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-aoa-fs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-array-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-array-fs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-fs: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40822>
This change adds minimal support for gl_PointSize to
be used alongside gl_ClipDistance/gl_CullDistance.
This change was tested on palm and cayman. Here is the test fixed:
khr-gl4[5-6]/gl_spirv/spirv_validation_builtin_variable_decorations_test: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40701>
This functionality was not working.
To fix this issue, the texture mode needs to be set to
V_030000_SQ_TEX_DIM_2D_MSAA. When this mode is used,
the GPU only reads layer 0, so this change implements
the functionality by copying the layer to a new resource.
This change was tested on palm, barts and cayman. Here is the
test fixed:
khr-gl4[2-6]/texture_view/view_sampling: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40567>
This change extends r600_lds_constant_buffer to
implement a fully conformant gl_PrimitiveID in
the TES and TCS stages.
This change was tested on cayman and barts. Here are the tests fixed:
spec/arb_tessellation_shader/execution/tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-no-tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-primitiveid-instanced: fail pass
khr-gl4[4-6]/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-gles31/core/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-glesext/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40297>
The atomic offset implementation was incomplete.
This change was tested on cayman, it fixes all the
variants of this test:
khr-gl4[2-6]/shader_atomic_counters/advanced-usage-multi-stage: fail pass
khr-gles31/core/shader_atomic_counters/advanced-usage-multi-stage: fail pass
Fixes: 06993e4ee3 ("r600: add support for hw atomic counters. (v3)")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40170>
The support needed only a minor adjustment.
Note: on rv770 (khr-gl33: 4/12), the support
needs more work and is disabled.
This change was tested on palm, barts and cayman: piglit (3/3),
khr-gl46 (15/16). The failing test, sampler2darrayshadow_vertex,
is referenced as "Bug 21620051" (VK-GL-CTS) and seems to have
a problem.
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39915>
This change is inspired by b56f47611a ("radeonsi: fix
alpha-to-coverage + alpha-to-one used together for
gfx6-10.3") and implements the same algorithm.
This change was tested on rv770, palm and cayman. Here are the tests fixed:
spec/arb_framebuffer_object/execution/msaa-alpha-to-coverage_alpha-to-one: fail pass
spec/arb_framebuffer_object/execution/msaa-alpha-to-coverage_alpha-to-one_write-z: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39779>
Reading the buffer size (GET_BUFFER_RESINFO) does not work
on cypress. In terms of test results, this issue is the main
difference between cypress and other GPUs such as palm and cayman.
This change adds a dedicated function that extends the
previous cubearray workaround algorithm to this end.
This change assumes that all the GPUs between cedar and
hemlock have the same issue.
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39650>