Metal provides straightforward ways to copy an image to/from memory,
and image-to-image copies can be implemented by chaining them.
Note that host copy of combined depth-stencil is not supported, as
Metal does not allow CPU copy for these formats. Additionally, GPU
optimized contents are not allowed with host image copy usage; CTS
directly initializes the raw memory of optimized images to random
invalid data, which appears to decompress differently on GPU vs CPU
and fail.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
`linear` controls whether the created image is in linear layout, and
`optimized_layout` controls only the `allowGPUOptimizedContents`
Metal property.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
The job runs the following modules with ANGLE:
- CtsGraphicsTestCases
- CtsNativeHardwareTestCases
- CtsSkQPTestCases
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
Android CTS for both arm64 and x86_64 Android targets always ships with
an x86_64 host JDK. Tradefed supports running on arm64 hosts though, so
provide a native JDK by installing Debian's openjdk-21-jdk-headless
package on arm64.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
Update aapt from the Android 14-based version in Trixie to a custom
fork based on the upstream Android 16 QPR2 branch, which fixes the
following error spam on arm64:
E aapt2 : Entry offset at index 0 points outside the Type's boundaries
E aapt2 : Entry offset at index 1 points outside the Type's boundaries
E aapt2 : Entry offset at index 2 points outside the Type's boundaries
E aapt2 : Entry offset at index 3 points outside the Type's boundaries
E aapt2 : Entry offset at index 4 points outside the Type's boundaries
E aapt2 : Entry offset at index 5 points outside the Type's boundaries
E aapt2 : Entry offset at index 6 points outside the Type's boundaries
E aapt2 : Entry offset at index 7 points outside the Type's boundaries
E aapt2 : Entry offset at index 8 points outside the Type's boundaries
E aapt2 : Entry offset at index 9 points outside the Type's boundaries
E aapt2 : Entry offset at index 10 points outside the Type's boundaries
E aapt2 : Entry offset at index 11 points outside the Type's boundaries
E aapt2 : Entry offset at index 12 points outside the Type's boundaries
E aapt2 : Entry offset at index 13 points outside the Type's boundaries
E aapt2 : Entry at index 14 is too small (0)
E aapt2 : Index 15 points to entry with unaligned offset 0x03080001
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
Blender uses atomic operations as part of its virtual shadow mapping
implementation. Virtual shadow mapping page tagging in compute shaders
benefits from divergent atomics fusion, while fragment shaders doing the
atomic raster step in general have worse performance with this
optimization turned on.
Thus, an option is added to only apply divergent atomics fusion to compute
shaders in ANV, and this option is enabled for Blender.
Initial support for divergent atomics fusion optimization in ANV was added
in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631.
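The stage restriction described above can be modeled as a small predicate. This is only a sketch: the function name, the parameter names and the string stage labels are hypothetical and are not the actual ANV option or code.

```python
def should_fuse_divergent_atomics(stage, fusion_enabled, compute_only):
    """Decide whether divergent atomics fusion applies to a shader.

    All names here are hypothetical; they only model the behavior
    described in the commit message.
    """
    if not fusion_enabled:
        return False
    if compute_only:
        # Restrict the optimization to compute shaders, where page
        # tagging benefits from it.
        return stage == "compute"
    return True


for stage in ("compute", "fragment"):
    print(stage, should_fuse_divergent_atomics(stage, True, True))
```

With the compute-only option set, fragment shaders doing the atomic raster step keep the unfused path.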
Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>
In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not
compact an instruction because precompact overwrites portions of the
instruction. (It treats the three-source instruction as a two-source
instruction when accessing instruction fields.)
This instruction could be compacted:
mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q };
But, since precompact erroneously sets bits, the instruction isn't
compacted.
Fossil testing:
* Tested with 0a3f3fd193 ("brw: drop unused color_outputs_valid
key") reverted, as fossils are currently producing inconsistent
results otherwise.
* Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change.
SKL:
Totals:
CodeSize: 8335219296 -> 8320248992 (-0.18%)
Totals from 359508 (14.42% of 2492689) affected shaders:
CodeSize: 2838254352 -> 2823284048 (-0.53%)
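As a sanity check, the percentages above can be recomputed from the raw totals. This is a minimal sketch; the helper name is made up for illustration.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


total = pct_change(8335219296, 8320248992)
affected = pct_change(2838254352, 2823284048)
share = 359508 / 2492689 * 100.0

print(f"total CodeSize change:    {total:.2f}%")
print(f"affected CodeSize change: {affected:.2f}%")
print(f"affected shaders:         {share:.2f}%")
```

The absolute saving is the same 14970304 bytes in both rows, since only the affected shaders changed.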
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>
Instead of doing the iadd manually, we can use the uniform slot of the
ld/st/atom instruction, getting rid of the iadd altogether.
Additionally, for global memory we can also consume a 32-bit offset instead
of requiring it to be 64-bit.
Totals from 158539 (13.07% of 1212873) affected shaders:
CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00%
Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04%
SLM Size: 238816 -> 238604 (-0.09%)
Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01%
Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53%
Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06%
Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
This tries to handle all combinations we might run into. We rely on
previous optimizations to ensure that the more difficult cases never happen.
As a side benefit, instead of lowering a UGPR to a GPR, it will now be
moved to the UGPR slot.
Totals from 258010 (21.27% of 1212873) affected shaders:
CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01%
Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05%
SLM Size: 589740 -> 589660 (-0.01%)
Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50%
Spills to memory: 56180 -> 56136 (-0.08%)
Fills from memory: 56180 -> 56136 (-0.08%)
Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30%
Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56%
Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Adding the zero constants has a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.
Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Stop passing drmVersionPtr to backends and make sure all
manual version checks are transitioned to
pan_kmod_driver_version_at_least() to encourage new checks
to do the same.
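To illustrate why a single helper is preferable to hand-written checks, here is a sketch of an "at least" comparison. The signature below is an assumption for illustration; it is not the actual prototype of pan_kmod_driver_version_at_least().

```python
def driver_version_at_least(current, major, minor):
    """Return True if current (a (major, minor) tuple) is >= major.minor.

    Hypothetical model: tuples compare lexicographically, so a newer
    major version passes regardless of its minor number.
    """
    return tuple(current) >= (major, minor)


print(driver_version_at_least((1, 5), 1, 3))
print(driver_version_at_least((2, 0), 1, 9))
print(driver_version_at_least((1, 2), 1, 3))
```

A manual check that compares major and minor independently would wrongly reject 2.0 when 1.9 is required, which is the kind of mistake a shared helper avoids.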
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41704>
v14+ supports up to 256 layers in a single tiler descriptor. This comes
with the limitation that only one tiler descriptor is allowed per render
pass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41640>
buffer_size is uint32_t so we must be careful to not overflow it.
radeonsi had code for this but radv doesn't, which means it will
hang if RADV_THREAD_TRACE_BUFFER_SIZE is too big or if buffer_size
is being doubled up to the point it overflows.
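The overflow can be modeled as follows. This is a sketch of the arithmetic only, not the radv or radeonsi code; both helper names are hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF


def wrapped_double(buffer_size):
    """What an unchecked uint32_t doubling would produce."""
    return (buffer_size * 2) & UINT32_MAX


def checked_double(buffer_size):
    """Return the doubled size, or None if it would not fit in uint32_t."""
    if buffer_size > UINT32_MAX // 2:
        return None
    return buffer_size * 2


size = 1 << 31  # 2 GiB
print(wrapped_double(size))   # wraps around to a tiny size
print(checked_double(size))   # the caller can stop growing instead
```

Checking before doubling lets the caller stop growing the buffer rather than continuing with a wrapped size.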
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41383>
Instead of having each driver:
- define options through DRI_CONF_OPT_* macros
- call driQueryOption*() to parse those options
- define all the variables to hold those options' values
we add one script to do it all for you. All you have to do now is list
all the options you want in a Python file.
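The idea of generating everything from one list can be sketched as below. The option names, the dictionary layout and the generated struct are all hypothetical; they do not reflect the format the actual script expects or emits.

```python
# Hypothetical option list; the real file format is not shown here.
OPTIONS = [
    {"name": "example_bool_option", "type": "bool", "default": False},
    {"name": "example_int_option", "type": "int", "default": 4},
]

C_TYPES = {"bool": "bool", "int": "int"}


def generate_struct(struct_name, options):
    """Emit a C struct with one field per listed option."""
    lines = [f"struct {struct_name} {{"]
    for opt in options:
        lines.append(f"   {C_TYPES[opt['type']]} {opt['name']};")
    lines.append("};")
    return "\n".join(lines)


print(generate_struct("example_options", OPTIONS))
```

The same list could drive the option declarations and the parsing calls, so a new option only needs to be added in one place.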
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41697>
Delay sqtt init until all states/funcs have been set.
Also, num_contexts is initialized at the end of si_create_context,
so use num_contexts == 0 to test if this is the first context.
Fixes: b2db3e1ddc ("radeonsi: add si_gfx_context.c and move code from si_pipe.c")
Reviewed-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41474>
nir_instrs_equal normalizes some indices, but hash_intrinsic
wasn't normalizing them. Reorganize the code so both do it using the
same helper.
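The invariant being restored is that two instructions that compare equal must also hash equal, or set lookups miss. Below is a sketch with a toy class; the class, the ignored index set and the helper name are hypothetical and do not mirror the NIR data structures.

```python
# Hypothetical: indices that equality ignores for CSE purposes.
IGNORED_INDICES = frozenset({"fp_math_ctrl"})


class Intrinsic:
    def __init__(self, name, indices):
        self.name = name
        self.indices = dict(indices)

    def _normalized(self):
        """Single helper shared by __eq__ and __hash__."""
        kept = tuple(sorted((k, v) for k, v in self.indices.items()
                            if k not in IGNORED_INDICES))
        return (self.name, kept)

    def __eq__(self, other):
        return (isinstance(other, Intrinsic)
                and self._normalized() == other._normalized())

    def __hash__(self):
        return hash(self._normalized())


a = Intrinsic("load_ssbo", {"fp_math_ctrl": 1, "align": 4})
b = Intrinsic("load_ssbo", {"fp_math_ctrl": 2, "align": 4})
print(a == b, hash(a) == hash(b), len({a, b}))
```

If the hash included the ignored index, a and b would still compare equal but would usually land in different buckets, so the set would keep both.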
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Assisted-by: Pi coding agent (GPT-5.5)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41606>
Everything else related to VPE is already in mm subfolder, so let's
move the pipe_video_codec implementation there as well.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41439>
Unlike most other things where the MOCS setting combines the MOCS Index
and the protected memory bit, the EXECUTE_INDIRECT_DRAW/DISPATCH
commands take only the MOCS Index, and it's limited to only 4 bits.
Enabling the feature on ARL-H caused some tests to hit an assert when
the MOCS selected ended up out of range.
Rename the field to avoid confusion (and match documentation) and set it
through a helper function that calls the same old function and shifts it
down to fit.
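The helper can be pictured as below. This is a sketch under an assumption: that the combined MOCS value keeps the protected bit in bit 0 with the index above it, so a shift by one recovers the index. The function names are hypothetical; only the 4-bit limit comes from the text above.

```python
MOCS_INDEX_BITS = 4  # the indirect commands only take a 4-bit index


def combined_mocs(index, protected):
    """Assumed layout: bit 0 is the protected bit, the index sits above."""
    return (index << 1) | (1 if protected else 0)


def mocs_index(mocs):
    """Shift the combined value down and check that it fits the field."""
    index = mocs >> 1
    if index >= (1 << MOCS_INDEX_BITS):
        raise ValueError(f"MOCS index {index} does not fit in the field")
    return index


mocs = combined_mocs(15, True)
print(mocs, mocs_index(mocs))
```

Writing the combined value directly into the 4-bit field is what goes out of range: in this model, index 15 with the protected bit set becomes 31.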
Fixes: d1109f67bb ("iris: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: d161e3c2e2 ("iris: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 580728564e ("anv: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 6d4f43f0d6 ("anv: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: 7a9e82e82f ("genxml/12.5: Add the EXECUTE_INDIRECT_DISPATCH instruction")
Fixes: 4229757309 ("genxml/12.5: Add the EXECUTE_INDIRECT_DRAW instruction")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>