These cases are now covered in the round-trip tests in
src/intel/compiler/gen, where encoding and parsing logic for assembly
lives.
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41920>
All the relevant pieces live in gen now, so pull the tests. Most of them are
now bidirectional: as before, they test that the assembler produces the
expected bytes, but now also that the bytes decode into the same assembly.
Some of the assembly was tweaked to be in verbose mode so that it can
round-trip correctly. The bytes in the files are the same as they were
in the expected files from brw.
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41920>
They now check round-trip up to encode/decode. Some tests were
dropped because they don't validate, so we don't really care
whether they round-trip at just the parser/printer level.
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41920>
Add a round-trip checker as a gentool subcommand: it reads one or
more case files of the form 'platform | bytes | assembly' and, per line,
checks that the bytes decode to the assembly and/or the assembly encodes to
the bytes. For example:
```
tgl | 40 00 03 00 20 82 05 05 04 06 10 02 2a 00 00 00 | add (8) r5 r6 0x0000002a
```
The second separator picks the directions ('|' both, '<' encode-only, '>'
decode-only) and the disassembly print mode follows the file-name suffix
(_verbose.txt, _translated.txt).
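As an illustration of the case-file format described above, here is a minimal parsing sketch. It is not the gentool implementation: `struct rt_case`, `parse_case_line` and the fixed buffer sizes are hypothetical, and it assumes the first separator is always '|' and that the bytes are space-separated hex pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one case line; not gentool's actual type. */
struct rt_case {
   char platform[16];
   uint8_t bytes[64];
   size_t num_bytes;
   bool check_encode;    /* assembly -> bytes */
   bool check_decode;    /* bytes -> assembly */
   const char *assembly; /* points into the input line */
};

static const char *
skip_blanks(const char *s)
{
   while (*s == ' ' || *s == '\t')
      s++;
   return s;
}

/* Parse "platform | hex bytes SEP assembly", with SEP one of '|', '<', '>'.
 * Returns false on malformed input.
 */
static bool
parse_case_line(const char *line, struct rt_case *out)
{
   assert(line && out);
   memset(out, 0, sizeof(*out));

   const char *bar = strchr(line, '|');
   if (!bar)
      return false;

   /* Platform: text before the first '|', with surrounding blanks trimmed. */
   const char *p = skip_blanks(line);
   size_t len = (size_t)(bar - p);
   while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
      len--;
   if (len == 0 || len >= sizeof(out->platform))
      return false;
   memcpy(out->platform, p, len);

   /* Bytes: hex values up to the second separator. */
   p = skip_blanks(bar + 1);
   while (*p && *p != '|' && *p != '<' && *p != '>') {
      char *end;
      unsigned long v = strtoul(p, &end, 16);
      if (end == p || v > 0xff || out->num_bytes == sizeof(out->bytes))
         return false;
      out->bytes[out->num_bytes++] = (uint8_t)v;
      p = skip_blanks(end);
   }
   if (*p == '\0' || out->num_bytes == 0)
      return false;

   /* '|' checks both directions, '<' encode only, '>' decode only. */
   out->check_encode = (*p != '>');
   out->check_decode = (*p != '<');
   out->assembly = skip_blanks(p + 1);
   return true;
}
```

A checker built on this would then run the encoder and/or decoder for each parsed line and compare against `bytes` and `assembly`.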
Also add the two bits the checker depends on: GEN_PRINT_NO_LABELS, so a lone
branch prints numeric jip/uip deltas rather than a synthesized label, and a
'gentool disasm --program-subset' option that decodes a program fragment.
Suggested by Alyssa.
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41920>
Some formats don't care about execution size, so the parser may produce
exec_size 0 for instructions like NOP. Avoid using cvt()-1 in this
case because it would generate a bad value for the encoder, which
would assert.
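A minimal sketch of the guard, under stated assumptions: the `cvt()` below is a stand-in modeled on the helper the message refers to (power of two mapped to log2 + 1, anything else to 0), `exec_size_field` is a hypothetical name, and writing 0 for the "no execution size" case is an assumption rather than what the patch necessarily does.

```c
#include <assert.h>

/* Assumed shape of cvt(): maps a power-of-two size to log2(size) + 1
 * and anything else, including 0, to 0.
 */
static unsigned
cvt(unsigned val)
{
   switch (val) {
   case 1:  return 1;
   case 2:  return 2;
   case 4:  return 3;
   case 8:  return 4;
   case 16: return 5;
   case 32: return 6;
   default: return 0;
   }
}

/* Hypothetical encoder helper: field value for an execution size. */
static unsigned
exec_size_field(unsigned exec_size)
{
   /* cvt(0) - 1 would wrap around to UINT_MAX, which cannot fit the
    * field, so handle "no execution size" explicitly.
    */
   if (exec_size == 0)
      return 0;

   unsigned c = cvt(exec_size);
   assert(c != 0); /* not a supported power of two */
   return c - 1;
}
```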
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41920>
ir3 switches builder from main to preamble, but then adds
instructions to main using the preamble builder.
This wasn't a problem before, but we now store function impl
in blocks and this breaks that.
Reviewed-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41950>
Compute derivatives can use the same lane based path as fragment shaders
because a workgroup's invocations map to subgroup lanes in order. This
gives correct derivative quads on Valhall.
Advertise the extension for PAN_ARCH >= 9 with both derivative groups.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Jakob Sinclair <jakob.sinclair@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42142>
In QUADS mode the four invocations of a derivative quad must land in the
same subgroup quad. On Valhall the quad comes from consecutive lanes, but
local invocation ids are laid out row by row, so they do not match and the
y derivative reads the wrong neighbor.
shuffle_local_ids_for_quad_derivatives reorders the ids to fit the lane
layout. It only changes QUADS mode.
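To illustrate the kind of remapping involved, here is a sketch that turns a linear lane index into a 2D id so that every four consecutive lanes cover one 2x2 quad. It is not the code of shuffle_local_ids_for_quad_derivatives: `quad_shuffled_id` and `struct id2` are hypothetical, and the sketch assumes an even workgroup width and a lane count that fills whole quads.

```c
#include <assert.h>

struct id2 {
   unsigned x, y;
};

/* Map lane -> (x, y) so that lanes 4q..4q+3 form the 2x2 quad number q,
 * with quads themselves laid out row by row.
 */
static struct id2
quad_shuffled_id(unsigned lane, unsigned width)
{
   assert(width >= 2 && width % 2 == 0);

   unsigned quad = lane / 4;
   unsigned in_quad = lane % 4;
   unsigned quads_per_row = width / 2;

   struct id2 id = {
      .x = (quad % quads_per_row) * 2 + (in_quad & 1),
      .y = (quad / quads_per_row) * 2 + (in_quad >> 1),
   };
   return id;
}
```

With a row-by-row layout and width 4, lanes 0..3 would all sit at y = 0. With this mapping they become (0,0), (1,0), (0,1), (1,1), so the y neighbor is in the same group of four lanes.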
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Jakob Sinclair <jakob.sinclair@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42142>
Even in scalar shader stages, some things are vectors. This tries to
shrink them. On DG2 and newer platforms, this is a pretty big help. On
older platforms, it is a disaster for spills (+10% across all of
fossil-db on TGL) and fills (+18% across all of fossil-db on TGL). The
pass is disabled on those platforms.
I have an unconfirmed hypothesis that this causes a bunch of extra
copies of SEND results to create the shrunk vectors on old platforms,
but new platforms can just have a smaller SEND destination.
Calling this once after the loop had negligible effect. Only calling
it inside the loop is effective.
shader-db:
Lunar Lake
total instructions in shared programs: 17092168 -> 17090738 (<.01%)
instructions in affected programs: 153196 -> 151766 (-0.93%)
helped: 804 / HURT: 58
total cycles in shared programs: 864408968 -> 864393158 (<.01%)
cycles in affected programs: 7727364 -> 7711554 (-0.20%)
helped: 624 / HURT: 264
total fills in shared programs: 1604 -> 1606 (0.12%)
fills in affected programs: 140 -> 142 (1.43%)
helped: 0 / HURT: 2
total sends in shared programs: 876960 -> 876422 (-0.06%)
sends in affected programs: 5421 -> 4883 (-9.92%)
helped: 388 / HURT: 42
LOST: 1
GAINED: 1
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 20008794 -> 20007561 (<.01%)
instructions in affected programs: 233049 -> 231816 (-0.53%)
helped: 855 / HURT: 108
total cycles in shared programs: 882324073 -> 882292946 (<.01%)
cycles in affected programs: 18182637 -> 18151510 (-0.17%)
helped: 665 / HURT: 343
total spills in shared programs: 4663 -> 4655 (-0.17%)
spills in affected programs: 130 -> 122 (-6.15%)
helped: 2 / HURT: 0
total fills in shared programs: 3990 -> 3984 (-0.15%)
fills in affected programs: 282 -> 276 (-2.13%)
helped: 2 / HURT: 2
total sends in shared programs: 1054303 -> 1053899 (-0.04%)
sends in affected programs: 5820 -> 5416 (-6.94%)
helped: 424 / HURT: 52
LOST: 0
GAINED: 4
No changes on Tiger Lake, Ice Lake, or Skylake because the pass is
disabled on those platforms.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 914338134 -> 913822786 (-0.06%); split: -0.08%, +0.02%
CodeSize: 12884121056 -> 12875340544 (-0.07%); split: -0.15%, +0.08%
Subgroup size: 40514864 -> 40515040 (+0.00%)
Send messages: 40209899 -> 40209119 (-0.00%); split: -0.02%, +0.01%
Cycle count: 100085655414 -> 100071607083 (-0.01%); split: -0.18%, +0.16%
Spill count: 3459692 -> 3425132 (-1.00%); split: -1.33%, +0.33%
Fill count: 4909516 -> 4895879 (-0.28%); split: -1.09%, +0.81%
Max live registers: 191771666 -> 191693372 (-0.04%); split: -0.08%, +0.04%
Max dispatch width: 48502272 -> 48505424 (+0.01%); split: +0.02%, -0.01%
Non SSA regs after NIR: 136096908 -> 132514357 (-2.63%); split: -2.66%, +0.03%
Totals from 824156 (41.14% of 2003111) affected shaders:
Instrs: 609620701 -> 609105353 (-0.08%); split: -0.12%, +0.04%
CodeSize: 8652537232 -> 8643756720 (-0.10%); split: -0.22%, +0.12%
Subgroup size: 176 -> 352 (+100.00%)
Send messages: 24713957 -> 24713177 (-0.00%); split: -0.03%, +0.02%
Cycle count: 57143649989 -> 57129601658 (-0.02%); split: -0.31%, +0.28%
Spill count: 2919242 -> 2884682 (-1.18%); split: -1.58%, +0.40%
Fill count: 4407886 -> 4394249 (-0.31%); split: -1.21%, +0.90%
Max live registers: 94599082 -> 94520788 (-0.08%); split: -0.16%, +0.08%
Max dispatch width: 21189696 -> 21192848 (+0.01%); split: +0.04%, -0.03%
Non SSA regs after NIR: 90194236 -> 86611685 (-3.97%); split: -4.01%, +0.04%
No changes on Tiger Lake, Ice Lake, or Skylake because the pass is
disabled on those platforms.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41940>
We never called this for scalar shader stages, and nobody is quite sure
why. Some speculation is that there was no benefit before load / store
merging was added. There was also some speculation that it was harmful
before load / store merging could handle holes.
Given that only ~20 shaders in shader-db were affected, it's also
possible that no shaders were affected in scalar stages at the
time it was first added.
We may never know. ¯\_(ツ)_/¯
Calling it inside the loop had no impact, so call it once after the
loop.
I don't know why this hurts Ice Lake but helps every other platform.
shader-db:
All Iris platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17089936 -> 17089861 (<.01%)
instructions in affected programs: 23685 -> 23610 (-0.32%)
helped: 19 / HURT: 0
total cycles in shared programs: 864096306 -> 864099466 (<.01%)
cycles in affected programs: 1981658 -> 1984818 (0.16%)
helped: 12 / HURT: 7
LOST: 0
GAINED: 2
fossil-db:
Lunar Lake
Totals:
Instrs: 914554524 -> 914548221 (-0.00%); split: -0.00%, +0.00%
CodeSize: 12887150560 -> 12887094496 (-0.00%); split: -0.00%, +0.00%
Cycle count: 100103979198 -> 100103691332 (-0.00%); split: -0.00%, +0.00%
Spill count: 3459811 -> 3459692 (-0.00%)
Fill count: 4909786 -> 4909516 (-0.01%)
Max live registers: 191838197 -> 191831367 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 48514528 -> 48514576 (+0.00%)
Non SSA regs after NIR: 136347693 -> 136146918 (-0.15%); split: -0.15%, +0.00%
Totals from 17915 (0.89% of 2003490) affected shaders:
Instrs: 4205005 -> 4198702 (-0.15%); split: -0.15%, +0.00%
CodeSize: 57002192 -> 56946128 (-0.10%); split: -0.14%, +0.05%
Cycle count: 253980589 -> 253692723 (-0.11%); split: -0.26%, +0.14%
Spill count: 2026 -> 1907 (-5.87%)
Fill count: 2636 -> 2366 (-10.24%)
Max live registers: 1174571 -> 1167741 (-0.58%); split: -0.59%, +0.01%
Max dispatch width: 430368 -> 430416 (+0.01%)
Non SSA regs after NIR: 1005266 -> 804491 (-19.97%); split: -19.97%, +0.00%
Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 989799269 -> 989778469 (-0.00%); split: -0.00%, +0.00%
CodeSize: 16516706896 -> 16516376256 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 27542464 -> 27542528 (+0.00%)
Send messages: 44446154 -> 44446153 (-0.00%)
Cycle count: 91362833728 -> 91362723256 (-0.00%); split: -0.00%, +0.00%
Spill count: 3713932 -> 3713758 (-0.00%)
Fill count: 5001432 -> 5001144 (-0.01%)
Max live registers: 121358101 -> 121356271 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 38061600 -> 38060544 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 161013837 -> 160662598 (-0.22%); split: -0.22%, +0.00%
Totals from 22841 (1.00% of 2278082) affected shaders:
Instrs: 4974061 -> 4953261 (-0.42%); split: -0.42%, +0.00%
CodeSize: 77949200 -> 77618560 (-0.42%); split: -0.44%, +0.02%
Subgroup size: 64 -> 128 (+100.00%)
Send messages: 279204 -> 279203 (-0.00%)
Cycle count: 176737437 -> 176626965 (-0.06%); split: -0.29%, +0.23%
Spill count: 2362 -> 2188 (-7.37%)
Fill count: 3162 -> 2874 (-9.11%)
Max live registers: 906456 -> 904626 (-0.20%); split: -0.21%, +0.01%
Max dispatch width: 451784 -> 450728 (-0.23%); split: +0.01%, -0.24%
Non SSA regs after NIR: 1477247 -> 1126008 (-23.78%); split: -23.78%, +0.00%
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41940>
This prevents regressions in r64 image store tests when the Intel
compilers enable the use of nir_opt_shrink_stores. On all platforms, ANV
lowers r64 image stores to write an ivec2 instead of an int64.
As an alternative, I did consider adding a callback. This would have
been very invasive, and it seemed really heavy handed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41940>
The TfLiteRegistration.async_kernel field is missing initialization. Rather
than add explicit init for it, clear the whole struct to avoid future
issues. Newer versions of TFLite have added more fields.
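The pattern can be sketched as follows. The struct is a stand-in whose field names are illustrative and do not mirror the real TFLite header; only the "clear first, then set what you know" idea is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a registration struct that grows across library versions. */
struct fake_registration {
   void *(*init)(void);
   int (*invoke)(void *data);
   void *async_kernel; /* field added by a later library version */
};

static int
fake_invoke(void *data)
{
   (void)data;
   return 0;
}

static void
fill_registration(struct fake_registration *reg)
{
   assert(reg);

   /* Clear everything first, so fields this code does not set (or does
    * not know about yet) read as zero/NULL instead of stack garbage.
    */
   memset(reg, 0, sizeof(*reg));
   reg->invoke = fake_invoke;
}
```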
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42160>
When rounding the largest FP8 denorm up, the code ignored the updated
exponent and returned zero instead of the minimum normal value. Pack
the updated exponent in those cases.
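A small sketch of the failure mode, not the util code itself: it assumes an E4M3-style layout (3 mantissa bits, exponent field directly above them), handles only the unsigned denormal range, and uses round-to-nearest-even. `pack_e4m3_denorm` and its fixed-point input are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MANT_BITS 3

/* Round a denormal magnitude to 8-bit float bits (sign omitted).
 * `frac` is the mantissa with `guard_bits` extra low bits, so it must
 * be below (8 << guard_bits).
 */
static uint8_t
pack_e4m3_denorm(uint32_t frac, unsigned guard_bits)
{
   assert(guard_bits > 0 && guard_bits < 28);
   assert(frac < ((uint32_t)(1u << MANT_BITS) << guard_bits));

   uint32_t mant = frac >> guard_bits;
   uint32_t rem = frac & ((1u << guard_bits) - 1);
   uint32_t half = 1u << (guard_bits - 1);
   unsigned exp = 0; /* denormals use exponent field 0 */

   /* Round to nearest, ties to even. */
   if (rem > half || (rem == half && (mant & 1)))
      mant++;

   /* Rounding the largest denormal up overflows the mantissa: the result
    * is the minimum normal, so the bumped exponent must be packed too.
    */
   if (mant == (1u << MANT_BITS)) {
      exp = 1;
      mant = 0;
   }

   return (uint8_t)((exp << MANT_BITS) | mant);
}
```

Dropping the `exp` term from the return value reproduces the bug described above: the overflowed mantissa is 0, so the result would be 0 instead of 0x08.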
Fixes: 2237c022a2 ("util: add float8 conversion functions")
Assisted-by: Pi coding agent (GPT-5.5)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42155>
We were allowing operations to be merged across VA mappings.
This is not valid usage and causes the kernel side to return
ENOSPC.
This fixes a device lost in Forza Horizon 6 when entering in-game or
benchmark mode with the VK_EXT_descriptor_heap MR.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 053b7f0f30 ("nvk/nvkmd: Implement nvkmd_ctx for nouveau")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42183>
Map the format AHARDWAREBUFFER_FORMAT_Y8 directly to
VK_FORMAT_R8_UNORM.
Y8 was previously missing from the mapping list, forcing it to be
imported as an external format. This routed MSAA resolves through
the External Format Resolve path, causing driver assertions due
to missing YCbCr metadata.
Direct mapping allows Y8 to be imported as a standard color format,
bypassing EFR and using the standard color resolve path instead.
Signed-off-by: Allen Ballway <ballway@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42057>
Unlike for v_pk_fmac_f16 and v_dual_dot2acc_f32_f16, opsel_hi is
implicitly true even for inline constant operands of v_dot2c_f32_f16 on GFX11.
Fixes: 3238e64d3c ("aco/ra: create v_dot2c_f32_f16")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42151>
Just override the needed entrypoint.
GFXBench 5.0 uses VK_IMAGE_LAYOUT_PREINITIALIZED as the old
layout when transitioning optimally-tiled depth images. Per the Vulkan
spec, PREINITIALIZED is only meaningful for linear tiling and is
semantically equivalent to UNDEFINED for optimal tiling. Replace it with
VK_IMAGE_LAYOUT_UNDEFINED to avoid hitting unhandled layout cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41943>
Color clears may happen via different paths: BLIT_EVENT_CLEAR, R2D, or a
draw call. And which path to take may depend on sysmem/gmem selection.
The "Appendix I: Invariance" of the Vulkan spec encourages implementations
to produce the same results for the same operation.
Unfortunately I haven't found any ready-made packing functions in
the common utils.
Tested by writing edge-case color values through Vulkan ways of
clearing color, and from fragment shader.
E5B9G9R9, B10G11R11, B5G5R5, A2R10G10B10 are not handled due to
complexity.
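The idea of sharing one packing helper across all clear paths can be sketched for the simplest case, an 8-bit UNORM channel. This is an illustration only: `pack_unorm8` is a hypothetical name, and the round-half-up rule is an assumption that may not match what the hardware or the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one float channel to 8-bit UNORM. Every clear path would call
 * the same helper, so the same input always yields the same bits.
 */
static uint8_t
pack_unorm8(float f)
{
   if (!(f > 0.0f)) /* also maps NaN to 0 */
      return 0;
   if (f >= 1.0f)
      return 255;

   /* Assumed rounding rule: round half up. */
   uint8_t v = (uint8_t)(f * 255.0f + 0.5f);
   assert(v <= 255);
   return v;
}
```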
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41972>
Depth clears may happen via different paths: BLIT_EVENT_CLEAR, R2D, or a
draw call. And which path to take may depend on sysmem/gmem selection.
The "Appendix I: Invariance" of the Vulkan spec encourages implementations
to produce the same results for the same operation.
Color clears have the same issue, but with depth it's much easier to
imagine a case where this may visibly affect rendering.
Note, depth and color values have different rounding rules.
Unfortunately I haven't found any ready-made packing functions in
the common utils.
Tested by writing edge-case depth values through Vulkan ways of
clearing depth, and from vertex shader.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41972>
Fix a Meson warning when building etnaviv without etnaviv tools:
WARNING: Build target etnaviv_isa_rs has no sources. This was never supposed to be allowed but did because of a bug, support will be removed in a future release of Meson
Arrays passed to the executable() link_with parameter are
flattened, so setting libetnaviv_isa_rs to an empty array
allows linking the etnaviv_disasm executable in the
etnaviv_isa_disasm test against only libetnaviv_encode,
as intended, but without the warning.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11626
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42173>
When _eglNativePlatformDetectNativeDisplay fails to recognize a non-NULL
nativeDisplay pointer (e.g. an X11 Display* when Mesa was built without
HAVE_X11_PLATFORM), the old code unconditionally fell back to the
build-time _EGL_NATIVE_PLATFORM default. This could select a platform
that does not match the actual native display (e.g. Wayland for an X11
pointer), causing a crash when the wrong DRI driver tries to use the
native display as its own type.
Restructure the logic so that:
- For EGL_DEFAULT_DISPLAY: fall back to the build-time _EGL_NATIVE_PLATFORM
- For non-default displays that can't be detected: return _EGL_INVALID_PLATFORM
  so that eglGetDisplay returns EGL_NO_DISPLAY instead of crashing
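The two rules above can be sketched as follows. All names are stand-ins so the sketch builds without EGL headers: `sketch_platform`, `choose_platform` and the detector callback are hypothetical, and `SKETCH_NATIVE_PLATFORM` plays the role of the build-time _EGL_NATIVE_PLATFORM default.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_platform {
   SKETCH_PLATFORM_INVALID,
   SKETCH_PLATFORM_X11,
   SKETCH_PLATFORM_WAYLAND,
};

/* Build-time default, standing in for _EGL_NATIVE_PLATFORM. */
#define SKETCH_NATIVE_PLATFORM SKETCH_PLATFORM_WAYLAND

/* Detector callback: returns SKETCH_PLATFORM_INVALID when it cannot
 * identify the pointer.
 */
typedef enum sketch_platform (*detect_fn)(void *native_display);

/* Example detector that recognizes nothing, as in a build without
 * support for the display's platform.
 */
static enum sketch_platform
detect_nothing(void *native_display)
{
   (void)native_display;
   return SKETCH_PLATFORM_INVALID;
}

static enum sketch_platform
choose_platform(void *native_display, detect_fn detect)
{
   assert(detect);

   /* Default display: the build-time default is the only sensible guess. */
   if (native_display == NULL)
      return SKETCH_NATIVE_PLATFORM;

   /* Non-default display: never guess. An undetected pointer stays
    * INVALID so the caller can report "no display" instead of handing
    * the pointer to the wrong backend.
    */
   return detect(native_display);
}
```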
Assisted-by: DeepSeek V4 Flash
Closes: #151
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42130>
There is no requirement that imported compressed memory is tied to an
image.
In practice it's unlikely to happen: unless the drirc option
anv_enable_buffer_comp is enabled, we don't list the compressed memory
types for anything but images. But you can build a case hitting the
assert without even creating an image.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5a05a39e56 ("anv: Limit the SCANOUT flag to color images")
Fixes: b7f7f1c74f ("anv: Treat imported compressed buffers as displayable (xe2)")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15578
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42121>