We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
The android extension enables the driver to blit from single-sampled
color attachments.
Adding this image usage expressess that functionality and causes anv to
generate the ISL_FORMAT_RAW-formatted clear color during fast-clears.
This fixes an assert failure when anv tries to override the clear color
format used for a blorp_blit() call to ISL_FORMAT_RAW.
There are other ways to handle this, but this solution is consistent
with our handling of multisample images (which may be resolved as well).
Fixes: 465c186fc5 ("anv: Prepare for format width changes in blorp_copy()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15463
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41650>
The HW can do up to UINT32_MAX but we're using that value to signal
indirect dispatch arguments.
A game like Resident Evil Requiem will use more than 64k on X
dimension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41592>
Android CTS CtsGpuProfilingDataTest#testProfilingDataProducersAvailable
intermittently fails with "Render stages reported before their
VkQueueSubmit events". Root cause is in the Perfetto clock correlation:
render-stage timestamps go through intel_device_info_timebase_scale()
while VkQueueSubmit packets use BOOTTIME directly, so any drift in the
scaler shows up as render stages preceding their submits.
intel_device_info_timebase_scale() scales the upper and lower halves
of the raw timestamp separately and recombines them, but silently
drops the upper-half division's remainder. When the frequency doesn't
evenly divide 1e9, every wrap past 2^32 loses a fixed number of ns
and shows up as a step in Perfetto's GPU-vs-BOOTTIME snapshot offset.
Carry the upper-half remainder into the lower-half numerator before
dividing, so no precision is lost. All intermediates still fit in
uint64_t.
Cc: mesa-stable
Signed-off-by: Nemallapudi, Jaikrishna <nemallapudi.jaikrishna@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41630>
Passes the ACCESS_CAN_REORDER flag from NIR on to the backend so that we
can lower the loads to a non-volatile SEND. This allows the scheduler to
freely reorder them around stores or fences.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
Our scheduler is overly conservative about reordering instructions
around memory writes or fences. Fortunately, there are several simple
assumptions we can make about our IR to schedule these things a lot
more fluidly:
* Unless its an EOT, a SEND instruction's side effects will only be
observed through other SEND instructions
* The effects of workgroup barriers, memory fences, and BRW_OPCODE_SYNC,
are only used in the IR to synchronize SEND instructions
* All other scheduler dependencies related to memory access are already
expressed through the source and destination operands
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
Some cooperative properties are defined by the driver itself and
are not a property of the HW. In particular whether the scope is
subgroup or workgroup is not directly related to the HW.
It could make sense encode the DPAS combinations into intel_device_info
but we are not using all possible combinations yet and wouldn't be very
useful in practice.
The new scheme was based on radv and will set us up for also filling
the flexible dimensions properties too.
Note: this also fixes a subtle issue where ARL was incorrectly inheriting
the PRE_XEHP configurations which included FLOAT16/FLOAT16/FLOAT16/FLOAT16
which it does not support.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
Since we don't have any DPAS-based implementation of those, it is odd to
support them in the emulation mode that is only enabled with the debug
flag INTEL_LOWER_DPAS nowadays. Remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
We can determine used components earlier.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
Usually I'm able to run B580 capture on LNL, but in some cases the
oversubscription on replay would lead to allocation failures.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
I've pulled in a pile of changes to reduce the overhead (runtime and
memory) when sharding for deqp-runner, along with a bunch of fixes for
KHR_display testing that we recently enabled, plus a few others that
affect our drivers.
The big new set of failures looks like it's from more complete coverage of
blitting between formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243>
SEND operands don't have regions or types, hardware don't use those
bits except for possibly an old workaround. So from the perspective
of assembler, we shouldn't need to add them. For now brw_asm grammar
requires at least a type, so normalize to UD.
This will make easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
From the perspective of assembler, regions and types for ARF null are
not relevant -- so ignore them. We still have some validation relying
on the byte-stride of the destination, so keep those for now.
In the long run, if a certain Gfx version HW requires some specific
matching, the encoder (or the parser) should take care of it.
This change will make easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
This is a less obtuse error message for why things break.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
The hardware expects it to be present for every colour target.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
When lowering tg4 sparse testing to a non-gather opcode, we were adding
an explicit LOD 0 parameter. But we might already have a LOD or bias.
Fixes tests like:
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_lod
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_bias
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
This reverts commit 2ee6b4d96e.
The previous change avoids 0.25MB (1%) size change on the driver binary file,
but blocks the runtime enablement for some intel tools which is critical
to our optimization tasks.
It's not a good tradeoff based on the new need of the tool in runtime,
so revert this change.
Test: meson setup builddir -Dallow-fallback-for=libdrm -D build-tests=true -Dbuildtype=release --reconfigure && ninja -C builddir && cd builddir && meson test
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: hwandy <hwandy@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41525>
allows deleting piles of moves & pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
this is both a correctness fix (insufficient MEM registers reserved in some
cases) and a performance fix (unnecessary allocations & zeroing in the RA when
we don't spill).
fixes dEQP-VK.dgc.ext.compute.misc.scratch_space
stats are noise but positive i guess.
Totals from 35 (1.32% of 2647) affected shaders:
Instrs: 396770 -> 396690 (-0.02%)
CodeSize: 6040832 -> 6039600 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>