I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.
Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.
shader-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2
total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356
LOST: 10
GAINED: 12
fossil-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)
Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
This allows us to not generate 64-bit iadd3 on Intel but continue
generating it for NVIDIA.
No shader-db or fossil-db changes.
v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit
iadd3 on NVIDIA platforms.
v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of
the optimizations from ever occuring.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
The size parameter can only take certain values. Make this visible to
the compiler to encourage it to inline the memcpy.
This improves descriptor_16combined_sampler from vkoverhead
considerably.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27191>
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3d dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dx_cl.c to v3d_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
Fixes: 11dce2ac81 ("v3d: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Fixes: e2c624e74e ("v3d: Increase alignment to 16k on CL BO on RPi5")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3dv dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dvx_cl.c to v3dv_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
It fixes also an indentation error introduced with 26c8a5cd72.
Fixes: bb77ac983e ("v3dv: Increase alignment to 16k on CL BO on RPi5")
Fixes: 26c8a5cd72 ("v3dv: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>
so that the queue count override logic can catch Android system
properties.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29492>
We now use a separate code path to get devinfo for running intel_clc,
so we don't need to set the INTEL_FORCE_PROBE env-var.
This reverts commit aa152ef431.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29445>
Now that we know when we are getting the devinfo as part of the build
process, we can just always force the devinfo to be returned,
regardless of whether INTEL_FORCE_PROBE is set.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29445>
Running intel_clc as part of the build doesn't need to issue this
warning.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29445>
Gfx11-12 can support SLM block loads via OWord Block Load messages
(notably, the aligned version, not the unaligned version).
A while back we deleted the SHADER_OPCODE_OWORD_BLOCK_READ opcode.
Rather than bring it back, we continue using UNALIGNED_OWORD_BLOCK_READ
for SLM block access (like we do for SSBOs) but switch it over to the
aligned variant when lowering logical sends. We do ensure the alignment
is at least 16B, however. This is ugly, but it's probably not worth
bringing back a whole extra opcode for a legacy HDC block load quirk.
References: BSpec 47652 and 1689
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29429>
Setting the range was overlooked when the fallback path was added.
Fixes: 930e4fa283 ("vulkan/android: Fix suggestedYcbcrModel with !mapper4")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29490>
With AHB + external format, we might get VK_FORMAT_UNDEFINED. And at
least with skiavk we might not get a chained VkExternalFormatANDROID.
In this case, just take the format from the image, which will have
already been resolved via VkExternalFormatANDROID when the image was
created.
See VUID-VkImageViewCreateInfo-image-02399
Also see commit 4f7de83110 ("venus: fix view format for ahb image")
for a similar fix.
Fixes the following cts tests:
CtsViewTestCases:
- android.view.cts.PixelCopyTest#testVideoProducer
CtsMediaDecoderTestCases:
- android.media.decoder.cts.DecodeAccuracyTest#testSurfaceViewLargerWidthDecodeAccuracy[50(c2.v4l2.avc.decoder_h264_520x360)]
- android.media.decoder.cts.DecodeAccuracyTest#testSurfaceViewLargerWidthDecodeAccuracy[50(c2.v4l2.avc.decoder_h264_520x360)]
CtsCameraTestCases:
- android.hardware.camera2.cts.MultiViewTest#testTextureImageWriterReaderOperation[1]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29490>
This small patch fixes an issue where 'addr' is used uninitialized if
the assert gets removed due to compiling release code and thus
returning uninitialized 'addr'
v2: Modified based on initial review:
a) No need to initialize the 'addr' and 'ret' variables
b) Fix 'ret' variable to be proper type based on hw->get_mem return value
v3: Modified based on additional review:
a) Since both the simulator and mesa have their own version of
'unreachable()' and we cannot use ASSERT for the 'ret' value here,
just use a (void) ret after the assert
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29434>
Compared to vcn4, qp map unit is a 32bit number,
vcn5 uses 16bit integer number, in addition to
that it has 2 unit alignment requirement(32 bit
alignment) and each qp value needs left shift 7 bits.
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29423>
Have logic to handle tile allocation
according to vcn5's capability, if the
tile allocation is out of the limit, will
re-adjust the tile parameters.
re-construct frame header and obu instruction
logic. And add av1 encode params requried
for vcn5.
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29423>
Update header files for av1 tile and delta qp.
vcn5 needs driver and applcation to manage that
while in vcn4 they are managed in FW.
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29423>
We are writing 32-bit register value to buffer and were reading back
64-bit value back into two register. We don't need to read the second
register in this case.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29389>