Since VkQueryPipelineStatisticFlagBits is extended by two bits for
task/mesh shader invocations, ANV_PIPELINE_STATISTICS_MASK should be
defined conditionally based on GFX_VER.
This commit modifies the mask and updates the vk_pipeline_stat_to_reg
array accordingly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29523>
Only MCS surfaces are affected because SRGB format are not listed as
supporting CCS compression.
Fixes CTS test :
dEQP-VK.api.image_clearing.core.clear_color_attachment.single_layer.*_srgb_*sample_count_*
dEQP-VK.api.image_clearing.dedicated_allocation.clear_color_attachment.single_layer.*srgb*
This is similar to what we did in Iris in f8961ea0 ("iris: Disable
sRGB fast-clears for non-0/1 values").
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10003
Fixes: 4cfb4f7d12 ("anv: support fast color clears on vkCmdClearAttachments")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29518>
This was added in 6e0089307e without a
dependency on libdrm_nouveau. If libdrm is not compiled with nouveau
enabled, the build errors here.
This is currently unused and since
821f4c8d99 removed dependency on
libdrm_nouveau, this should be gone too.
Fixes: 821f4c8d99 ("nouveau: import libdrm_nouveau")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29517>
If a buffer upload replaces the entire valid_buffer_range, we can
promote the update to DISCARD_WHOLE_RESOURCE. This helps badly behaved
apps which constantly upload to the same offset (but are overwriting
the entire valid range each time).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29507>
Allow drivers to limit the amount of replacement buffers created, to
avoid runaway memory scenarios.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29507>
This is mostly a revert of cc78a42577 but we leave the meson tools
option as there is now a disassem tool.
This standalone compiler is unmaintained. The replacement is using
drm_shim which goes through the maintained/tested path.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29494>
It's sad, but killing the rebuild because of 60m when it finishes in 70m
it's such a waste. The Marge-bot pipeline will timeout anyway, so no
change here.
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29513>
Iris calls iris_resource_get_param with PIPE_RESOURCE_PARAM_STRIDE
internally now when exporting memory objects. OpenCL's gl_sharing allows
to export buffers as well, which do not have strides.
This fixes the assert being hit there for buffers.
Fixes: 831703157e ("iris: Use resource_get_param in resource_get_handle")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29501>
Like it describes in the comment section of VK_QUERY_TYPE_OCCLUSION,
only occlusion and timestamps queries needs ANV_COPY_QUERY_FLAG_PARTIAL.
VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT is captured by MI commands.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29493>
This opcode gets translated to 2 ALU instructions with dependency ALU
stall. This change reproduces the FS_OPCODE_PACK_HALF_2x16_SPLIT
values which is another opcode that generates 2 instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
v2: Add some comments explaining some of the nuance of the shift
optimizations. Fix a bug in the shift count calculation of the upper
32-bits. Move the @64 from the variable to the opcode. All suggested
by Jordan.
No shader-db changes on any Intel platform.
fossil-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154507026 -> 154506576 (-0.00%)
Cycle count: 17436298868 -> 17436295016 (-0.00%)
Max live registers: 32635309 -> 32635297 (-0.00%)
Totals from 42 (0.01% of 632575) affected shaders:
Instrs: 5616 -> 5166 (-8.01%)
Cycle count: 133680 -> 129828 (-2.88%)
Max live registers: 1158 -> 1146 (-1.04%)
No fossil-db changes on any other Intel platform.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.
Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.
shader-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2
total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356
LOST: 10
GAINED: 12
fossil-db:
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)
Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
This allows us to not generate 64-bit iadd3 on Intel but continue
generating it for NVIDIA.
No shader-db or fossil-db changes.
v2: Add nir_lower_iadd3_64 flag so we can continue to generate 64-bit
iadd3 on NVIDIA platforms.
v3: s/bit_size == 64/s == 64/. This cut-and-paste bug prevented any of
the optimizations from ever occuring.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
The size parameter can only take certain values. Make this visible to
the compiler to encourage it to inline the memcpy.
This improves descriptor_16combined_sampler from vkoverhead
considerably.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27191>
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3d dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dx_cl.c to v3d_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
Fixes: 11dce2ac81 ("v3d: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Fixes: e2c624e74e ("v3d: Increase alignment to 16k on CL BO on RPi5")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3dv dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dvx_cl.c to v3dv_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
It fixes also an indentation error introduced with 26c8a5cd72.
Fixes: bb77ac983e ("v3dv: Increase alignment to 16k on CL BO on RPi5")
Fixes: 26c8a5cd72 ("v3dv: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>