In an earlier commit, I made us stop counting sync.nops in the shader
statistics we use for shader-db (brw_debug_log_message) and fossil-db
(stats->instructions = ...). However, I missed adjusting the printout
for INTEL_DEBUG.
Fixes: 1497f4e0c2 ("intel/fs: Don't include sync.nop in instruction count statistics")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31311>
... now that gralloc buffer metadata is more widely available
and actually populated.
Test: cvd start --gpu_mode=gfxstream_guest_angle_host_swiftshader
Test: cts -m CtsGraphicsTestCases
Test: cts -m CtsMediaCodecTestCases
Test: cts -m CtsMediaDecoderTestCases
Test: cts -m CtsViewTestCasesTest
Test: Open Youtube in Webview
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31330>
If A8_UNORM isn't specified in the format table, then it is emulated
in the state tracker by RGBA8. This is suboptimal, both because it requires
more memory, and because the blit gets more complicated (and in fact there's
a bug currently in the blit code where we don't mask properly for GL_ALPHA).
Fix this by adding an explicit A8_UNORM format entry.
Fixes piglit test ext_framebuffer_multisample-blit-mismatched-formats.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31322>
For cmd GL_PERFMON_RESULT_SIZE_AMD and GL_PERFMON_RESULT_AMD of
glGetPerfMonitorCounterDataAMD(), the current implementation will
return 0 if the result is not available on HW, but according to the
sepc, if cmd is PERFMON_RESULT_SIZE_AMD, <data> will contain actual
size of all counter results being sampled, instead of 0. And if cmd is
PERFMON_RESULT_AMD, <data> will contain results. The spec doesn't
require the application to wait for the result available on HW before
executing cmd PERFMON_RESULT_SIZE_AMD and PERFMON_RESULT_AMD. So for
cmd PERFMON_RESULT_SIZE_AMD, it should immediately return the correct
result size for the counters enabled by
glSelectPerfMonitorCountersAMD(), and for cmd PERFMON_RESULT_AMD, it
should wait until the result is available on HW and then return the
result.
Without this fix, the Sample Usage in the spec will not work properly
as it always gets size 0 when calling the cmd PERFMON_RESULT_SIZE_AMD
V2: If SelectPerfMonitorCountersAMD is called on a monitor, then the
result of querying for PERFMON_RESULT_SIZE will be 0.
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31179>
Call gfx10_make_texture_descriptor from si_make_texture_descriptor and
use si_make_texture_descriptor everywhere.
This is more readable.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30904>
Instead of allocating it and then leaking it, store the log context
in si_screen.
Also, the log context was only set for "general" instead of all aux
contexts.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30904>
When no pipeline cache is provided by the application and we rely on the
internal one, cache hits are not counted as such.
This was causing us to return COMPILE_REQUIRED on some cases where all
shaders had been found in the cache, as well as some unnecessary extra
processing in the case that we did have to compile the pipeline.
Fixes: 1dacea10f3 ("anv: implement caching for ray tracing pipelines")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31298>
We have not yet added the shaders to the pipeline->shaders array at
this point. If we couldn't compile (or were asked not to) the
pipeline, we were leaking references to any shaders found in the cache.
This would manifest as an assert on device destruction:
vk_pipeline_cache_destroy: Assertion `cache->object_cache->entries == 0' failed.
Fixes: 58c9f817cb ("anv: fix pipeline executable properties with graphics libraries")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31298>
Use the AMD_IMAGE_OPCODES=0 environment variable to test the lower image
opcode code path used by AMD CDNA/compute devices without graphics.
Runs a very limited subset of GL tests.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31180>
We don't have a generic v4i8 on Valhall, we have to lower it to two
v2i8. Fortunately, bi_make_vec_to() hides the Bifrost/Valhall
differences, so use that for nir_op_pack_uvec4_to_uint.
Fixes: 934b0f1add ("pan/bi: Respect swizzles for more vector ops")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31280>
We found out that some HW changes on Xe2 make the HW avoid reading the
blend state if we're using the null_rt bit in the extended descriptor.
Since the alpha_to_coverage bit resides in the blend state, that state
is ignored and writes are going through to the depth/stencil buffers.
Disable null_rt in the color outputs if the color outputs can affect
other outputs (through alpha_to_coverage & omask).
Fixes tests in this pattern on Xe2 :
dEQP-VK.pipeline.*.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
We'll want to tune this setting based on other parameters.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
We setup an empty render target when there are no color attachments,
which effectively makes it a different surface state. In most cases
the compiler will insert a null-rt bit in the extended descriptor
which means the RT isn't even accessed. But in some cases like
alpha-to-coverage output + depth/stencil write, we will access the
render target because using the null-rt will prevent alpha-to-coverage
from happening.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2bd304bc8f ("anv: Skip the RT flush when doing depth-only rendering.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31196>
Device memory can be allocated from different threads and thus requires
locking when allocating/freeing virtual addresses.
Fixes: 53fb1d99ca ("panvk: Transition to explicit VA assignment on v10+")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31282>
In wave64 for hw with native wave32, ds_append seems to be split in a load for
the low half and an atomic for the high half, and other LDS instructions can
be scheduled between the two.
Which means the result of the low half is unusable because it might be out of date.
I was only able to reproduce this issue in WGP mode, but be conservative and
apply the workaround in CU mode too.
Foz-DB Navi31:
Totals from 13 (0.02% of 79395) affected shaders:
Instrs: 7599 -> 7656 (+0.75%)
CodeSize: 39708 -> 39972 (+0.66%)
Latency: 83174 -> 83572 (+0.48%)
InvThroughput: 8271 -> 8357 (+1.04%)
Copies: 718 -> 717 (-0.14%)
VALU: 3689 -> 3703 (+0.38%)
SALU: 935 -> 965 (+3.21%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11921
Fixes: 45e935800a ("aco: implement nir_shared_append/consume_amd")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31301>
This change ensures that all these allocations are using
the same memory context.
For instance, this issue is triggered with:
"piglit/bin/arb_shader_image_load_store-host-mem-barrier -auto -fbo":
Indirect leak of 32816 byte(s) in 1 object(s) allocated from:
#0 0x7f49a35447ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f49998e4b4f in ralloc_size ../src/util/ralloc.c:118
#2 0x7f49998e7521 in create_slab ../src/util/ralloc.c:801
#3 0x7f49998e7521 in gc_alloc_size ../src/util/ralloc.c:840
#4 0x7f49998e7d11 in gc_zalloc_size ../src/util/ralloc.c:868
#5 0x7f49999a6126 in nir_alu_instr_create ../src/compiler/nir/nir.c:682
#6 0x7f49999cba48 in clone_alu ../src/compiler/nir/nir_clone.c:217
#7 0x7f49999cc85a in clone_instr ../src/compiler/nir/nir_clone.c:456
#8 0x7f49999cee3a in clone_block ../src/compiler/nir/nir_clone.c:529
#9 0x7f49999cee3a in clone_cf_list ../src/compiler/nir/nir_clone.c:583
#10 0x7f49999d03be in clone_function_impl ../src/compiler/nir/nir_clone.c:660
#11 0x7f49999d13f7 in nir_function_impl_clone ../src/compiler/nir/nir_clone.c:678
#12 0x7f4999a0e2c5 in lower_call_function_impl ../src/compiler/nir/nir_functions.c:397
#13 0x7f4999a0e2c5 in function_link_pass ../src/compiler/nir/nir_functions.c:430
#14 0x7f4999a0e2c5 in function_link_pass ../src/compiler/nir/nir_functions.c:408
#15 0x7f4999a0e2c5 in nir_function_instructions_pass ../src/compiler/nir/nir_builder.h:108
#16 0x7f4999a0e2c5 in nir_link_shader_functions ../src/compiler/nir/nir_functions.c:452
#17 0x7f499ca30b8f in link_libintel_shaders ../src/gallium/drivers/iris/iris_program_cache.c:329
#18 0x7f499ca30b8f in iris_ensure_indirect_generation_shader ../src/gallium/drivers/iris/iris_program_cache.c:374
#19 0x7f499d185267 in gfx9_emit_indirect_generate ../src/gallium/drivers/iris/iris_indirect_gen.c:593
#20 0x7f499d119c79 in iris_upload_indirect_shader_render_state ../src/gallium/drivers/iris/iris_state.c:8744
#21 0x7f499fe86b01 in iris_indirect_draw_vbo ../src/gallium/drivers/iris/iris_draw.c:233
#22 0x7f499fe86b01 in iris_draw_vbo ../src/gallium/drivers/iris/iris_draw.c:343
#23 0x7f499a174e43 in tc_call_draw_indirect ../src/gallium/auxiliary/util/u_threaded_context.c:3828
#24 0x7f499a1557fe in batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:453
#25 0x7f499a1557fe in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:504
#26 0x7f499a167f26 in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:761
#27 0x7f499a168888 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2783
#28 0x7f49986f2631 in pipe_texture_map ../src/gallium/auxiliary/util/u_inlines.h:556
#29 0x7f49986f2631 in _mesa_map_renderbuffer ../src/mesa/main/renderbuffer.c:494
#30 0x7f49991af7ca in readpixels_memcpy ../src/mesa/main/readpix.c:260
#31 0x7f49991af7ca in _mesa_readpixels ../src/mesa/main/readpix.c:898
#32 0x7f499931ee23 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:575
#33 0x7f49991b40b5 in read_pixels ../src/mesa/main/readpix.c:1199
#34 0x7f49991b40b5 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1216
#35 0x7f49991b4a20 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1231
...
SUMMARY: AddressSanitizer: 323648 byte(s) leaked in 201 allocation(s).
Fixes: 5438b19104 ("iris: enable generated indirect draws")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31313>
This is a regression happening with the commit 87b99d5797 ("nir: use copysign for atan").
Indeed, the opcode "copysign" was generating an incompatible i915 sequence.
For instance, this issue is triggered with
"deqp-gles2 --deqp-case=dEQP-GLES2.functional.shaders.operator.angle_and_trigonometry.atan2.highp_float_vertex":
deqp-gles2: ../src/compiler/nir/nir_lower_int_to_float.c:239: lower_alu_instr: Assertion `nir_alu_type_get_base_type(info->output_type) != nir_type_int && nir_alu_type_get_base_type(info->output_type) != nir_type_uint' failed.
Fixes: c4cec84231 ("nir/i915g/r300/nv30: skip marking varyings as flat in some drivers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31315>
Starting with a750, the ALWAYS_ON counter is initialized from a loadable
counter in CX power domain, which is never turned off except during a
GPU reset. This means that timestamps should always be monotonic except
if the GPU resets, in which case subsequent submits should return
DEVICE_LOST anyway. Thus it should be good enough to satisfy the Vulkan
requirement that vkCmdWriteTimestamp is monotonic.
kgsl tries to synchronize the CX counter to the CPU counter, and
additionally adds a synchronization ioctl to improve the accuracy. I'm
not sure whether the former is really useful for us, but the latter
should eventually be implemented in drm/msm. However for now we can
expose the extension without any kernel support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31100>
It appears this was missed due to fractional runs, but the fix for this issue has already been merged.
While the test hasn't run in the full runs yet, it should now be considered fixed, just like on other GPUs.
Fixes: 812c8f6abe ("tu: Treat partially-bound depth/stencil attachments as passthrough")
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31286>
The expectations for the manual runs were missed during the uprev, update them now.
Fixes: 213f5e9152 ("Uprev Piglit to e9ab30aeaed97b69868cf4d6d6a3f70f3b53c362")
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31286>
KHR-GL46.texture_swizzle.functional usually takes 50+ seconds and can
time out on occasion. This isn't caused by the new kernel, it always
took this long. Skip it for pre-merge jobs on a630.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31286>
In traces produced with PAN_MESA_DEBUG, print swizzles in human readable
form (like BGRA) as well as the raw decimal format we were printing
before. This is purely a convenience feature for developers.
Reviewed-by: Boris Brezilllon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31242>
This adds an additional aux_context, so that the gfx queue isn't stalled
due to clearing buffers or initializing DCC.
This aux context will only be used by resource_create, which will allow
us to remove all barriers around the clears because there are no others
users of those buffers on that context.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31291>