This applies only to Gfx9.
We're writting out of bounds to a wrong location.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1d9cf8f381 ("anv: add gfx9 generated draw support")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
If we dispatch exactly a multiple of 8192 items, there is additional
lane left to generate the jump instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c950fe97a0 ("anv: implement generated (indexed) indirect draws")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
We need to shut down the runtime queue threads before tearing down
anything else.
Gets rid of helgrind errors like this :
==212772== Possible data race during write of size 4 at 0xADCBFB0 by thread #1
==212772== Locks held: 1, at address 0x6B8F260
==212772== at 0x8AC3EFF: simple_mtx_destroy (simple_mtx.h:97)
==212772== by 0x8ACB24D: intel_ds_device_fini (intel_driver_ds.cc:603)
==212772== by 0x6CBD4D4: anv_device_utrace_finish (anv_utrace.c:471)
==212772== by 0x6C71577: anv_DestroyDevice (anv_device.c:3679)
==212772== by 0x6B2F1E2: loader_layer_destroy_device (loader.c:4358)
==212772== by 0x6B3F10B: vkDestroyDevice (trampoline.c:983)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: cc5843a573 ("anv: implement u_trace support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10010
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25805>
We have only 6 of these boards since one died in May, and 7 jobs allocated
to them. So you ended up with a 5 minute delay on each pipeline with an
otherwise-idle farm while you waited for the first batch of jobs to
complete so you could get the last one started. It turns out that piglit
was taking 3 minutes of runtime each, so we can just shard piglit 2 ways,
stay under runtime, and not over-allocate the farm
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25790>
The following instructions :
- 3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP
- 3DSTATE_VIEWPORT_STATE_POINTERS_CC
- 3DSTATE_SCISSOR_STATE_POINTERS
do not care about the content/format/count of the render targets, only
the size of the render area and count of viewport/scissor.
By tracking render targets & render area we can reduce the emission of
those instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25778>
Heuristic-based optimization throttling CCS work (async compute).
Without throttling, background compute work consumes all threads,
deminishing performance gains by running dispatch in parallel with
3D work.
Optimization is heuristics based, meaning a workload might slow
down when using async compute.
Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
Previously all items on a timeline row would have the same name. This
change uses the tracepoint names to put into the timeline instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
Useful to figure out what's the tracepoint name you're implementing.
We'll use this in the intel perfetto integration glue to index into an
array of perfetto iid.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
There is not real need to use the spirv-linker here at all,
we can just read all the CL C files into one buffer, then compile
that buffer in a single pass.
This worksaround an issue seen with llvm17 and opaque pointers
and the spirv linker.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25759>
When vk_require_astc is true and there is no native ASTC LDR support,
enable ASTC LDR emulation.
vk_require_astc defaults to true on Android 14+.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
When the blit src has an emulated format, redirect to the hidden plane.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
When the view format is the same as the image format, and the format is
emulated, change the format to the decompressed format.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
Add anv_astc_emu_decompress to decompress the raw texel data to the
hidden plane. Call anv_astc_emu_decompress from anv_CmdCopyImage2 and
anv_CmdCopyBufferToImage2.
v2: support transfer queue and add missing flushes (Lionel)
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
When an image is created with an emulated format, add a hidden plane to
the image. The hidden plane will be used for decompressed data.
v2: assert no sparse residency
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
anv_is_format_emulated returns true when a format is emulated. It will
be used for ASTC LDR emulation, but it always return false at the
moment.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
They can be used to save/restore a subset of the current compute state.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
Add anv_descriptor_set_write as a helper for both
anv_UpdateDescriptorSets and anv_CmdPushDescriptorSetKHR.
v2: rename push_set to the more generic set_override
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
Do not assume anv_cmd_pipeline_state::push_descriptor is the currently
bound push descriptor set. With this and anv_push_descriptor_set_init,
it is possible to initialize a temporary push descriptor set on stack
for internal use.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
Add optional anv_state_stream to anv_image_view_init. This is useful
for internal image views whose lifetimes are tied to the lifetime of a
command buffer.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25467>
I don't know this fixes anything but I noticed the generated draws
jump into addresses slightly different from CPU generated jumps.
After checking the genxml, I noticed MI_BATCH_BUFFER_START "Batch
Buffer Start Address" fields have different sizes in Gfx8 & Gfx9+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25705>
Use the correct miptail start LOD for the surfaces involved in the
XY_BLOCK_COPY_BLT/XY_FAST_COLOR_BLT instructions.
Thanks to Lionel for pointing out the issue.
Fixes: 46f45d62d1 ("intel/isl: Start using miptails")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25688>
Those workarounds are temporary for newer platforms so we can't use
INTEL_NEEDS_WA_*, luckly those already had runtime checks.
INTEL_NEEDS_WA_* was only used because it was accessing instructions
or fields of the instructions that only exists in gfx12 or gfx125.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25685>
On older platforms, we have the blitter engine, but it lacks compression
handling and other features we need, unfortunately, so enable the
transfer queue only on ACM+ platforms.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25667>
Now that we have proper handling of FCV_CCS_E everywhere, we can turn
this on for Gen12.5.
This helps fix a performance regression where enabling fast
clears to non-zero values with CCS_E caused additional partial resolves,
regressing performance on certain games. Performance is helped on the
following games:
- F1'22: +45%
- RDR2: +6%
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Surfaces with FCV_CCS_E aux usage should be marked as fast cleared when
being rendered to, to ensure proper fast clear state tracking. We also
need to ensure that we're not trying to partially resolve surfaces with
level > 0 and layer > 0 since we don't track fast clear states for
those.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Our implementation of secondary command buffers already jumps into
them and edits the end of the secondary command buffer to jump back
into the primary.
That implementation can work just the same with any levels of
secondary. The only possible issue would happen with a secondary
calling itself, but that's not possible.
We also cannot support simultaneous execution with self-modifying
command buffers. That's actually not a problem at the moment because
we don't have multiple queues of the same family but we choose to
reflect that in the feature bits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25600>
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>