If multiview is enabled on the render pass, baseLayer and layerCount
will be 0 and 1 respectively and throw us off.
We can still fast clear if view_mask == 1, but anything else hits the
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL restriction.
Fixes: e488773b29 ("anv: Fast clear depth/stencil surface in vkCmdClearAttachments")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 5d22f307d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
Like accumulators and ARF address registers, the virtual address
registers are not tracked in a way the defs analysis can know
about. This could actually be fixed, but that is future work.
Fixes: b110b06447 ("brw: introduce a new register type for the address register")
Suggested-by: Lionel
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8624da56ee)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
brw_reg::nr encodes both which ARF it is and which instance of that
ARF. In other words, nr for acc0 and acc2 have some bits that say
BRW_ARF_ACCUMULATOR and some bits that say 0 vs 2. The previous test
would only detect acc0.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 366410e913)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
This can occur if NULL or an accumulator is an explicit destination.
update_for_reads still needs to process the sources.
v2: Pass a brw_reg to ::mark_invalid, and do the VGRF check in that one
place.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a548466186)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
With perfetto that string is processed later leading to
use-after-free.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 413e169f45)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Blorp emits 3DSTATE_BINDING_TABLE_POINTER_* instructions in 3D mode.
At the moment we're saved by the push constants reemitting the btp but
we'll drop that in the next commit.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 533c748b34)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The new compression scheme introduced in Xe2 also applies to Xe3, so
we're liable for the same bugs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2418c91537 ("anv/drirc: disable Xe2 CCS drm modifiers for GTK engine")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4ac47f8dde)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.
No shader-db changes on any ELK platform. I suspect the problematic
cases only occur after scheduling has rearranged instructions. This is
likely the reason BRW didn't experience this problem until 09450faf.
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit da1fd9786b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This is a backport of BRW e26270249b.
shader-db:
All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18623918 -> 18624594 (<.01%)
instructions in affected programs: 125179 -> 125855 (0.54%)
helped: 0 / HURT: 139
total cycles in shared programs: 957073100 -> 957072484 (<.01%)
cycles in affected programs: 16534168 -> 16533552 (<.01%)
helped: 42 / HURT: 68
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit bdbfe8de4d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.
shader-db:
Lunar Lake
total instructions in shared programs: 17098815 -> 17098818 (<.01%)
instructions in affected programs: 1187 -> 1190 (0.25%)
helped: 0 / HURT: 3
total cycles in shared programs: 876858960 -> 876858968 (<.01%)
cycles in affected programs: 6878 -> 6886 (0.12%)
helped: 0 / HURT: 1
Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20034973 -> 20034984 (<.01%)
instructions in affected programs: 4599 -> 4610 (0.24%)
helped: 0 / HURT: 11
total cycles in shared programs: 881033088 -> 881033108 (<.01%)
cycles in affected programs: 57872 -> 57892 (0.03%)
helped: 0 / HURT: 5
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 918873064 -> 918873269 (+0.00%)
CodeSize: 14747338416 -> 14747339360 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104141836677 -> 104141840371 (+0.00%); split: -0.00%, +0.00%
Totals from 205 (0.01% of 2011421) affected shaders:
Instrs: 290415 -> 290620 (+0.07%)
CodeSize: 4280704 -> 4281648 (+0.02%); split: -0.01%, +0.03%
Cycle count: 18166526 -> 18170220 (+0.02%); split: -0.00%, +0.02%
Closes: #14874
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit d1614cd6db)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Make sure that lowering undone in elk_nir_optimize are reapplied.
No shader-db or fossil-db changes on any Intel platform. This is most
likely to impact either Gfx8 on ANV or Gfx7.5 on HASVK. I don't
fossil-db test either of those platforms.
I tried doing a similar thing here as is done in BRW (previous commit),
but that caused a couple Haswell shaders to fall off a performance
cliff:
total spills in shared programs: 8247 -> 8311 (0.78%)
spills in affected programs: 6 -> 70 (1066.67%)
helped: 0 / HURT: 2
total fills in shared programs: 8558 -> 8910 (4.11%)
fills in affected programs: 6 -> 358 (5866.67%)
helped: 0 / HURT: 2
Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit df704bd38e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:
non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bd5ebbb2f8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
A descriptor buffer promoted to push constants requires a constant
cache invalidation if it is modified on the device.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 42b70cf05a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
Given a situation like this :
- CB_A: begin, renderDepthA, end
- CB_B: begin, computeA, barrier (depth), computeB, end
The depth cache is not being flushed between renderDepthA & computeB
because :
- it's not flushed at the end of CB_A (it's not required)
- when CB_B starts, we're still on GFX pipeline mode but do not
flush render caches because pipeline mode is unknown
- when barrier is CB_B is executed, we're already in compute
pipeline mode and HW cannot flush depth.
The fix is to flush RT/depth cached when switching from unknown
pipeline mode any pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6dae6ef5f ("vulkan: Optimize implicit end_subpass barrier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14816
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: David Gow <david@davidgow.net>
(cherry picked from commit 888ac904a3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This bit is set in mocs for other protected attachment types by
anv_image_fill_surface_state() however was ommited for depth/stencil
attachments here.
Without the protected bit set, it causes heavy black artifacting when
attaching a protected depth attachment image to a framebuffer.
Fixes: 794b0496e9 ("anv: enable protected memory")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f84ed620c2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
`operands_match` was modifying instruction source operands in-place
(through the `elk_fs_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 14c65322e8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.
Guard all `.f` accesses with `.file == IMM` checks.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 93f39f87c4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
`operands_match` was modifying instruction source operands in-place
(through the `brw_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit b302faad8b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.
Guard all `.f` accesses with `.file == IMM` checks.
Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit f5e0f63216)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
This has not been problem before the compression hint given to kernel
but now that we set it we hit problems when allocating bo if modifier
does not support compression.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14625
Fixes: f91de58818 ("anv: Add support to DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit fc814fa828)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Fixes a race condition where a BVH will be dumped before its command buffer is
actually submitted if a different command buffer completes between the time the
BVH dump is recorded and the time the command buffer is actually submitted.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes: 1b55f101 ("anv/bvh: Dump BVH synchronously upon command buffer completion")
(cherry picked from commit 95e471e558)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
Not enough tested on over Gen12 platforms.
Turns out to be not working on DG2, for example.
Cc: mesa-stable
Closes: #14449
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39676>
(cherry picked from commit d2c24a0d8b)
For non-WSI images, explicitly map VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in anv_layout_to_aux_state().
Before this patch, the function passed PRESENT_SRC into
vk_image_layout_to_usage_flags() and got a return value of 0 from it
(that function expects that layout to be explicitly handled by the
caller). This caused the logic dependent on the return value to be
unreliable.
Fixes: c5cad407f8 ("anv: handle non-wsi images in anv_layout_to_aux_state")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit f616d4fb2a)
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 811c413f98)
The pMiColStarts/pMiRowStarts arrays from applications may have
incorrect units. Instead of using them directly, compute the tile
start positions in superblock units internally based on the tile
dimensions.
Cc: mesa-stable
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>
(cherry picked from commit 8e9fec8e40)
This fixes bunch of cts tests hitting issues when attempting
anv_image_mcs_op with compute.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39581>
(cherry picked from commit 85978ccd28)
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
(cherry picked from commit 5b48805b42)
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.
Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit ce196c9de5)
From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+,
This field must be set to an integer multiple of the Surface
Vertical Alignment
Accomplish this by aligning the height of each MCS layer to main
surface's vertical alignment. Prevents the following test group from
failing on Xe2 when a future commit enables multi-layer fast-clears in
anv:
dEQP-VK.api.image_clearing.*.
clear_color_attachment.multiple_layers.
*_clamp_input_sample_count_*
The main test I used to debug this:
dEQP-VK.api.image_clearing.core.
clear_color_attachment.multiple_layers.
a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2
Backport-to: 25.3
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit eb4a581e44)
Tested on PTL, fixes various copy_and_blit tests that utilize compute
after ab9d3528dc that exposed this to them.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>
(cherry picked from commit bb84773c81)
Prevent assert failures in a future commit where Tile64 will be selected
more often.
Fixes: 42ef23ecd1 ("intel/blorp: Don't redescribe some Tile64 clears")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit 6fc0e5c0aa)
When determining if an LOD can fit within a miptail, we must minify in
pixel space and then convert to elements.
Prevents the following test case from failing when Yf is force-enabled:
dEQP-VK.image.texel_view_compatible.graphic.extended.3d_image.texture_read.astc_8x5_srgb_block.r32g32b32a32_uint
Fixes: 46f45d62d1 ("intel/isl: Start using miptails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit add742fca6)
The previous Gfx12+ implementation using bit masking is failing for FP8
types, so replacing with explicit lookup tables.
For float types, the encoding now aligns with brw_data_type_float, ensuring
correct behavior for DPAS and other 3-source instructions.
Fixes: d1d4e3d530 ("brw: Add EU assembler support for float8")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39448>
(cherry picked from commit 0ce4e8ba6f)