We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.
Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.
Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
When changing the clear color without a fast clear, use dirty bits to
ensure that surfaces with inline clear colors are updated and that
partial resolves are done as needed.
Remove the flags at the bottom of fast_clear_color() as
blorp_fast_clear() already sets them for us.
Fixes: 64d861b700 ("iris: Skip some fast-clears even on color changes")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+:
   This field must be set to an integer multiple of the Surface
   Vertical Alignment
Accomplish this by aligning the height of each MCS layer to the main
surface's vertical alignment. This prevents the following test group
from failing on Xe2 when a future commit enables multi-layer
fast-clears in anv:
   dEQP-VK.api.image_clearing.*.
      clear_color_attachment.multiple_layers.
      *_clamp_input_sample_count_*
The main test I used to debug this:
   dEQP-VK.api.image_clearing.core.
      clear_color_attachment.multiple_layers.
      a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2
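The alignment described above can be sketched as follows. This is a minimal illustration, not the actual isl code: the helper names and the alignment value used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a. */
static uint32_t
align_up_u32(uint32_t v, uint32_t a)
{
   assert(a != 0);
   return ((v + a - 1) / a) * a;
}

/* Hypothetical helper: pad the height of one MCS layer so that the
 * layer-to-layer pitch is an integer multiple of the main surface's
 * vertical alignment (both values given in rows).
 */
static uint32_t
mcs_layer_height(uint32_t mcs_height_rows, uint32_t main_valign_rows)
{
   return align_up_u32(mcs_height_rows, main_valign_rows);
}
```

For example, assuming a vertical alignment of 4 rows, a layer that is 11 rows tall would be padded to 12 rows, while a layer that is already 12 rows tall would be left unchanged.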
Backport-to: 25.3
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
Mesa now has a statistics framework. This adds support for emitting
additional statistics about PowerVR shaders for the Rogue architecture.
Add support for emitting the following statistics: code size, scratch
size, spill count, temp count, loop count, number of inst groups, number
of main inst groups, number of bitwise inst groups and number of control
inst groups.
Add support for a new PCO_DEBUG_PRINT option, "stats", to emit shader stats.
Signed-off-by: Duncan Brawley <duncan.brawley@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39523>
This avoids generating some useless math that would need to be cleaned
up later, without complicating things too much.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
This helps cut down URB messages on tessellation and mesh shaders
significantly. fossil-db results on Battlemage:
Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01%
Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00%
Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00%
Spill count: 576554 -> 576616 (+0.01%)
Fill count: 545304 -> 545413 (+0.02%)
Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04%
Max dispatch width: 39856192 -> 39856208 (+0.00%)
Totals from 4231 (0.27% of 1583648) affected shaders:
Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40%
Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03%
Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70%
Spill count: 378 -> 440 (+16.40%)
Fill count: 1308 -> 1417 (+8.33%)
Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36%
Max dispatch width: 67952 -> 67968 (+0.02%)
There are several reasons we didn't go with nir_opt_vectorize_io:
1. nir_opt_vectorize_io appears to work on the slot location level.
We want to be able to vectorize based on the URB offsets, especially
for cases like point size, layer, and viewport which have different
VARYING_SLOT_* values but live in the same vec4 in a URB entry.
2. We want vec8 stores, and nir_opt_vectorize_io only seems to vectorize
within a single 32-bit vec4. It does handle 8 components, but that's
only for packing 16-bit values into a 32-bit vec4.
Improves performance of Sascha Willems' tessellation demo by around 4%
on Meteorlake.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
Both the URB Global Offset and Per-Slot Offsets are specified to be
unsigned numbers. The URB Global Offset is only 11 bits, and so is
limited to be between [0, 2047]. While the per-slot offsets are
given as U32 values, it would appear that adding the two offsets
does not handle 32-bit overflow/unsigned wrap correctly.
This pops up in Piglit's TCS variable-indexing tests, which end up
performing loads at an offset of (x - 16) with a base of 18, and at an
offset of (x) with a base of 2. These should be equivalent, but when
x <= 15, the per-slot offset calculated in the shader is negative
(0xfffffff[0-f]) and adding the base of 18 does not wrap around
correctly to [2, 17].
To work around this, avoid using the global offset when the per-slot
offset is present, and just add the two in the shader where unsigned
wrap works correctly.
Tigerlake and later don't seem to have this issue.
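The arithmetic behind the workaround can be illustrated with a short sketch. The function names are hypothetical, and the "no wrap" variant is only an assumed model of a sum formed in more than 32 bits; how the hardware actually mishandles the addition is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* Adding both offsets in the shader: plain 32-bit unsigned arithmetic,
 * which wraps modulo 2^32.
 */
static uint32_t
urb_offset_in_shader(uint32_t per_slot, uint32_t global)
{
   return per_slot + global;
}

/* ASSUMED model of the problematic path: the sum is formed in a wider
 * type, so a "negative" per-slot offset never wraps back into range.
 */
static uint64_t
urb_offset_no_wrap(uint32_t per_slot, uint32_t global)
{
   return (uint64_t)per_slot + global;
}

/* Check that (x - 16) with base 18 matches (x) with base 2 when the
 * addition is done with 32-bit wrap-around.
 */
static void
check_equivalence(uint32_t x)
{
   const uint32_t per_slot = x - 16u; /* 0xfffffff0 + x when x <= 15 */
   assert(urb_offset_in_shader(per_slot, 18) == x + 2u);
}
```

With x = 3, the per-slot offset is 0xfffffff3. The 32-bit sum with 18 wraps to 5, which equals x + 2, while the widened sum is 0x100000005 and lands far outside the expected [2, 17] range.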
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.
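A check that works for any vector width compares the mask against the full mask for that component count rather than a fixed 0xf. The helper below is a hypothetical sketch of that idea, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when all num_components channels are enabled in the
 * writemask, i.e. when nothing is actually masked off.
 */
static bool
writemask_is_full(uint32_t writemask, unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 32);
   const uint32_t full =
      num_components == 32 ? UINT32_MAX : (1u << num_components) - 1u;
   return (writemask & full) == full;
}
```

A vec4 store with mask 0xf and a vec8 store with mask 0xff both count as full, whereas a vec8 store with mask 0xf does not.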
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
By default, load/store vectorization uses nir_round_up_components()
to round up loads and possibly writemasked stores to the next valid
NIR vector width. However, some backends may not support load/stores
at all sizes. For example, older Intel supports only power-of-two
vector widths. Newer Intel also supports vec2 and vec3, but not
vec5/6/7. By providing a callback, backends can request promotion
to their next supported memory load/store vector width.
The existing "should we vectorize?" callback should continue to return
false for unsupported vector widths (i.e. beyond the maximum supported).
With this new callback, they do not need to say "no" to vectorization
that would normally produce an unsupported count (e.g. vec5/6/7) but
instead request that the component count be rounded up appropriately.
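As an illustration of what such a callback might compute, here is a minimal sketch for the two Intel cases mentioned above. The function names are hypothetical and do not reflect the actual callback signature in NIR.

```c
#include <assert.h>

/* Smallest power of two that is >= n. */
static unsigned
next_pow2(unsigned n)
{
   assert(n >= 1);
   unsigned p = 1;
   while (p < n)
      p <<= 1;
   return p;
}

/* Hypothetical callback body for hardware that only supports
 * power-of-two vector widths.
 */
static unsigned
round_up_components_pow2_only(unsigned n)
{
   return next_pow2(n);
}

/* Hypothetical callback body for hardware that also supports vec3 but
 * not vec5/6/7: widths up to 4 are kept as-is, anything larger is
 * promoted to the next power of two.
 */
static unsigned
round_up_components_with_vec3(unsigned n)
{
   assert(n >= 1);
   return n <= 4 ? n : next_pow2(n);
}
```

Under these assumptions, a 3-component access becomes vec4 on the power-of-two-only variant but stays vec3 on the other, and a 6-component access becomes vec8 on both.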
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
This adds a new option, round_up_store_components, which rounds up the
number of components for stores that support writemasking to the next
valid vector size. For example, vec4+vec2 stores would round up from
6 components (which wouldn't be supported) to a full supportable vec8
store, relying on writemasking to ensure the correct pieces are written.
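The vec4+vec2 example can be sketched like this. The struct and function are hypothetical and only show how the writemask keeps the padded components from being written; the target width is passed in rather than derived.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a store after rounding up. */
struct masked_store {
   unsigned num_components; /* width of the store that is emitted */
   uint32_t writemask;      /* one bit per component actually written */
};

/* Widen a store of `written` contiguous components to `width`
 * components, enabling only the original components in the writemask.
 */
static struct masked_store
round_up_store(unsigned written, unsigned width)
{
   assert(written >= 1 && written <= width && width < 32);
   struct masked_store s = {
      .num_components = width,
      .writemask = (1u << written) - 1u,
   };
   return s;
}
```

Combining a vec4 and a vec2 store gives 6 written components; rounding up to 8 produces a vec8 store with writemask 0x3f, so the top two components are left untouched.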
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
URB intrinsics are simply memory load/stores to a special memory region,
so it's pretty reasonable to handle these in the memory vectorizer. We
treat emit_vertex_* intrinsics as a barrier for shader outputs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
This makes it easier for NIR passes to distinguish between inputs and
outputs without having to reason about which URB handle source was
passed to the intrinsic. It probably also makes the NIR a bit easier
for humans to read.
v2: Don't add memory mode to store intrinsics. It's always output.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
Tested on PTL. Fixes various copy_and_blit tests that utilize compute,
which were exposed to this issue by ab9d3528dc.
Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>
MHW has a long-running shader compile step on first
launch that is significantly sped up by disabling
Link Time Optimization in the ANV driver.
Shader compile times with LTO disabled are 50% of
baseline measurements, and the benchmark shows no
statistically significant change in performance
(tested on LNL-M OOB).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39544>
HiZ must only be cleared when the full HiZ workaround is enabled. This
means that the previous slow clear draw would disable HiZ because it
hits the conditions (i.e. depth/stencil enable and depth writes enabled).
So, the draw and the dispatch can run in parallel by moving the barrier
earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433>
The resolve needs to consider the base offset; otherwise it resolves
to the first 3D slice.
Fixes recently added VKCTS coverage: dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39393>
In this case, do the wrapping logic on our end and normalize right away
to host time domain.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
Deal with VRR vs FRR as well.
Loosely based on earlier work by Keith Packard and Emma Anholt
(MR 38472 for reference).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
Looks like the "new" page flip handler was in 2.4.78 in 2017. Mesa
requires at least 2.4.109, so we can retire this.
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
This is mostly provided for convenience, but it's not implementable by
applications when we're using blit queues for PRIME, so it's quite useful
to have.
This is reworked from previous GOOGLE_display_timing
MRs by Keith Packard and Emma Anholt.
See MR 38472 for reference.
Rather than exposing PRESENT_STAGE_LOCAL, we expose all timestamps in
one unified domain to simplify the implementation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
The only weakness right now is that we cannot reliably implement the
VRR vs FRR query.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
Since this does most of the work to determine the right aux usage for
a depth texture, turn it into a helper that returns that aux usage in
order to avoid duplication of logic between it and its callers.
Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
v2: Add additional AUX state transition test-cases for HIZ_CCS (Nanley).
v3: Assume partial resolve is equivalent to full resolve on legacy HiZ
surfaces during isl_aux_state_transition_aux_op() instead of
asserting (Nanley).
v4: Move some tests into different group, add more MCS tests (Nanley).
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
This appears to be needed to guarantee that a resolved depth surface
has no remaining fast-cleared blocks on DG2 as well as MTL. After
this series this should no longer be hit in practice since we'll be
doing partial resolves in most cases, but it seems sensible to keep
and correct the workaround for our peace of mind to make sure that
full resolves are truly resolving the main surface.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>