Simplify transition_depth_buffer() by reusing a function to update the
fast-clear value instead of open-coding that logic.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
There were a couple of issues in iris_resource_prepare_render():
* It previously assumed that the sampler would always look at the raw
  dwords for 32bpc formats. However, the sampler only does this on
  gfx12.0 for R32 formats (not RG32 formats for example). Update the
  comments to reflect this.
* It only initialized the clear color if the render_format was
  non-32bpc. However, initialization is still needed outside of this
  case because a subsequent sampling operation may use a view format
  which looks at the sampler field. Check for the FCV aux-usage instead
  of the render format's number of bits-per-channel to fix this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
Use isl_get_sampler_clear_field_offset() to more accurately determine
when the sampler will change the field it reads from on gfx11-12. This
avoids partial resolving in a number of cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
Use get_copy_format_for_bpb() instead of
get_ccs_compatible_uint_format() when performing blorp_copy(). This
matches the code path taken on gfx20 and increases the testing of cases
which would impact gfx12.0 in isl_get_sampler_clear_field_offset().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
Through testing, I've found that the sampler will fetch the clear color
pixel from the converted clear color field in more cases. So, stop
reporting the raw dword offset for them:
* On gfx12.5, for 32-bpc color images.
* On gfx11-12.0, for 64-bpp color images.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
The AFBC path is behind a driconf option, and so was not tested by any
existing CI jobs. We had a regression with this that went unnoticed for
several months. To avoid similar situations in the future, add AFBC
smoke tests to CI, similar to the existing spilling smoke tests.
Some tests on g52 fail instead of crashing when AFBC is enabled, but
otherwise the CI expectations are identical.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35193>
Judging by comments in the chat, it seems the usage of the tokens in crnm is
only natural when you've been using it for a while. New users would appreciate
reading it in the documentation, beyond the help in the tool.
Also, mentioning how to create a token and what's the minimal scope of the
token to be used with the tool can help new users.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34860>
Consider the following snippet from a Trine shader:
20: RCP temp[40].z, temp[39].__w_;
21: MOV temp[40].xy, temp[34].-x-y__;
22: DP3 temp[41].x, temp[40].xyz_, temp[29].xyz_;
...
33: DP3 temp[52].x, temp[40].xyz_, temp[51].xyz_;
34: MAX temp[53].x, temp[52].x___, none.0___;
35: MUL temp[54].xy, temp[40].xy__, const[8].ww__;
36: MUL temp[55].xy, temp[54].xy__, temp[41].xx__;
37: MUL temp[56].x, temp[40].z___, const[8].w___;
When we search for the writers of temp[40] to check whether we can
convert the MUL to omod, the corresponding variable actually contains
the RCP temp[40].z first, and the MOV temp[40].xy is marked as a friend.
However, the current logic only checks the first instruction of the
variable, so we fail to find the writer. Fix this by also searching the
variable's friends recursively.
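The idea can be sketched in C as follows. This is only an illustration: the struct layouts, field names, and find_writer() are hypothetical simplifications, not the compiler's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the compiler's structures. */
struct inst {
   unsigned dst_index;     /* destination temporary index */
   unsigned dst_writemask; /* bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w */
};

struct variable {
   struct inst *inst;            /* instruction that writes this variable */
   struct variable *next_friend; /* next variable sharing the register, or NULL */
};

/* Return the instruction that writes all channels in channel_mask of
 * temp[index]. The variable's own instruction is checked first, then
 * its friends are searched recursively. */
static struct inst *
find_writer(struct variable *var, unsigned index, unsigned channel_mask)
{
   if (var == NULL)
      return NULL;

   assert(var->inst != NULL);

   if (var->inst->dst_index == index &&
       (var->inst->dst_writemask & channel_mask) == channel_mask)
      return var->inst;

   return find_writer(var->next_friend, index, channel_mask);
}
```

With the RCP as the first instruction and the MOV as its friend, a lookup for temp[40].xy returns the MOV instead of failing on the RCP.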
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34462>
This moves per-patch output VMEM stores to the end of the shader where they
execute only once. They are skipped if the whole workgroup discards
all patches.
If tcs_vertices_out == 1, per-patch output VMEM stores use the same lanes
as per-vertex output VMEM stores, which are aligned to 4 or 8 lanes to get
cached bandwidth for the stores.
Previously, per-patch outputs were stored to memory for every store_output
intrinsic in TCS.
Additionally, LDS is no longer allocated for per-patch outputs that are only
written and read by invocation 0, or that are written by all invocations
but not read, and that don't have indirect indexing. This reduces LDS usage
and LDS traffic.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
This unifies the duplicated LDS output patch size computation between
hs_output_lds_offset and ac_nir_compute_tess_wg_info.
"+ 4" to the output patch stride minimizes LDS bank conflicts by making
the beginning of each patch start on a different LDS bank.
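A rough sketch of why the padding helps, assuming 32 LDS banks that are each 4 bytes wide and a patch stride expressed in bytes (the bank count, the units, and patch_start_bank() are assumptions for illustration, not taken from the driver):

```c
#include <assert.h>

/* Assumptions for illustration only: 32 LDS banks, 4 bytes per bank. */
#define LDS_NUM_BANKS 32u
#define LDS_BANK_WIDTH 4u

/* Bank on which the first dword of a patch lands, given the patch
 * stride in bytes. */
static unsigned
patch_start_bank(unsigned patch_index, unsigned patch_stride_bytes)
{
   assert(patch_stride_bytes % LDS_BANK_WIDTH == 0);
   return (patch_index * patch_stride_bytes / LDS_BANK_WIDTH) % LDS_NUM_BANKS;
}
```

Under these assumptions, a 256-byte stride places every patch on bank 0, while a stride of 256 + 4 bytes places patch i on bank i for the first 32 patches.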
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
Checking whether every component is valid in tess_level_has_effect() when
prim_mode is unknown generated too many SALU. Do this instead:
   if (triangles) ...
      subgroup vote for triangles
   else if (quads) ...
      subgroup vote for quads
   else // isoline
      subgroup vote for isolines
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
This rewrites tess level value tracking to use the 2-bit masks, which
means LDS allocation is determined separately for outer and inner levels.
LDS is not allocated for tess levels that are only written by invocation 0
and never read or only read by invocation 0. If the number of output
patch vertices is 1, LDS is also not allocated for tess levels.
Tess level outputs for TES are always written as whole vec4 to get cached
bandwidth.
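The LDS allocation rule for tess levels described above can be sketched as a predicate that is evaluated separately for the outer and inner levels. The struct, its fields, and tess_level_needs_lds() are hypothetical names, and treating every remaining case as needing LDS is an assumption.

```c
#include <stdbool.h>

/* Hypothetical usage summary for one group of tess levels
 * (outer or inner). */
struct tess_level_usage {
   bool written_by_other_invocations; /* written by any invocation != 0 */
   bool read_by_other_invocations;    /* read by any invocation != 0 */
};

static bool
tess_level_needs_lds(const struct tess_level_usage *usage,
                     unsigned tcs_vertices_out)
{
   /* A single output patch vertex never needs LDS for tess levels. */
   if (tcs_vertices_out == 1)
      return false;

   /* Only written by invocation 0, and never read or only read by
    * invocation 0: the values can stay in registers. */
   if (!usage->written_by_other_invocations &&
       !usage->read_by_other_invocations)
      return false;

   /* Assumption: all remaining cases go through LDS. */
   return true;
}
```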
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
This improves write throughput for TCS outputs. It follows the same idea
as attribute stores in hw GS. The improvement is easily measurable with
a microbenchmark.
It also has the advantage that multiple output stores to the same address
don't result in multiple memory stores. Each output component gets only
one memory store at the end of the shader.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
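As a small worked example of that computation, assuming the values are in bytes (align_pot() and output_stride() are hypothetical helper names used only for this sketch):

```c
#include <assert.h>

/* Round value up to a power-of-two alignment. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Stride of one output: 16 per thread, aligned to 256. */
static unsigned
output_stride(unsigned num_threads)
{
   return align_pot(16 * num_threads, 256);
}
```

For instance, 16 threads give 256, 17 threads give 512, and 64 threads give 1024.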
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>