I see no point, we allocate for every shader stage anyway. This is a bit
simpler.
I'm not a fan of the brw_compiler singleton at all but torching that is not on
today's agenda. Flattening it a little bit very much is.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
The intel_vue_map is only partially initialized before being used. All
used fields are initialized, but in debug paths the uninitialized fields
will also be read. To fix this, initialize the struct to 0. In the brw
path this struct is part of the prog_data, and is rzalloc'd.
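A minimal sketch of the pattern, assuming a hypothetical stand-in struct
(the field and function names below are illustrative and are not the real
intel_vue_map layout): zero the whole struct first, then fill in only the
fields the main path needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a map struct that is only partially filled
 * in by its setup function. Not the real intel_vue_map. */
struct example_vue_map {
   uint64_t slots_valid;
   int num_slots;
   int debug_only_field; /* only read by debug/printing paths */
};

/* Fills in just the fields the main path uses. */
static void
example_fill_map(struct example_vue_map *map, uint64_t slots_valid,
                 int num_slots)
{
   map->slots_valid = slots_valid;
   map->num_slots = num_slots;
}

/* Zero the struct before the partial fill so that any field the fill
 * function skips still has a defined value when a debug path reads it. */
static struct example_vue_map
example_make_map(uint64_t slots_valid, int num_slots)
{
   struct example_vue_map map;
   memset(&map, 0, sizeof(map));
   example_fill_map(&map, slots_valid, num_slots);
   return map;
}
```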
CID: 1665308
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37261>
Implement HSD 16028171704/14025112257:
LSC state cache livelock: once the state cache entries are full,
subsequent walker dispatches with two threads per thread group may get
stuck forever because of a state cache livelock.
One thread is continuously stuck in a loop doing UGM fence + evict and
UGM read, waiting for the UGM read to return a certain value, while the
other thread is supposed to update the value that the first thread is
waiting for. But since the state cache entries are full, the second
thread never makes progress.
Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
It's not only used for GL, so change it to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
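As a rough illustration of the signature difference, here is a sketch of
a message-taking macro in the style described above. The exact definition
in src/util/macros.h may differ; the enum and helper function are
hypothetical, and __builtin_unreachable() assumes GCC or Clang.

```c
#include <assert.h>

/* Sketch of a macro that takes a message, unlike C23's unreachable():
 * assert with the message in debug builds, then tell the compiler the
 * path cannot be reached. */
#define UNREACHABLE(str)       \
   do {                        \
      assert(!"" str);         \
      __builtin_unreachable(); \
   } while (0)

/* Hypothetical enum, used only to give the macro a caller. */
enum example_stage { EXAMPLE_VERTEX, EXAMPLE_FRAGMENT };

static const char *
example_stage_name(enum example_stage stage)
{
   switch (stage) {
   case EXAMPLE_VERTEX:   return "vertex";
   case EXAMPLE_FRAGMENT: return "fragment";
   }
   /* Every valid enum value returned above. */
   UNREACHABLE("invalid example stage");
}
```

Because the name is upper-case, it does not collide with the
unreachable() macro that C23 defines in <stddef.h>.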
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
Allow CCS for non-display linear surfaces in isl_surf_supports_ccs().
We're going to rely more on the helper to determine CCS-enabling for Xe2
on iris.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32120>
It was only added to indirect compute walkers, but the HSD doesn't say
anything about this optimization being specific to indirect compute
walkers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
We don't support redescribing Tile64 and 3D due to interleaved depth
planes.
Fixes: 312952048b ("intel/blorp: Redescribe gfx12.5 surfaces for CCS fast clears")
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35619>
Use get_copy_format_for_bpb() instead of
get_ccs_compatible_uint_format() when performing blorp_copy(). This
matches the code path taken on gfx20 and increases the testing of cases
which would impact gfx12.0 in isl_get_sampler_clear_field_offset().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
Through testing, I've found that the sampler will fetch the clear color
pixel from the converted clear color field in more cases. So, stop
reporting the raw dword offset for them:
* On gfx12.5, for 32-bpc color images.
* On gfx11-12.0, for 64-bpp color images.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35329>
In practice, I don't think it's actually going to overflow, but it could
in theory, which Coverity is pointing out.
CID: 1647010
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>
Refactor the scale factors to highlight the 16-tile width requirement on
Tile4. The fast-clear simulator code associated with HSD 1407682962
also contains a 16-tile requirement for Tile4 surfaces (for the pitch).
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
According to HSD 1407682962 and the associated simulator code,
fast-clear performance can be affected by: image alignment, tiling,
dimensionality, and row pitch. Redescribe surfaces in order to avoid
fast-clearing at a slower rate.
Also, benchmarking the main patch in the performance CI (hw=A750)
shows that some traces are helped significantly:
* TotalWarWarhammer3 +5.58% (n=2)
* Factorio +3.75% (n=1)
* TerminatorResistance +3.3% (n=2)
* Borderlands3 +3.23% (n=2)
We could additionally increase the alignment requirements of surfaces in
order to deterministically increase fast-clear performance. That's left
out of this patch in order to avoid any functional pitfalls that can
arise with increased memory consumption. As a result, performance will
continue to be affected by how ISL/drivers/apps configure main surface
memory alignments (directly or indirectly).
Thanks to Lionel Landwerlin for pointing me to the relevant simulator
code.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11168
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11418
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
Unify code which creates surfaces from buffers. The behavior is slightly
changed to use array layers to enable arrayed buffer clears (as needed).
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
One more instruction where the MOCS value was split into two
registers.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
Xe2 changed the MOCS field in a few instructions: those now have one
field for the MOCS index and another for the encryption enable bit, but
ISL returns the combination of both, aka MEMORY_OBJECT_CONTROL_STATE.
To minimize changes, I have added 2 macros to extract the two values
from the value returned by ISL.
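A sketch of what such a pair of extraction macros could look like. The
bit layout here is an assumption made for illustration only (bit 0 as the
encryption enable bit, the remaining bits as the MOCS index), and all
names are hypothetical rather than the ones added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout, for illustration only: bit 0 is the encryption enable
 * bit and the remaining bits hold the MOCS index. */
#define EXAMPLE_MOCS_GET_INDEX(x)      ((uint32_t)(x) >> 1)
#define EXAMPLE_MOCS_GET_ENCRYPT_EN(x) ((uint32_t)(x) & 1u)

/* Hypothetical instruction fields that take the two values separately. */
struct example_xe2_fields {
   uint32_t mocs_index;
   uint32_t encryption_enable;
};

/* Split the combined value into the two per-instruction fields. */
static struct example_xe2_fields
example_split_mocs(uint32_t combined_mocs)
{
   struct example_xe2_fields f = {
      .mocs_index = EXAMPLE_MOCS_GET_INDEX(combined_mocs),
      .encryption_enable = EXAMPLE_MOCS_GET_ENCRYPT_EN(combined_mocs),
   };
   return f;
}
```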
Of all the instructions changed, Mesa only makes use of two, so the
other instruction will be handled in the next patch.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
The copy engine is not used on gfx12 platforms in ANV, but it can be
used in Iris.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34560>
Operations on compressed CPS surfaces, such as copies and clears, need
to be handled through the depth/stencil hardware to ensure that the aux
data is handled correctly.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20741>
Currently blorp assumes that copies of depth/stencil are restricted
to/from depth/stencil formats. We want to allow color<->depth copies.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31983>
This is a request from debug engineers to be able to trace the HW
better when analyzing hangs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33795>
Defaults to true. When set to false Iris and various tools can be
built without ELK support. In both cases this means supporting
only Gfx9+. This option must be true to build Crocus or Hasvk.
This allows skipping re-building ELK when developing for newer platforms
with tools/tests enabled.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
Request a fixed subgroup size for pixel shaders that require it due to
the hardware restrictions of fast clears and repeated data clears.
This requires plumbing the "is_fast_clear" boolean across several
callers since blorp_compile_fs_brw() currently has no information
regarding whether the kernel is intended for a fast clear operation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
Found on simulation, complaining about SIMD32 shaders enabled when
using MSAA 16x.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30753>
Use the 3DSTATE_URB_ALLOC_* instructions to program the URB for
multi-slice device configurations.
If only one slice is available in the device, the SliceN fields will be
ignored by the HW.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32736>
Moving it to intel_shader_enums.h
The plan is to make it visible to OpenCL shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
According to HSD 1406738321, full resolves and fast-clears don't work
properly on 3D textures. Up until now, we've disabled CCS for this case.
Instead, redescribe the surface as 2-dimensional to perform auxiliary
surface operations.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31880>
Stuff COMPUTE_WALKER_BODY in COMPUTE_WALKER in both iris and anv.
This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allows us to set the
cmd_buffer->state.last_compute_walker.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
In iris, this should avoid some partial resolves when copying between
images. In anv, this will reduce restrictions on dmabufs which have
clear color support in the next patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31136>
blorp_copy_get_formats() tries to make the source and destination view
formats match as much as possible. This avoids some casting in the copy
shader, but it makes determining the format that will be used for a
surface impossible without having the ISL surface for both that surface
and a source or destination.
We'd like to enable the Vulkan driver to know as early as possible what
format an image may be reinterpreted as for correctness. So, determine
the copy formats more independently and expose a helper which does so
for drivers.
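One way to picture the independent approach, as a sketch only: pick a
copy format from the surface's own bits-per-block, without looking at the
other surface. The enum, the function name, and the exact
bits-per-block-to-format table are assumptions for illustration and may
not match what blorp actually does.

```c
#include <assert.h>

/* Hypothetical format enum; real ISL formats are named differently. */
enum example_format {
   EXAMPLE_FORMAT_UNSUPPORTED = 0,
   EXAMPLE_FORMAT_R8_UINT,
   EXAMPLE_FORMAT_R16_UINT,
   EXAMPLE_FORMAT_R32_UINT,
   EXAMPLE_FORMAT_R32G32_UINT,
   EXAMPLE_FORMAT_R32G32B32A32_UINT,
};

/* Choose a UINT copy format from the bits-per-block of one surface.
 * A copy only moves bits, so any format of the same size can carry the
 * data, and the choice does not depend on the other surface. */
static enum example_format
example_copy_format_for_bpb(unsigned bpb)
{
   switch (bpb) {
   case 8:   return EXAMPLE_FORMAT_R8_UINT;
   case 16:  return EXAMPLE_FORMAT_R16_UINT;
   case 32:  return EXAMPLE_FORMAT_R32_UINT;
   case 64:  return EXAMPLE_FORMAT_R32G32_UINT;
   case 128: return EXAMPLE_FORMAT_R32G32B32A32_UINT;
   default:  return EXAMPLE_FORMAT_UNSUPPORTED;
   }
}
```

With a helper shaped like this, a driver can compute the reinterpretation
format for an image at creation time, before any copy partner is known.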
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31136>
Tests such as the following are asserting on LNL:
dEQP-VK.pipeline.monolithic.sampler.border_swizzle.r8_srgb.gbar.custom.gather_1.no_swizzle_hint
dEQP-VK.api.image_clearing.core.clear_color_image.2d.optimal.single_layer.e5b9g9r9_ufloat_pack32
This is because blorp tries, for example, to set up a render target with
L8_UNORM_SRGB (which is mapped to the R8_UNORM_SRGB of Vulkan), but that
format is not supported for rendering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1c7fe9ad1b ("anv: Support fast clears in anv_CmdClearColorImage")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31357>
Whenever we execute a fast-clear due to LOAD_OP_CLEAR, we decrease the
number of layers to clear by one. We then enter the slow clear function
and possibly exit without clearing if the layer count is zero.
Unfortunately, we've already compiled the shader for slow clears by the
time we exit. Skip the slow clear function if there are no layers to
clear.
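The control flow can be sketched as follows. All names are hypothetical,
and a counter stands in for the shader compile that the slow clear
function performs before it checks the layer count.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the (hypothetical) slow-clear shader is compiled. */
static unsigned example_shader_compiles = 0;

/* Stand-in for the slow clear function: the shader is compiled up
 * front, before the layer count is checked. */
static void
example_slow_clear(unsigned layer_count)
{
   example_shader_compiles++;
   if (layer_count == 0)
      return;
   /* ... emit the clear for layer_count layers ... */
}

/* Stand-in for the LOAD_OP_CLEAR path. */
static void
example_load_op_clear(bool can_fast_clear, unsigned layer_count)
{
   if (can_fast_clear && layer_count > 0) {
      /* The fast clear handles one layer. */
      layer_count--;
   }

   /* Skip the slow clear entirely when nothing is left, so that no
    * shader is compiled for a clear that would do nothing. */
   if (layer_count > 0)
      example_slow_clear(layer_count);
}
```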
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31167>