While trying to use that feature on RADV I ran into an infinite
recursion.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 97b4a6d0e3 ("compiler: SPIR-V shader replacement")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41751>
The kevins are increasingly creaky and unreliable after a decade of
excellent service, so it's time to send them off to the farm and move
our T860 jobs to a device type which can actually run jobs.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41749>
Forcing a flush by setting initial_gfx_cs_size to zero requires that
packets are always emitted when starting a new gfx IB.
This is not the case with userq, as there is no preamble.
Add a new flag to be used with si_flush_gfx_cs to force a flush.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41530>
Dishonored 2 or DXVK is creating pipelines with empty fragment
shaders. With alpha-to-coverage as a dynamic state, we currently assume
a render target is needed, but if the shader is not writing
anything, it's not needed.
This change only considers the color output writes as it's the alpha
channel there that is used for coverage computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41711>
Suggested by @gurchetansingh.
Android's Soong build system treats several compiler warnings as errors
by default: https://android.googlesource.com/platform/build/soong/+/27f57506/cc/config/global.go/#218
To catch these issues in Mesa, introduce `soong_compat_c_args`
and `soong_compat_cpp_args` with the following flags treated as errors:
-D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
-Werror=date-time
-Werror=gnu-alignof-expression
-Werror=ignored-qualifiers
-Werror=implicit-fallthrough
-Werror=int-conversion
-Werror=missing-prototypes
-Werror=pragma-pack
-Werror=pragma-pack-suspicious-include
-Werror=sizeof-array-div
-Werror=string-plus-int
-Werror=unreachable-code-loop-increment
These compatibility flags are added to the meson configurations
for ANV, Gfxstream, Lavapipe, PanVK, Turnip, and Venus.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
Based on the approach in e0eea5ea4e.
When a file is too large, -Wmisleading-indentation will give the warning
below, which we can't suppress with a #pragma:
../src/freedreno/vulkan/tu_perfetto.cc: In function 'void setup_incremental_state(MesaRenderpassDataSource<TuRenderpassDataSource, TuRenderpassTraits>::TraceContext&, tu_device*)':
../src/freedreno/vulkan/tu_perfetto.cc:162: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
162 | if (!state->was_cleared)
../src/freedreno/vulkan/tu_perfetto.cc:162: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89549 for details.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
Only adding the workarounds that have an actual effect on that driver.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41664>
We want to extract the leaf type from the potential hit and assign it
to the committed hit.
Instead, we were simply assigning leaf type 0x7 to the committed hit.
This patch masks out the leaf type with nir_iand_imm and also updates the
incorrect field comment.
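As a rough illustration of the difference, here is a standalone C sketch rather than the actual NIR code. The position of the leaf type field in the low bits and all names are assumptions made for the example; the real change applies nir_iand_imm to the potential hit data.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: the leaf type is a 3-bit field in the low bits. */
#define LEAF_TYPE_MASK 0x7u

/* Old behaviour as described above: the committed hit always got 0x7. */
static uint32_t
committed_leaf_type_old(uint32_t potential_hit_dw)
{
   (void)potential_hit_dw;
   return LEAF_TYPE_MASK;
}

/* Fixed behaviour: mask the leaf type out of the potential hit. */
static uint32_t
committed_leaf_type_new(uint32_t potential_hit_dw)
{
   const uint32_t leaf_type = potential_hit_dw & LEAF_TYPE_MASK;
   assert(leaf_type <= LEAF_TYPE_MASK);
   return leaf_type;
}
```

With a potential hit dword whose low bits are 0x5, the old form yields 0x7 while the masked form yields 0x5.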
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41667>
Add optional OA performance counter collection around each execute()
call. Examples:
```
# List all profiles and counters, with descriptions.
$ executor --oa list
# Collect all counters from a profile.
$ executor --oa ComputeBasic file.lua
# Collect a subset of counters from a profile, separated by comma.
$ executor --oa ComputeBasic:GpuTime,AvgGpuCoreFrequency file.lua
# By default use ComputeBasic profile, so counter names only also work.
$ executor --oa GpuTime file.lua
```
The selected counters are printed to stdout after the script finishes,
or written to a file specified by --oa-csv FILENAME.
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>
ORCJIT expects every function prototype to be present even when using
object caches. Code for adding stubs for entry point functions was added
previously when implementing shader cache for ORCJIT, but when using
OpenCL, extra functions could be present in compute shaders which need
stubs too.
Reuse the code for constructing references for extra functions to
generate function stubs for them.
This fixes function calls with Rusticl on llvmpipe with ORCJIT.
Fixes: bb0efdd4d8 ("llvmpipe: add shader cache support for ORCJIT implementation")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41532>
This struct was initially packed to fit in a slot in NIR intrinsics
indices. Nowadays NIR supports larger indices and cooperative matrix
has extensions that allow it to go beyond the existing limit. This
patch changes the struct to be larger and removes the manual bit packing.
The hash table change is to use the specialized version for u64 keys
that's available in src/util.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41691>
It's not about the memory traffic but about updating the Tmax
value/distance so that on the next intersection, we compare against the
updated Tmax value/distance instead of the original distance.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41709>
The execution mask gets applied to the last thread in the threadgroup to
mask off SIMD lanes. But with BTD enabled, we are seeing that only the
last 4 components have valid stack IDs and the upper 4 components of the
register are zero.
Changing the execution mask somehow populates the stack IDs properly.
This is on the simulator, before changing the execution mask:
00000000 00000000 00000000 00000000 000F000E 000D000C 000B000A 00090008 00000000 00000000 00000000 00000000 000F000E 000D000C 000B000A 00090008 r1
After changing the execution mask:
000F000E 000D000C 000B000A 00090008 00070006 00050004 00030002 00010000 000F000E 000D000C 000B000A 00090008 00070006 00050004 00030002 00010000 r1
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41409>
When updating a register after successfully finding a pair to coalesce,
use the live range of the source register to walk only the instructions
that might use it. Depending on the shader this allows skipping a bunch
of blocks -- and also terminating early.
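The skipping and early termination can be sketched in plain C. This is an illustrative model only: the block_range struct, the function name and the assumption that blocks are sorted by instruction index are hypothetical, not the brw data structures.

```c
#include <assert.h>

/* Hypothetical block record: the instruction index range it covers. */
struct block_range {
   int start_ip;
   int end_ip;
};

/* Counts the blocks that overlap the live range [live_start, live_end].
 * Assumes blocks are sorted by instruction index.
 */
static unsigned
blocks_to_walk(const struct block_range *blocks, unsigned num_blocks,
               int live_start, int live_end)
{
   assert(live_start <= live_end);

   unsigned visited = 0;
   for (unsigned i = 0; i < num_blocks; i++) {
      if (blocks[i].end_ip < live_start)
         continue; /* Entirely before the live range: skip. */
      if (blocks[i].start_ip > live_end)
         break;    /* Past the live range: terminate early. */
      visited++;   /* A real pass would rewrite uses in this block. */
   }
   return visited;
}
```

For four blocks covering instructions 0-9, 10-19, 20-29 and 30-39, a live range of 12 to 25 touches only the two middle blocks.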
Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU. The big win here was for Cyberpunk 2077.
```
// Differences at 95.0% confidence.
// Rise of the Tomb Raider (n=20)
-0.0095 +/- 0.00706877
-1.90572% +/- 1.40609%
// Alan Wake (n=20)
-0.031 +/- 0.0172806
-0.93599% +/- 0.51952%
// Borderlands 3 (n=15)
-0.353333 +/- 0.118679
-2.44307% +/- 0.80787%
// Oblivion Remastered (n=15)
-0.134 +/- 0.026008
-2.76898% +/- 0.531637%
// Baldur's Gate 3 (n=15)
-0.954286 +/- 0.163625
-2.21713% +/- 0.377562%
// Cyberpunk 2077 (n=20)
-2.8665 +/- 0.228489
-8.08661% +/- 0.621779%
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41495>
Instead, save the value to a local variable and use that. In various
cases the compiler is not able to hoist the call out of the loop, since
there are other non-inlined function calls in the loop's body, resulting
in repeated unnecessary calls to either size_read() or its pieces that
get inlined.
Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:
```
// Differences at 95.0% confidence.
// Rise of the Tomb Raider (n=20)
-0.017 +/- 0.00724575
-3.45177665% +/- 1.45084%
// Alan Wake (n=20)
-0.153 +/- 0.00960067
-4.99265786% +/- 0.303695%
// Borderlands 3 (n=14)
-0.486428571 +/- 0.15354
-3.51248195% +/- 1.0835%
// Oblivion Remastered (n=14)
-0.143571429 +/- 0.0357991
-3.05749924% +/- 0.747872%
// Baldur's Gate 3 (n=14)
-1.68928571 +/- 0.151598
-4.12128605% +/- 0.364259%
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
Compute var_from_reg() once in setup_def_use() and pass the variable
number to setup_one_read() and setup_one_write(). This lets the loops walk
consecutive variable numbers directly instead of mutating a brw_reg offset.
Also: setup_one_write() is only called for VGRFs, so remove the check
for VGRF there.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
Only ARF sources are relevant in this case, so check the file
before calling size_read().
Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:
```
// Differences at 95.0% confidence.
// Rise of the Tomb Raider (n=20)
No difference proven
// Alan Wake (n=20)
-0.0725 +/- 0.0139437
-2.30965276% +/- 0.438787%
// Borderlands 3 (n=14)
-0.248571429 +/- 0.135107
-1.76946153% +/- 0.954171%
// Oblivion Remastered (n=14)
-0.0735714286 +/- 0.0235712
-1.54770849% +/- 0.492117%
// Baldur's Gate 3 (n=14)
-0.832142857 +/- 0.23095
-1.98028217% +/- 0.545648%
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
regs_read() itself gets inlined, but size_read() does not. In GCC
release builds this results in three calls to size_read() at each site,
one of them due to how MIN2 is expanded. Use a local variable to store
the result.
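The MIN2 part of this can be reproduced in isolation. The sketch below uses a ternary-style MIN2 macro and a hypothetical expensive_size() helper standing in for a non-inlined size_read(); the names and the call counter exist only for the demonstration.

```c
#include <assert.h>

/* Ternary-style minimum: an argument may be evaluated twice. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for a non-inlined size_read(); counts its calls. */
static int call_count = 0;

static unsigned
expensive_size(unsigned v)
{
   call_count++;
   return v * 4;
}

/* The call sits inside the macro, so it runs once for the comparison and
 * once more when its value is the one selected.
 */
static unsigned
clamp_direct(unsigned v, unsigned limit)
{
   return MIN2(expensive_size(v), limit);
}

/* The call runs once and the macro only sees the local variable. */
static unsigned
clamp_cached(unsigned v, unsigned limit)
{
   const unsigned size = expensive_size(v);
   assert(size == v * 4);
   return MIN2(size, limit);
}
```

When the computed size is below the limit, clamp_direct() ends up calling the helper twice and clamp_cached() once, with the same result.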
Below are fossil compilation times on an MTL machine compiling shaders
for a BMG GPU:
```
// Differences at 95.0% confidence.
// Rise of the Tomb Raider (n=20)
-0.013 +/- 0.00596452
-2.56410256% +/- 1.15623%
// Alan Wake (n=20)
-0.1755 +/- 0.0144896
-5.29491628% +/- 0.425556%
// Borderlands 3 (n=14)
-0.562142857 +/- 0.129678
-3.84765816% +/- 0.870239%
// Oblivion Remastered (n=14)
-0.0821428571 +/- 0.0262485
-1.69867061% +/- 0.537247%
// Baldur's Gate 3 (n=14)
-1.61357143 +/- 0.21693
-3.69788342% +/- 0.486462%
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
The instruction may get transformed, modifying the destination before
the loop index gets incremented. So save the original regs_written
value to be used in the loop increment.
While we are here, assert that all the slots in mov[] are filled
at this point in the code.
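A minimal C model of the loop increment issue, assuming a hypothetical inst struct and a transform() that shrinks the destination; it is not the actual pass, only the pattern of snapshotting the stride before the body can change it.

```c
#include <assert.h>

/* Hypothetical instruction record with only the field of interest. */
struct inst {
   unsigned regs_written;
};

/* Hypothetical transformation that changes the destination size. */
static void
transform(struct inst *inst)
{
   inst->regs_written = 1;
}

/* Walks total_slots in steps of the original regs_written and returns how
 * many steps were taken.
 */
static unsigned
visit_slots(struct inst *inst, unsigned total_slots)
{
   /* Snapshot the stride before the loop body can modify the instruction. */
   const unsigned regs_written = inst->regs_written;
   assert(regs_written > 0);

   unsigned visited = 0;
   for (unsigned i = 0; i < total_slots; i += regs_written) {
      transform(inst);
      visited++;
   }
   return visited;
}
```

Starting with regs_written equal to 2 and 8 slots, the loop takes 4 steps even though transform() rewrites the field on the first iteration. Had the increment read inst->regs_written directly, the stride would have changed mid-loop.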
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>
On Xe2+ the Wa_1407528679 NoMask workaround is disabled, so
baked_ordered_dependency_mode() should treat all instructions as
exec_all, matching the logic in gather_inst_dependencies() and
emit_inst_dependencies().
Without this, ordered RegDist dependencies from uniform/WE_all
producers (e.g. 'mov s0, imm') are not found during baking and
fall through as separate WE_all SYNC NOPs. Real shaders pile up
dozens of these in front of masked sends.
v2(Caio): Fix existing scalar_register test expectations
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Fixes: 47a6ef3fef ("brw/scoreboard: Use a predicate helper for the nomask workaround")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>
Add two tests verifying that ordered RegDist dependencies from
uniform/WE_all producers are baked into the consumer's SWSB on Xe2+.
Disabled for now since they fail on current main.
Reviewed-by: Michael Cheng <michael.cheng@intel.com>
Assisted-by: Pi coding agent (Opus-4.7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>
CSE should not depend on liveness analysis. When the pass runs, the
only liveness analysis that can have run is the one on the bi_validate
path, so having a dependency only makes our validated runs different
from (and slower than) the unvalidated runs.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Found-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41504>
CSE can cause some cases where we had
%3 = ICMP_OR %1, %2, 0
%4 = ICMP_OR %1, %2, 0
%5 = LSHIFT_AND %3, %4
To become
%3 = ICMP_OR %1, %2, 0
%5 = LSHIFT_AND %3, %3
The va_fuse_cmp pass would try to rewrite this as
%3 = ICMP_AND %1, %2, %3
But this is obviously wrong: we should not fuse an instruction with
itself.
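The extra condition amounts to a distinctness check on the two sources. The C sketch below is a simplified model with a hypothetical bit_op struct holding SSA indices; the real pass has further conditions that are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: SSA indices of the two compare results being combined. */
struct bit_op {
   unsigned src[2];
};

/* Fusing is only considered when the two sources come from different
 * instructions; after CSE both sources may name the same compare.
 */
static bool
can_fuse_cmp(const struct bit_op *op)
{
   assert(op != NULL);
   return op->src[0] != op->src[1];
}
```

For the example above, the pre-CSE form (%3, %4) passes the check and the post-CSE form (%3, %3) is rejected.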
Fixes: 800a861431 ("pan/bi: Fuse FCMP/ICMP on Valhall")
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Found-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41504>
Zink now unconditionally requires VK_KHR_maintenance5 to run.
Add it to the required extension list of Zink documentation.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41729>
VK_KHR_maintenance5 is now unconditionally required by Zink.
Move it to the gl21_baseline capabilities set to make it required by
every Zink profile.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41729>
Fixes "multiple stores to the same location" assertions in tests like
dEQP-VK.pipeline.monolithic.color_write_enable_maxa.cwe_after_bind.attachments3_more0
In that case, the stores were actually to different locations, but some
constant additions hadn't been folded into the location field yet.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>