When debugging a problem in a trace, CTS test,... that is caused by a
known compiler feature, the first step is usually to find which shader
causes the problem. This is often non-trivial as the amount of shaders
in a trace can be huge. This commit adds a debugging tool to help with
this.
The idea behind this tool is to assign every shader a deterministic
(pre-compilation) ID that can be used to order shaders. Once we have
this, we can use it to bisect which shader causes the problem. This
obviously only works if the problem can be traced back to a single
shader. In my experience, this is often the case.
This tool reuses the shader cache key as deterministic ID. It is
concatenated with the variant ID to distinguish the different variants
of a shader.
In practice, bisecting the shaders in a test run works like this:
- Gate the problematic compiler feature using ir3_shader_bisect_select;
E.g., if (ir3_shader_bisect_select(v)) IR3_PASS(...);
- Run test with IR3_SHADER_BISECT_DUMP_IDS_PATH=ids.txt
- Sort ids.txt
- Bisect the shader IDs using IR3_SHADER_BISECT_LO/IR3_SHADER_BISECT_HI.
- Dump the problematic shader using IR3_SHADER_BISECT_DISASM.
A Python script is provided to make all this easier:
- ir3_shader_bisect.py dump-ids -o ids.txt 'test args'
- ir3_shader_bisect.py bisect -i ids.txt 'test args'
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602>
We need to insert a (ss)nop when an instruction that doesn't support
(ss) needs it. However, when this happens in a block that needs to be
legalized more than once (e.g., because it is in a loop), the (ss)nop
would be inserted every iteration, causing an infinite loop.
Fix this by checking if the previous instructions is a nop and applying
(ss) there.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 5993723471 ("freedreno/a3xx/compiler: scheduling/legalize fixes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36440>
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.
We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.
Via Coccinelle patch:
@@
expression a, b;
@@
-nir_def_rewrite_uses_after(a, b, b->parent_instr)
+nir_def_rewrite_uses_after_def(a, b)
Followed by a bunch of sed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
The HW returns the wrong result for gl_SampleMaskIn when
post_depth_coverage is enabled together with FSR. This is especially
problematic since we use gl_SampleMaskIn to lower gl_HelperInvocation.
Disable the extension as a workaround.
We are currently unaware of any apps that use VK_EXT_post_depth_coverage
so disabling it should not be a problem.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35788>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
Lowering UBO ranges to the const file happens via the preamble on
turnip (all gens) and freedreno (a7xx+). The only exception is the
consts_ubo, which is uploaded to the const file via CP.
Make this more consistent, and remove some special-casing code, by
treating consts_ubo as any other UBO.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36470>
Now that we have we have the concept of "dummy" registers, we can use it
for descriptor prefetches as well. Currently, they are represented as
having no dst, and a fixup pass during legalization adds the actual
needed dummy dst. This can be prevented by representing their dst using
a dummy register from the start.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36365>
A sam used as descriptor prefetch has a dummy src which is hard-coded
to r48.x. Neither postsched nor legalization are aware this is a dummy
src though, causing false dependencies to be added. This often results
in unnecessary syncs and/or sub-optimal scheduling decisions.
Fix this by adding a "dummy" register flag and teaching postsched and
legalization to ignore registers with that flag set.
Totals from 24681 (15.00% of 164575) affected shaders:
Instrs: 17953856 -> 17953751 (-0.00%); split: -0.00%, +0.00%
CodeSize: 36166530 -> 36163008 (-0.01%); split: -0.09%, +0.08%
NOPs: 3466012 -> 3465943 (-0.00%); split: -0.00%, +0.00%
MOVs: 550649 -> 550613 (-0.01%)
(ss): 460398 -> 460402 (+0.00%)
(ss)-stall: 1780969 -> 1780916 (-0.00%); split: -0.00%, +0.00%
(sy)-stall: 5876641 -> 5876604 (-0.00%); split: -0.00%, +0.00%
Preamble Instrs: 4118242 -> 4087950 (-0.74%); split: -1.13%, +0.39%
Last helper: 7258848 -> 7258837 (-0.00%); split: -0.00%, +0.00%
Cat0: 3795308 -> 3795239 (-0.00%); split: -0.00%, +0.00%
Cat1: 746570 -> 746534 (-0.00%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36365>
When invalid registers are passed to `get_ready_slot`, it may cause an
OOB array access. Instead of running into UB when this happens, catch it
early by asserting.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36365>
Set to true everywhere except:
- spirv_to_nir used by Vulkan
- bindless handles in GLSL
- some internal shaders and driver-specific code
Acked-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36099>
Use the existing DRIVER_NAME mechanism to pick up common skips.
This is less error prone than manually adding the skips.
A redundant freedreno-a618-skips.txt is also dropped, as it's already
included via GPU_VERSION.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36215>
The cheza runners were decommissioned.
Rename the restricted trace results to a618 (same GPU generation) to keep
the history.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36215>
* use tex builder (what I came for)
* use nir_create_variable_with_location
* drop redundant internal = true's, the shaders are already initialized that way
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36050>
There is no need to have an own copy
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36218>
Otherwise we can have a case where binning VS uses more consts than
full VS (when safe variant is used for full VS), that will result in
a rendering issue because SP_VS_CONST_CONFIG.CONSTLEN is shared between
full and binning VS in PROGRAM_CONFIG state and gets the value from the
full VS.
There are two alternative solutions that can allow binning VS to always
use maximum constlen:
- Move constlen emission to per-XS config. This interferes
PROGRAM_CONFIG state which uploads consts and does SP_UPDATE_CNTL.
Consts would need to be uploaded after constlen is defined, while
SP_UPDATE_CNTL must be done before per-XS state is emitted.
Also having SP_UPDATE_CNTL in a draw state that is always DIRTY
isn't great.
Something didn't work out on A6XX, so this idea was dropped.
- Emit constlen again in VS_BINNING draw state. This seem to work
but also likely an undefined behaviour since constlen is changed
after some consts are uploaded.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36203>
Thanks to the optimizations in the previous commits and further
increasing tests_per_group, we can now run more tests in a618-vk
while also reducing its parallelism.
Dropping `DEQP_FRACTION: 2` and only increasing the `fraction` of
the basic test set to 3 results in broader overall coverage.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36027>
The a618 jobs run on two different 8 thread devices, which is the optimal
value for FDO_CI_CONCURRENT on most of their jobs.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36027>