Remove the PAN_MESA_DEBUG=fp16 flag that was hiding it.
Skip two buggy dEQP tests. See linked discussion. We'll need to make
sure this gets sorted out before submitting conformance, but I don't see
a test with a fix in the pipeline as valid reason to hold back valid
code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9239>
Needed for constant folding to be effective. But don't copyprop into
instructions already reading from FAU, that will just end up adding more
moves!
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9239>
We don't vectorize transcendentals, since those are scalar only in
hardware. Also don't vectorize a few places where impedance mismatches
between NIR and the hardware make handling vectors infeasible for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9239>
gcc 11 warns:
[846/1506] Compiling C object src/mesa/libmesa_common.a.p/main_shaderapi.c.o
In function ‘shader_source’,
inlined from ‘_mesa_ShaderSource_no_error’ at ../src/mesa/main/shaderapi.c:2137:4:
../src/mesa/main/shaderapi.c:2095:25: warning: ‘*offsets_10 + _130’ may be used uninitialized [-Wmaybe-uninitialized]
2095 | totalLength = offsets[count - 1] + 2;
I can't really see how it's getting to that conclusion, but allocating
`offsets` with calloc is both natural to do here and guarantees
initialization.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10671>
gcc 11 warns:
../src/util/format/u_format_fxt1.c:940:22: warning: ‘fxt1_variance.constprop’ accessing 128 bytes in a region of size 64 [-Wstringop-overflow=]
940 | int32_t maxVarR = fxt1_variance(NULL, &input[N_TEXELS / 2], n_comp);
But, suspiciously, if you inline fxt1_variance the warning goes away.
What's happening is that the 2nd arg is uint8_t[N_TEXELS][MAX_COMP], so
it looks like we're passing too small of an array in since gcc knows
that `input` is also [N_TEXELS][MAX_COMP]. Fair enough. Fix the
signature to reflect what's actually going on, and remove some unused
arguments while we're at it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10671>
This only implements dynamic primitive restart enable, depth bias
enable and rasterizer discard enable. I leave logic op and patch
control points for later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10340>
according to spec, the pSizes array member is only used if the array is non-null
and the value is not VK_WHOLE_SIZE, otherwise this value is calculated based
on the buffer size - the offset
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10625>
If CRC data is currently invalid and the current batch will make it
valid, write even clean tiles to make sure CRC data is updated.
Fixes: 8ba2f9f698 ("panfrost: Create a blitter library to replace the existing preload helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10566>
when an object is initialized with this, it should not be visible to any
other threads or contexts, so there should be no need to use an atomic set here
at the time of this commit, there are only two callers in the tree which pass
values != 1:
* zink uses a calculated number for framebuffer refcount on init (this is fine)
* aux/pb passes 0 on init (this is fine)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10650>
This reverts commit 7ca72f1726.
Unlike what's stated in this commit, the depth or stencil components
have to be replicated on all channels, as specified in the
"Texture Sampling and Texture Formats" section of the TGSI doc
(docs/gallium/tgsi.rst).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10649>
We don't really need loops unrolled, so let's just disable this. This is
generally recommended for NIR drivers, but we can do even better; not
even unroll in NIR. And since we don't set
nir_shader_compiler_options::max_unroll_iterations, we're already there.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10487>
The GL driver was getting loop unrolling from the GLSL compiler frontend,
but NIR unrolling is more sophisticated, so prefer that.
The only caveat is that loop unrolling is implemented in the Mesa state
tracker, so our backend won't have a chance to undo the optimization if
it causes us to lower thread count or spill, so we choose to be a bit more
conservative with the configuration than what we were doing with GLSL.
Shader-db results follow. Increase in instruction counts is expected due
to additional unrolling. We lose threads in very few shaders, but we
make up for this with the additional unrolling and reduced spilling. We
also managed to get 3 more shaders to compile successfully.
total instructions in shared programs: 13416427 -> 13461431 (0.34%)
instructions in affected programs: 96936 -> 141940 (46.43%)
helped: 58
HURT: 216
Instructions are HURT.
total threads in shared programs: 410626 -> 410598 (<.01%)
threads in affected programs: 56 -> 28 (-50.00%)
helped: 0
HURT: 14
Threads are HURT.
total loops in shared programs: 2121 -> 1708 (-19.47%)
loops in affected programs: 468 -> 55 (-88.25%)
helped: 446
HURT: 47
Loops are helped.
total uniforms in shared programs: 3676567 -> 3691185 (0.40%)
uniforms in affected programs: 25304 -> 39922 (57.77%)
helped: 23
HURT: 199
Uniforms are HURT.
total spills in shared programs: 5902 -> 5727 (-2.97%)
spills in affected programs: 285 -> 110 (-61.40%)
helped: 19
HURT: 0
total fills in shared programs: 13308 -> 13121 (-1.41%)
fills in affected programs: 301 -> 114 (-62.13%)
helped: 19
HURT: 0
total sfu-stalls in shared programs: 31860 -> 32856 (3.13%)
sfu-stalls in affected programs: 1692 -> 2688 (58.87%)
helped: 25
HURT: 196
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13448287 -> 13494287 (0.34%)
inst-and-stalls in affected programs: 98404 -> 144404 (46.75%)
helped: 57
HURT: 217
Inst-and-stalls are HURT.
total nops in shared programs: 329276 -> 329551 (0.08%)
nops in affected programs: 2189 -> 2464 (12.56%)
helped: 58
HURT: 181
Nops are HURT.
LOST: 0
GAINED: 3
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
Once we have exhausted compile strategies at 4 threads and we start
enabling lower thread counts, there is no point in starting compiles
with 4 threads for them, we know these will fail, so let's start at
2 in these cases.
This also has another nice implication: if the driver compiles at 4
threads and fails to register allocate, we were allowing it to try
with 2 threads, but this would only retry the register allocation
process and would not really recompile the shader with 2 threads. This
is not optimal, because at 2 threads we have more TMU fifo space for
each thread and we can do more TMU pipelining, so we were missing that
opportunity.
This improves performance in Sponza by ~1.5% and also seems to help
UE4 slightly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
Until now, if we can't compile at 4 threads we would lower thread count
with optimizations disabled, however, lowering thread count doubles the
amount of registers available per thread, so that alone is already a big
relief for register pressure so it makes sense to enable optimizations
when we do that, and progressively disable them until we enable spilling
as a last resort.
This can slightly improve performance for some applications. Sponza,
for example, gets a ~1.5% boost. I see several UE4 shaders that also get
compiled to better code at 2 threads with this, but it is more difficult
to assess how much this improves performance in practice due to the large
variance in frame times that we observe with UE4 demos.
Also, if a compiler strategy disables an optimization that did not make
any progress in the previous compile attempt, we would end up re-compiling
the exact same shader code and failing again. This, patch keeps track of
which strategies won't make progress and skips them in that case to save
some CPU time during shader compiles.
Care should be taken to ensure that we try to compile with the default
NIR scheduler at minimum thread count at least once though, so a specific
strategy for this is added, to prevent the scenario where no optimizations
are used and we skip directly to the fallback scheduler if the default
strategy fails at 4 threads.
Similarly, we now also explicitly specify which strategies are allowed to do
TMU spills and make sure we take this into account when deciding to skip
strategies. This prevents the case where no optimizations are used in a
shader and we skip directly to the fallback scheduler after failing
compilation at 2 threads with the default NIR scheduler but without trying
to spill first.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>