This makes sure we apply WA only when it is required, these issues
do not happen for later MTL steppings.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23596>
This makes sure we apply WA only when it is required, these issues
do not happen for later MTL steppings.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23596>
Specifically we would bail out previously when encountering any
control flow, now we would optimize it even when the second ARL/ARR
is inside a lower level if/else branch.
shader-db
RV530:
total instructions in shared programs: 132020 -> 131924 (-0.07%)
instructions in affected programs: 3374 -> 3278 (-2.85%)
helped: 4
HURT: 0
RV370:
no change (no control flow there)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
We already do this for ARL, so just generalize the pass.
shader-db
RV530:
total instructions in shared programs: 132235 -> 132020 (-0.16%)
instructions in affected programs: 8492 -> 8277 (-2.53%)
helped: 41
HURT: 1
total temps in shared programs: 16900 -> 16887 (-0.08%)
temps in affected programs: 83 -> 70 (-15.66%)
helped: 13
HURT: 0
RV370:
total instructions in shared programs: 82395 -> 82320 (-0.09%)
instructions in affected programs: 4715 -> 4640 (-1.59%)
helped: 33
HURT: 1
total temps in shared programs: 12316 -> 12305 (-0.09%)
temps in affected programs: 75 -> 64 (-14.67%)
helped: 11
HURT: 0
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Its particularly important to have the copy-propagate pass run first.
So that when the round is vectorized, we don't have to follow the MOVs
to find out if it leads to ARL or not (we don't vectorize ARR/ARL at the
moment).
No shader-db change.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Specifically after the first copy propagate run but before the
second one. Removal of ARLs will enable the copy propagate to be more
aggresive, as it is very carefull in such cases.
shader-db
RV530:
total instructions in shared programs: 131861 -> 131503 (-0.27%)
instructions in affected programs: 23949 -> 23591 (-1.49%)
helped: 199
HURT: 15
total temps in shared programs: 16997 -> 16903 (-0.55%)
temps in affected programs: 767 -> 673 (-12.26%)
helped: 69
HURT: 9
RV370:
total instructions in shared programs: 82360 -> 82027 (-0.40%)
instructions in affected programs: 19516 -> 19183 (-1.71%)
helped: 183
HURT: 15
total temps in shared programs: 12370 -> 12262 (-0.87%)
temps in affected programs: 664 -> 556 (-16.27%)
helped: 73
HURT: 0
The hurt programs are due to some constant load being copy propagated
which leads to bad interaction with source conflict resolve pass later.
v2: add missing shader type initialized to the tests. Previously we were
checking for has_omod which also practically means we have a fragment
shader, however its less readable.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Having a high value for the first activity timeout made sense back in
the days when the machine may not be associated with salad early... but
this isn't the case anymore!
So let's go with a very conservative value of 2 minutes to boot :)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21730>
Makes it more obvious that this function is actually initializing the dri_screen
and not some helper.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23054>
Avoid calling this function on screen creation failure as we will discard its
result right after.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23054>
Avoid inconsistent behavior on screen creation failures which might lead
to double free issues.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23054>
This function is actually used before the use of dri_init_screen_helper so
it is not exactly releasing the memory allocated by the screen helper.
Also clear the base.screen variable after destroy to make this function
reentrant.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23054>
The code to release the device was actually always used after the call
to this function.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23054>
On RDNA2, VRS rates are part of the HTILE buffer but if we disable
HTILE completely for eg. GENERAL, VRS rates aren't read by the hw.
Fix this by disabling HTILE compression which should have the same
effect without VRS.
Fixes recent
dEQP-VK.fragment_shading_rate.renderpass2.monolithic.attachment_rate.misc.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23209>
DB_RENDER_CONTROL controls whether depth/stencil rendering should be
compressed. Emitting this register as part of the framebuffer will
allow us to keep HTILE enabled for VRS rates, instead of disabling it
completely.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23209>
Otherwise we run into problems by putting this optimization loop
before I/O lowering, where there might still be 8-bit values that
haven't been lowered to 16 or 32. Once that's done, any remaining
movs or vec ops will have higher bit sizes already.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Add support for 16-bit UBO loads, delete handling of byte-addressed
UBO loads (which I think was never used anyway) and add handling
for the component const index to optimize out unneeded extractResults.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Derefs have index-based access semantics, which means we don't need
custom intrinsics to encode an index instead of a byte offset.
Remove the "masked" store intrinsics and just emit the pair of atomics
directly. This massively reduces duplication between scratch, shared,
and constant, while also moving more things into nir so more optimizations
can be done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We still use shader_temp as a temporary variable mode to differentiate
which variables have simple deref patterns vs ones that need to be
lowered to ssbo, but then we put it back to mem_constant when we're
done to restore sanity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
There's a few changes in here that are very inter-related.
First, we stop lowering load_deref on shader_temp to load_ptr_dxil,
and just leave it as load_deref. In order for that to work, we need
the derefs to be in a shape that's acceptable to DXIL, so the only
current producer of shader_temp loads (the CLC frontend) needs to
run some lowering passes on them first.
The DXIL backend is augmented to just write out deref indices while
walking a deref chain, which will get combined in the load op into
a GEP instruction. For non-mesh/raytracing shaders, these are required
to be single-level scalar arrays, but the complexity here is preparation
for when we don't need to do that anymore.
Additionally, the const lookups are changed from using a hash table
to just putting an index on the variable.
All of this together is enough to enable the authored-forever-ago test
which uses indirect array access into a const packed struct. The
load_ptr_dxil handling didn't deal with packed structs / unaligned
accesses, but now that we're in a logical address space with derefs
instead of physical, there's no alignment to deal with anymore and
the fact that it's packed goes out the window.
This removes one custom DXIL intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
DXIL requires GEP chains to point to a global variable that's a flat
array of primitive types. If we're converting deref chains to GEP
chains, we're effectively in a logical address space, which means
we can do things like change sizes of variables, since we know
they won't alias with anything else. If they could alias, we'd be
lowering them to an explicit I/O op instead. That means we can
start disabling some of the low-bit-size lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
Now that we try harder for memcpys, we can use nir's complex usage helper.
We also can just mark the vars instead of using a hash map, since location
doesn't mean anything for constant vars.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
For the case of memset, the SPIR-V translator produces a copy from
a byte array of 0s. If we wait to lower memcpys until after types
are sized, we can potentially turn those 0s into SSA zeros and remove
the entire constant array.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
I was seeing u2u64 still in my final shader after pack/unpack were
lowered, which sounds to me like some other optimizations are missing
for detecting the post-lowering pack/unpack patterns, but let's at
least add some patterns for the simple cases.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>