Calculation for the worst case scenario in bufs_req should also include
predication command size.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Andrzei Okenczyc <Andrzej.Okenczyc@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36809>
../src/gallium/frontends/va/config.c(574): error C2059: syntax error: '}'
MSVC 2019 doesn't support for it yet
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36843>
Usually JIP will be valid, but as part of other changes, it will be
possible to have a shader that have multiple EOT messages and end with
and ENDIF instruction. Its JIP will point after the program ends.
This is fine but was tripping up the compaction code.
Change compaction to not read its internal structures beyond the last
instruction.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36822>
BITFIELD_MASK() returns a 32-bit unsigned integer, and Clang complains
if we assign it to a 16-bit unsigned integer without a cast. Let's add
that cast.
While we're at it, add an assert() to make it clear to the compiler that
the condition in BITFIELD_MASK() can be optimized away.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
BITFIELD_MASK() returns a 32-bit unsigned integer, and Clang complains
if we assign it to a 16-bit unsigned integer without a cast. Let's add
that cast.
While we're at it, add an assert() to make it clear to the compiler that
the condition in BITFIELD_MASK() can be optimized away.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
The enum pan_earlyzs is just enum mali_pixel_kill under a different
name, which was needed because the enum was missing from common.xml.
However, because pan_earlyzs_lut is used in files that are both included
with PAN_ARCH unset and set to values including values lower than 6, we
get issues with the way genxml/common_pack.h gets included, resulting in
the enum not being defined.
We don't really depend on the values for this, only on the size. So
let's just use unsigned values in the struct instead, to side-step the
issue.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
While this was also using translate_zs_format() before the commit in
question, that's didn't lead to any real issues, because only a single
value was legal here before. While it's not entirely in-spec to use
other values, it seems the HW doesn't mind.
But when this logic was reworked, the typed field was used instead. This
lead to a compiler warning on Clang.
Let's correct this properly here, rather than papering over the compiler
warning.
Fixes: 7a763bb0a3 ("pan/genxml: Rework the RT/ZS emission logic")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
To generate a nir_component_mask_t, we should use nir_component_mask,
not BITFIELD_MASK()...
But we're also generating the same mask twice here, so let's just
store that to a variable and reuse the mask when shifting it while we're
at it.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
Currently LRZ writes are disabled when depth writes are enabled but the
fragment shader is using discard. Additionally, LRZ writes should be
disabled when fragment shader is outputting sample coverage or the pipeline
state is enabling alpha-to-coverage which behaves as a discard.
This fixes rendering problems on Assetto Corsa. Conditions now used for
disabling LRZ writes match one set of conditions under which the
EARLY_Z_LATE_Z z-test mode is used. It was assumed that in that mode the
LRZ writes in binning will not happen until the late-Z phase, but that's
apparently not the case.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36848>
this is only to catch the case of a bound descriptor being written to
by some operation other than its draw/dispatch descriptor bind,
so any non-write binds are ignored
previously those non-write binds were required because of how sync
analysis could drop non-write access, so that is fixed as well
also use the vbo bind count instead of the mask because why not
also also ignore non-write GENERAL image deferred sync because that shouldn't
need anything deferred
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36846>
this avoids a scenario where a non-subdata UNSYNCHRONIZED unmap triggers through
tc at the same time the frontend calls an UNSYNCHRONIZED subdata call
in the main thread, which desynchronizes the cmdbuf and hits an assert
Fixes: 8ee0d6dd71 ("zink: add a third cmdbuf for unsynchronized (not reordered) ops")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36846>
D3D11 requires that subnormals are not flushed to zero
when tessellating primitives. Since we are flushing
subnormals during shader execution, we must temporarily
turn flushing off when calling the tessellator.
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36811>
Some ops on 64 bit data don't require the data to reside in neighboring
channels and can be executed as seperate 32 bit ops. In these cases we don't
need to pin the registers to a specific channel, but for scheduling it is better
that we make sure that both destination values reside in different channels, so
that they can be scheduled into one ALU group and reduce the probability of
read-port conflicts when used as source values.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743>
More ops then op2_dot_ieee + op2_mul_ieee can be submitted
as multi-slot ops. Make it ease to handle additional opcodes
when splitting the alu op that has only one dst but requires
multiple slots. With that we can emit more multi-slot ops that
use consecutive slots and use a different opcode in the last slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743>
If we have more than one register that is associated with the same
ssa index, but can be allocated without a specific channel pinning,
then don't add it to the ssa.index/register.index map to not
re-use the same register index.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743>
In addition, on Cayman some trans opts can use three or four channels,
and it may be an advantage to use the four channel version if the
result needs to be written to the w channel to reduce the all-over
ALU instruction group count.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36743>
Instead of each helper having a VK_ERROR_FEATURE_NOT_PRESENT fast-reject
path, drop those paths and check at the top of each caller. This
ensures that we do the check once per wsi_device, and only on a known
test dma-buf and that any subsequent fails turn into fails rather than
silently turning off explicit/implicit sync in potentially inconsistent
ways.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36816>
Usually we can fold most ldc and ldcx into the instruction using it,
however there are a couple of cases where we can't, e.g. when there is an
indirect offset.
Moving the ldc(x) down to the consumer leads to increase value ranges for
uniform registers, but lowering them for normal registers.
Totals:
CodeSize: 914650304 -> 914469536 (-0.02%); split: -0.05%, +0.03%
Number of GPRs: 3879754 -> 3863818 (-0.41%); split: -0.42%, +0.01%
Static cycle count: 1073273107 -> 1073101189 (-0.02%); split: -0.09%, +0.08%
Spills to reg: 67219 -> 67707 (+0.73%); split: -0.10%, +0.83%
Fills from reg: 79733 -> 80456 (+0.91%); split: -0.10%, +1.01%
Max warps/SM: 3666036 -> 3672668 (+0.18%); split: +0.18%, -0.00%
Totals from 24235 (27.66% of 87622) affected shaders:
CodeSize: 444747392 -> 444566624 (-0.04%); split: -0.11%, +0.07%
Number of GPRs: 1360384 -> 1344448 (-1.17%); split: -1.20%, +0.03%
Static cycle count: 806310857 -> 806138939 (-0.02%); split: -0.12%, +0.10%
Spills to reg: 35826 -> 36314 (+1.36%); split: -0.19%, +1.55%
Fills from reg: 31863 -> 32586 (+2.27%); split: -0.26%, +2.53%
Max warps/SM: 911328 -> 917960 (+0.73%); split: +0.74%, -0.01%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36536>