The full nightly jobs have been failing for a while without anyone
paying much attention to them.
Reduce Piglit coverage by switching to the `quick_gl` profile, which
is what the pre-merge jobs run.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36608>
The nightly jobs can hit OOMs on JSL and ADL, so reduce the number of
threads used by deqp-runner to avoid that.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36608>
Our SSBO access instructions expect offsets in units of the accessed
type's size. However, we were ingesting SSBO intrinsics that use byte
addresses. We were fixing this up in ir3_nir_lower_io_offsets by
inserting a ushr or, if possible, propagating this shift into another
shift that's part of the address calculation.
Having to insert a ushr is unfortunate, as for most accesses, it should
be possible to extract this shift directly from the access chain because
the array strides and struct offsets would be properly aligned. It also
prevents nir_opt_offsets from finding constant additions to extract, as
they would be hidden behind a ushr that often cannot be optimized away.
57ea689273 ("ir3: optimize SSBO offset shifts for nir_opt_offsets")
tried to overcome the latter problem somewhat by pushing a ushr into
additions. This turned out to be unsound because even though SSBO
offsets are unsigned, intermediate results in the offset calculation
might be negative values which means we should use ishr in those cases.
Unfortunately, we cannot know when to use ushr or ishr.
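To illustrate the problem, here is a small Python sketch that emulates
32-bit arithmetic. The helper names (ushr32, ishr32, add32) and the
concrete values are made up for this example and are not taken from the
ir3 code.

```python
MASK32 = 0xFFFFFFFF

def ushr32(x, s):
    """Logical right shift of a 32-bit value."""
    return (x & MASK32) >> s

def ishr32(x, s):
    """Arithmetic (sign-extending) right shift of a 32-bit value."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32
    return (x >> s) & MASK32

def add32(a, b):
    """32-bit wrapping addition."""
    return (a + b) & MASK32

# Case 1: the final offset is 8, but one operand is negative.
a, b = 16, -8 & MASK32
expected_1 = ushr32(add32(a, b), 2)                   # shift the final offset
pushed_ushr_1 = add32(ushr32(a, 2), ushr32(b, 2))     # wrong: b is negative
pushed_ishr_1 = add32(ishr32(a, 2), ishr32(b, 2))     # matches expected_1

# Case 2: a large unsigned operand, where ishr is the wrong choice.
c, d = 0x80000000, 0
expected_2 = ushr32(add32(c, d), 2)
pushed_ushr_2 = add32(ushr32(c, 2), ushr32(d, 2))     # matches expected_2
pushed_ishr_2 = add32(ishr32(c, 2), ishr32(d, 2))     # wrong: c is not negative
```

Neither shift is correct in both cases, and the two cases cannot be
told apart at compile time.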
This commit switches ir3 to the newly introduced offset_shift index for
SSBO intrinsics. This allows the shift to be extracted when lowering
derefs in nir_lower_explicit_io. In some cases, we might still have to
add an extra shift to make sure the offset uses the correct units. It
turns out that this is very rare and using offset_shift greatly improves
the shader stats:
Totals from 33267 (20.20% of 164705) affected shaders:
MaxWaves: 440368 -> 455258 (+3.38%); split: +3.40%, -0.01%
Instrs: 22974358 -> 21844188 (-4.92%); split: -4.98%, +0.06%
CodeSize: 45456418 -> 43099334 (-5.19%); split: -5.22%, +0.03%
NOPs: 4612549 -> 4524353 (-1.91%); split: -2.97%, +1.05%
MOVs: 802018 -> 817547 (+1.94%); split: -3.29%, +5.23%
COVs: 381987 -> 382061 (+0.02%); split: -0.03%, +0.05%
Full: 514078 -> 477339 (-7.15%); split: -7.18%, +0.04%
(ss): 544419 -> 502332 (-7.73%); split: -9.12%, +1.39%
(sy): 292099 -> 304697 (+4.31%); split: -3.19%, +7.50%
(ss)-stall: 2106134 -> 2104011 (-0.10%); split: -1.82%, +1.71%
(sy)-stall: 9704720 -> 10324864 (+6.39%); split: -4.64%, +11.03%
STPs: 11301 -> 10074 (-10.86%)
LDPs: 18654 -> 17202 (-7.78%)
Preamble Instrs: 4652214 -> 4580289 (-1.55%); split: -1.59%, +0.04%
Early Preamble: 13977 -> 13978 (+0.01%)
Constlen: 1881764 -> 1881304 (-0.02%); split: -0.03%, +0.01%
Last helper: 5157587 -> 5074042 (-1.62%); split: -1.86%, +0.24%
Subgroup size: 2262976 -> 2263232 (+0.01%)
Cat0: 5065452 -> 4976324 (-1.76%); split: -2.73%, +0.97%
Cat1: 1241085 -> 1251974 (+0.88%); split: -2.52%, +3.40%
Cat2: 8462897 -> 7723367 (-8.74%); split: -8.74%, +0.01%
Cat3: 5738382 -> 5735312 (-0.05%); split: -0.06%, +0.00%
Cat5: 761945 -> 763017 (+0.14%); split: -0.00%, +0.14%
Cat6: 199819 -> 197766 (-1.03%); split: -1.34%, +0.31%
Cat7: 890192 -> 581842 (-34.64%); split: -35.20%, +0.57%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
The goal here is to generate addresses that are a right-shifted version
of the actual byte address and record the shift amount in the
offset_shift index. While we could just insert a ushr at the end of
deref chains, this would prevent the shift from being optimized away in
many cases. Instead, we try to extract the shift from the array strides
and struct offsets that make up the deref chain, and only insert a ushr
when absolutely necessary (i.e., for casts). This means we have to walk
the entire deref chain at once for accesses that support offset_shift,
so we don't use the standard algorithm of replacing each deref one at a
time.
To be able to legally right-shift casts, we use the alignment
information and never shift more than what the alignment could support.
It should also be noted that casts generally have two sources: something
provided by the driver (e.g., a Vulkan resource index) or a variable
pointer coming from a phi/bcsel. For the latter, the entire access chain
consists of multiple parts that are ended by either a phi/bcsel or an
access. Only the part that ends in an access is handled by this new
algorithm; the other parts are handled as usual. This is necessary
because we have no way to encode the offset shift or to even know how
much we would be able to shift without knowing how it is accessed.
This commit adds the general implementation for lowering accesses using
offset_shift and adds a compiler option for drivers to enable it for
SSBO accesses.
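As a rough illustration of extracting the shift from strides and struct
offsets, the following Python sketch models a deref chain as a list of
tuples. The representation, the function names (ctz, shifted_offset)
and the policy of falling back to a smaller shift are assumptions made
for this example; they are not the actual nir_lower_explicit_io
implementation, and casts are not modelled.

```python
def ctz(x):
    """Number of trailing zero bits; 0 is treated as aligned to anything."""
    return (x & -x).bit_length() - 1 if x else 32

def shifted_offset(chain, wanted_shift):
    """Compute (offset, shift) with offset << shift == byte offset.

    chain is a list of ("array", index, stride_in_bytes) or
    ("struct", member_offset_in_bytes) links.
    """
    # The shift can be no larger than what every stride and member
    # offset in the chain is aligned to.
    shift = wanted_shift
    for link in chain:
        scale = link[2] if link[0] == "array" else link[1]
        shift = min(shift, ctz(scale))

    # Apply the shift to the constants instead of to the final sum, so
    # no ushr on the computed offset is needed.
    offset = 0
    for link in chain:
        if link[0] == "array":
            offset += link[1] * (link[2] >> shift)
        else:
            offset += link[1] >> shift
    return offset, shift

# Member at byte 16, then element 3 of an array with a 4-byte stride.
aligned = shifted_offset([("struct", 16), ("array", 3, 4)], 2)
# Member at byte 2 limits the shift to 1.
misaligned = shifted_offset([("struct", 2), ("array", 3, 4)], 2)
```

In both cases, offset << shift reproduces the byte offset (28 and 14
respectively).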
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
We will need this when building shifted addresses. Since adding these
parameters causes a lot of code churn, which would distract from the
main changes, it is split off into a separate commit.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
We will add support for shifted addresses; this commit makes sure the
APIs of the functions already support passing shifts.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
The helper is used to build the address passed to
build_explicit_io_load/store. For now, it simply takes care of adding
the component offset when scalarizing. In the future, this can be used
to do more complex address manipulations, like calculating the full
deref chain address.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
nir_explicit_io_address_from_deref implicitly builds the offset but only
makes the full address available. Split out the offset calculation into
a separate function so we can reuse it elsewhere.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Hardware will typically do bounds checking on the final scaled address
so the wrap check should do the same.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
We currently support offset scaling on a per-intrinsic type basis. Since
the introduction of the offset_shift index, different instantiations of
the same type can now have a different scale. Add support for this by
calculating the offset scale on the fly for instructions that have
offset_shift.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Note: this was implemented and tested for ir3. The code paths that are
never used there [1] seem non-trivial to implement. Since they cannot be
easily tested, asserts and TODOs are added to ensure we don't
accidentally hit them for intrinsics with offset_shift.
[1]: these paths are never used on ir3 since lower_mem_access_bit_sizes
is only used for SSBO accesses to lower 64b accesses (which are 64b
aligned) to 32b ones. So we'll never request an increase of alignment.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
The immediate addition can easily be handled by nir_opt_offsets, which
will also take any driver limits into account.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
In ir3, SSBO offsets are in units of the accessed type size so we want
to start using the new offset_shift index.
Even though the shift is implicit for the ir3 intrinsics, we use
nir_intrinsic_copy_const_indices when creating them so we need to make
sure our indices match the ones used by the generic intrinsics.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
For intrinsics supporting offset_shift, dealing with their offset is a
bit tricky as we cannot simply add a byte offset to it anymore (which is
what most passes want to do). This commit adds some helpers to add byte
offsets (and adjust offset_shift accordingly) so that individual passes
don't have to worry about this.
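One way such a helper could behave is sketched below in Python. The
function name add_byte_offset, the (offset, offset_shift) pair
representation, and the strategy of lowering the shift when the byte
offset is not sufficiently aligned are assumptions for illustration
only; the base index is ignored and the byte offset is assumed to be
non-negative.

```python
def add_byte_offset(offset, offset_shift, delta):
    """Add delta bytes to an offset stored in units of 1 << offset_shift.

    Returns the new (offset, offset_shift) pair.
    """
    if delta == 0:
        return offset, offset_shift

    # Largest power-of-two unit that delta is still a multiple of.
    delta_shift = (delta & -delta).bit_length() - 1
    new_shift = min(offset_shift, delta_shift)

    # Rescale the existing offset to the (possibly smaller) unit, then
    # add delta expressed in that same unit.
    offset <<= offset_shift - new_shift
    return offset + (delta >> new_shift), new_shift

# 5 dwords (20 bytes) plus 8 bytes: the shift of 2 can be kept.
kept = add_byte_offset(5, 2, 8)
# 5 dwords (20 bytes) plus 2 bytes: the shift has to drop to 1.
lowered = add_byte_offset(5, 2, 2)
```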
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
For load/store intrinsics that take an offset, this specifies the amount
the offset is shifted left to calculate the final offset:
offset = (offset_src + base) << offset_shift
This is useful for backends that have memory operations that use offset
units other than bytes (i.e., where the shift is implicit).
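As a minimal worked example of the formula above, written in Python
(the function name final_byte_offset and the sample values are
illustrative only):

```python
def final_byte_offset(offset_src, base, offset_shift):
    """Byte offset addressed by an intrinsic that uses offset_shift."""
    return (offset_src + base) << offset_shift

# A 32-bit access whose offset is expressed in dwords (offset_shift = 2):
# offset_src = 5 and base = 3 address byte (5 + 3) * 4.
dword_access = final_byte_offset(5, 3, 2)

# With offset_shift = 0 the offset is a plain byte offset.
byte_access = final_byte_offset(5, 3, 0)
```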
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Values are taken from minStorageBufferOffsetAlignment and
minUniformBufferOffsetAlignment.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35092>
Predicate registers can be written from the scalar ALU by using a
special cat2 encoding: if the dst is encoded as a0.c, the instruction
will execute on the scalar ALU and write to p0.c.
This commit follows the blob and disassembles scalar predicates as
up0.c. The "u" presumably stands for "uniform".
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36614>
Predicate registers can be written from the scalar ALU by using a
special cat2 encoding: if the dst is encoded as a0.c, the instruction
will execute on the scalar ALU and write to p0.c.
This commit makes the ir3 backend aware of scalar predicates. A new
register flag (IR3_REG_UNIFORM) is added that can be used to mark
predicate dsts as being written by the scalar ALU. For such dsts, the
same synchronization rules apply as for shared registers written by the
scalar ALU (e.g., (ss) is needed to read them from the vector ALU).
Scalar predicates can be used in the early preamble, which makes control
flow available there.
In many ways, the backend treats IR3_REG_UNIFORM the same as
IR3_REG_SHARED. A new flag was added because IR3_REG_SHARED is mainly
used to denote a separate register file, not as a flag to indicate usage
by the scalar ALU. Scalar predicates still use the normal predicate
register file but allow it to be written from the scalar ALU.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36614>
The new type was not handled in the switch, which led to hitting the
following assert when running tests with a pipeline cache:
deqp-vk: ../src/compiler/glsl_types.c:3334: decode_type_from_blob: Assertion `!"Cannot decode type!"' failed.
Fixes: 9e5d7eb88d ("compiler/types: add a bfloat16 type")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36833>
[WHY]
It was found that the caller may pass stream_count = 0 while the streams
array contains garbage.
This randomly ends up with output_ctx being modified, leading to a
validation failure.
[HOW]
Add a check on stream_count.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Roy Chan <Roy.Chan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36809>
[WHY]
Further debugging requires knowing the build cmd variables.
[HOW]
Add these input and output parameters to vpe events.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Muhammad Ansari <Muhammad.Ansari@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36809>
Various small optimizations have been accumulating; deal with them in
one commit:
- Add erase functionality to the vector util and remove memsets to save
  time.
- Update should_gen_cmd_info to take in any stream variables.
- Program funcs should program directly: update the mpcc mux hook func
  to take in blend_mode.
- Add reserved bits for debug flags.
Signed-off-by: Brendan Steven, Leder <BrendanSteven.Leder@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36809>
The worst-case calculation in bufs_req should also include the
predication command size.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Andrzei Okenczyc <Andrzej.Okenczyc@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36809>