Entering a loop with empty exec mask might lead to
not be able to execute the break condition and
lead to infinite loops.
Totals from 81 (0.04% of 202440) affected shaders: (Navi48)
Instrs: 3040566 -> 3040716 (+0.00%)
CodeSize: 17506768 -> 17507188 (+0.00%)
Latency: 16342966 -> 16345166 (+0.01%)
InvThroughput: 3112932 -> 3113286 (+0.01%)
Branches: 82229 -> 82365 (+0.17%)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
Map the kernel alloc_flag AMDGPU_GEM_CREATE_DISCARDABLE to
RADEON_FLAG_DISCARDABLE in function radv_amdgpu_bo_get_flags_from_fd.
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40879>
This updates 14024997852 with BMG and brings in media WA
16021867713.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40881>
The new definitions have their numbers offset by 1 (e.g. S_580 -> S_581).
The remaining old definitions are adjusted to match that.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40588>
The old implementation only worked for 16-bit because we assumed scalar
so we could stomp the whole destination as if it was 32-bit. This
version works for v2i16.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
Now that bi_byte() does the right thing, we can just use it and not
worry about the rest.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
We call bi_optimize_nir() a few lines later.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
This isn't hard to do and it gives us a lot more flexibility in what NIR
we can consume.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
Add asserts this time that we don't miss any and that the buckets
actually match the enum in bifrost/compiler.h.
Fixes: 82328a5245 ("pan/bi: Generate instruction packer for new IR")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
It appears nowhere so I don't know why we have it in the enum.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
brw_nir_apply_key typically knows the dispatch width (it's fixed for
geometry stages, and we clone the NIR for compute and mesh shaders).
For compute/mesh, this was the very next thing called. For the others,
if we know the width, there's no reason not to lower it.
Scratch lowering will start using load_simd_width_intel soon, so we
need it to work in all stages.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
This records the actual SIMD width we selected for the shader, in
all cases except fragment shaders, where we don't know it yet.
MR 37258 notes that "Backends can update [these fields] when they make
new decisions about the subgroup size" - which is what we now do.
Note that nir->info.api_subgroup_size may be different than min/max
subgroup size on Vulkan prior to SPV1.6/VK_EXT_subgroup_size_control,
so we do not alter that.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
This lets us emit NIR code based on the SIMD size. For non-fragment
stages, we'll replace it with a constant and optimize, but for FS,
we delay it until the backend.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
With multiview, the HW has to dispatch at least ViewCount * 6 fibers per
primitive, since there are up to 6 VS threads per primitive. The HW can
launch multiple GS waves per VS wave but one VS wave must contain the
entire primitive. With ViewCount = 16 there are 96 fibers per
primitive, which is more than we can launch in one wave. To fix this,
lower the maximum view count.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40153>
With GS+multiview, the VS will loop over each view in the shader while
each GS invocation only corresponds to a single view. Varyings for each
view will be stored next to each other in local memory. Implement view
index calculations when lowering VS outputs/GS inputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40153>
On v12+, IDVS no longer has separate position and varying variants, so
we only need to emit stats for one binary. Attempting to emit stats for
the nonexistent varying shader breaks shader-db.
Fixes: 7819b103fa ("pan/bi: Add support for IDVS2 on Avalon")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40810>
Now that all of the additional cases are handled, we can hook up the
allow_merging_workgroups flag in panvk.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Mali does not support divergent operands in some cases, and we are
already using lower_non_uniform_access to handle this for descriptor
indexing. We can extend this to handle merged workgroups by just tagging
every intrinsic as nonuniform and then letting divergence analysis sort
out which ones can actually be nonuniform in opt_non_uniform_access.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Merging workgroups affects divergence analysis, since subgroups can now
contain extra threads from other workgroups. We already have divergence
analysis flags to handle this case, but since the compiler options memory
is static, we need to define an entirely separate option set for merged
vs non-merged workgroups.
In gallium, we don't have to switch options because opengl requires
uniformity over the entire dispatch in application shaders.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Vulkan guarantees that all subgroup invocations will be part of the same
workgroup, so we need to disable merging workgroups for shaders where
the subgroup layout is observable.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
In panvk, we will need to decide whether we are merging workgroups early
in shader compilation, before calling nir_lower_non_uniform_access. This
is because nonuniform lowering introduces new subgroup intrinsics which
would otherwise inhibit workgroup merging, and because the set of
instructions that need to be lowered may be different with merged
workgroups.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
The only requirement for barriers is that the hardware doesn't support
allow_merging_workgroups with actual BARRIER instructions. We only emit
these for workgroup execution barriers though, so are safe to merge
workgroups when the shader uses memory barriers or subgroup execution
barriers.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
This avoid unnecessarily emitting the switch back/away ambles when
they aren't actually used due to preemption optimization being
disabled. This alleviates unnecessary overhead when not running with
the mitigation for kernel drivers which support it.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40852>
KGSL doesn't support reading of performance counters by writing to
the selector registers directly from a userspace CS, instead these
requests need to be routed via the KGSL uAPI for perf counters.
Certain Turnip features which use performance counters such as
KHR_performance_query as well as preempt-optimize mode in autotune
are now explicitly disabled to reflect this.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40852>