On v12+, IDVS no longer has separate position and varying variants, so
we only need to emit stats for one binary. Attempting to emit stats for
the nonexistent varying shader breaks shader-db.
Fixes: 7819b103fa ("pan/bi: Add support for IDVS2 on Avalon")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40810>
Now that all of the additional cases are handled, we can hook up the
allow_merging_workgroups flag in panvk.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Mali does not support divergent operands in some cases, and we are
already using lower_non_uniform_access to handle this for descriptor
indexing. We can extend this to handle merged workgroups by just tagging
every intrinsic as nonuniform and then letting divergence analysis sort
out which ones can actually be nonuniform in opt_non_uniform_access.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Merging workgroups affects divergence analysis, since subgroups can now
contain extra threads from other workgroups. We already have divergence
analysis flags to handle this case, but since the compiler options memory
is static, we need to define an entirely separate option set for merged
vs non-merged workgroups.
In gallium, we don't have to switch options because OpenGL requires
uniformity over the entire dispatch in application shaders.
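Because the option memory is static, the choice between the two sets has to be made by pointer rather than by flipping a flag at runtime. A minimal sketch of that pattern follows; the struct, field, and function names are hypothetical and are not the real NIR or panfrost identifiers:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a compiler options struct. */
struct example_compiler_options {
   /* Hypothetical flag: a subgroup may contain threads from more than
    * one workgroup, so per-workgroup values may diverge within it. */
   bool subgroup_may_span_workgroups;
};

/* Option set for shaders compiled without workgroup merging. */
static const struct example_compiler_options example_options_default = {
   .subgroup_may_span_workgroups = false,
};

/* Option set for shaders compiled with workgroup merging. */
static const struct example_compiler_options example_options_merged = {
   .subgroup_may_span_workgroups = true,
};

/* Both structs are immutable, so callers select one by pointer. */
static const struct example_compiler_options *
example_get_options(bool merge_workgroups)
{
   return merge_workgroups ? &example_options_merged
                           : &example_options_default;
}
```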
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Vulkan guarantees that all subgroup invocations will be part of the same
workgroup, so we need to disable merging workgroups for shaders where
the subgroup layout is observable.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
In panvk, we will need to decide whether we are merging workgroups early
in shader compilation, before calling nir_lower_non_uniform_access. This
is because nonuniform lowering introduces new subgroup intrinsics which
would otherwise inhibit workgroup merging, and because the set of
instructions that need to be lowered may be different with merged
workgroups.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
The only requirement for barriers is that the hardware doesn't support
allow_merging_workgroups with actual BARRIER instructions. We only emit
these for workgroup execution barriers though, so it is safe to merge
workgroups when the shader uses memory barriers or subgroup execution
barriers.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
This avoids unnecessarily emitting the switch back/away ambles when
they aren't actually used due to preemption optimization being
disabled. This alleviates unnecessary overhead when not running with
the mitigation for kernel drivers which support it.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40852>
KGSL doesn't support reading of performance counters by writing to
the selector registers directly from a userspace CS; instead, these
requests need to be routed via the KGSL uAPI for perf counters.
Certain Turnip features which use performance counters such as
KHR_performance_query as well as preempt-optimize mode in autotune
are now explicitly disabled to reflect this.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40852>
For consistency with other shader stages (required by d3d, neither GL nor
Vulkan really care). A bit awkward since we don't want to disable them for
things like rusticl, which we should be able to distinguish with shader type.
Note that to satisfy d3d requirements, disabling denorms in general is not
sufficient, due to d3d requiring them to be disabled for single precision
opcodes, but enabled for double precision ones, and x86 can't switch that
individually (hence will need per-instruction tracking and switching inside
the shader).
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40787>
Similar to what we already do for smallfloats to floats, handle denorms
and normals separately with bit manipulation rather than rely on
a rescale mul which depends on CPU denorm handling.
This is a bit more complex, but on the upside we don't need to track
fpstate for denorms anymore in llvmpipe backend. (With modern x86 cpus
this is essentially only really relevant for r11g11b10 float format,
since f16 formats are using f16c instructions.)
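As an illustration of handling normals and denorms separately with integer operations only, here is a scalar C sketch that converts a binary32 value to an unsigned 11-bit float (5 exponent bits, 6 mantissa bits, as used by r11g11b10). It is not the llvmpipe code, which emits vectorized LLVM IR. The function name, the truncating rounding, and the clamp to the largest finite value on overflow are assumptions of this sketch:

```c
#include <stdint.h>
#include <string.h>

/* Sketch: binary32 -> unsigned 11-bit float, integer operations only.
 * Rounding is by truncation (an assumption of this sketch). */
static uint32_t
example_f32_to_uf11(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint32_t sign = bits >> 31;
   uint32_t exp = (bits >> 23) & 0xff;
   uint32_t mant = bits & 0x7fffff;

   if (exp == 0xff) {
      if (mant)
         return 0x7c1;           /* NaN: exponent all ones, mantissa != 0 */
      return sign ? 0 : 0x7c0;   /* -Inf clamps to 0, +Inf stays Inf */
   }
   if (sign)
      return 0;                  /* no sign bit: negatives clamp to 0 */

   int32_t e = (int32_t)exp - 127 + 15;   /* rebias from 127 to 15 */

   if (e >= 31)
      return 0x7bf;              /* too large: clamp to max finite */

   if (e >= 1)                   /* normal result: keep top 6 mantissa bits */
      return ((uint32_t)e << 6) | (mant >> 17);

   /* Denormal result: the output is m * 2^-20, so shift the 24-bit
    * significand (implicit one restored) right by 130 - exp. No float
    * multiply is involved, so CPU denorm modes cannot affect it. */
   uint32_t shift = 130 - exp;
   if (shift > 24)
      return 0;                  /* underflows to zero */
   return (mant | 0x800000) >> shift;
}
```

For example, 1.0f maps to 0x3c0, the smallest normal 2^-14 maps to 0x040, and 2^-15 takes the denormal path and maps to 0x020.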
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40787>
isel.cf.deep_traversal is a new ACO test that verifies
that the iterative NIR CF visitor allows arbitrary depth.
A depth of 10000 would cause a stack overflow on x86-64 Linux
(4096 kB stack) with the old recursive code. This test
is not enabled by default.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
When iterating control-flow recursively, we always run the
risk of causing a stack overflow if the control-flow
depth is too large. This patch resolves this by visiting
control-flow nodes in an iterative way, managing an explicit
stack on the heap.
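The general technique can be sketched as follows. This is not the ACO visitor itself: the node type and function name are hypothetical, and real control-flow nodes carry far more state. It only shows how a heap-allocated stack of parent nodes replaces the call stack:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical control-flow node: siblings form a linked list, and a node
 * may own a nested list of children (like an if or loop body). */
struct example_cf_node {
   struct example_cf_node *next;
   struct example_cf_node *first_child;
};

/* Visit every node in pre-order without recursion.
 * Returns the number of nodes visited, or -1 on allocation failure. */
static long
example_visit_cf_list(struct example_cf_node *first)
{
   size_t cap = 16, len = 0;
   struct example_cf_node **stack = malloc(cap * sizeof(*stack));
   long visited = 0;

   if (!stack)
      return -1;

   struct example_cf_node *node = first;
   while (node || len > 0) {
      if (!node) {
         /* End of a nested list: pop the parent and continue after it. */
         node = stack[--len]->next;
         continue;
      }

      visited++; /* the actual per-node work would go here */

      if (node->first_child) {
         if (len == cap) {
            /* Grow the explicit stack; depth is limited by heap only. */
            struct example_cf_node **grown =
               realloc(stack, 2 * cap * sizeof(*stack));
            if (!grown) {
               free(stack);
               return -1;
            }
            stack = grown;
            cap *= 2;
         }
         stack[len++] = node;
         node = node->first_child;
      } else {
         node = node->next;
      }
   }

   free(stack);
   return visited;
}
```

With this shape, nesting depth costs one pointer of heap memory per level instead of one call frame per level.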
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
All B-series Rogue cores seem to have USC rounding mode as RTE instead
of RTZ.
Set the has_usc_alu_roundingmode_rne feature flag for them (currently
only BXS-4-64 has it set).
Verified via testing on BXM-4-64 (36.52.104.182) by fixing CTS tests
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp32.input_args.* ,
and via proprietary driver vulkaninfo result on BXE-2-32 (36.29.52.182),
BXE-4-32 (36.50.54.182) and BXM-4-64 (36.56.104.183) (checking
shaderRoundingModeRT?Float32 properties).
Fixes: 1db1038a61 ("pvr: add device info for BXM-4-64 (36.56.104.183)")
Fixes: e60e0c96ba ("pvr: add device info for BXE-2-32 (36.29.52.182)")
Fixes: 2743363a57 ("pvr: add device info for BXM-4-64 (36.52.104.182)")
Fixes: ea28791d40 ("pvr: add device info for BXE-4-32 (36.50.54.182)")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40794>
While the swizzle code was producing the correct encoding, the
disassembly was slightly weird and swz_16 required an extra argument
that was always "false".
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40865>
The Vulkan runtime provides the dynamic state infrastructure via
vk_common_CmdSetAttachmentFeedbackLoopEnableEXT(). This builds on the
attachment feedback loop layout support.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
PanVK treats image layouts as no-ops and already disables Forward Pixel
Kill when the same render target is both read and written.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
While not currently required, it will be for future GPUs.
Also removes the gpu_id parameter from some functions that didn't use it.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
The current implementation is a bit awkward and becomes tricky when
adding support for 64-bit gpu_ids.
Rather than keeping a mask of bits in gpu_id to compare with the stored
gpu_prod_id value, rely on macro functions for fetching the information
required from gpu_id and creating the comparison value.
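A rough sketch of the macro-based approach, for illustration only. The macro names, the field positions, and the packing of the comparison value are assumptions of this example and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extractors. The bit positions are illustrative assumptions. */
#define EXAMPLE_GPU_ARCH_MAJOR(id) (((id) >> 28) & 0xf)
#define EXAMPLE_GPU_ARCH_MINOR(id) (((id) >> 24) & 0xf)
#define EXAMPLE_GPU_PROD_MAJOR(id) (((id) >> 16) & 0xf)

/* Build the value stored in a model table from its individual fields. */
#define EXAMPLE_GPU_PROD_ID(arch_major, arch_minor, prod_major) \
   (((uint32_t)(arch_major) << 8) | ((uint32_t)(arch_minor) << 4) | \
    (uint32_t)(prod_major))

/* Derive the same comparison value from a raw gpu_id. */
#define EXAMPLE_GPU_PROD_ID_FROM_ID(id) \
   EXAMPLE_GPU_PROD_ID(EXAMPLE_GPU_ARCH_MAJOR(id), \
                       EXAMPLE_GPU_ARCH_MINOR(id), \
                       EXAMPLE_GPU_PROD_MAJOR(id))

/* No mask is stored: both sides are built through the same macros, so a
 * wider or rearranged gpu_id only requires updating the extractors. */
static bool
example_model_matches(uint64_t gpu_id, uint32_t gpu_prod_id)
{
   return EXAMPLE_GPU_PROD_ID_FROM_ID(gpu_id) == gpu_prod_id;
}
```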
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
Rather than having preload registers hardcoded over multiple files,
gather them in one place with an enum abstraction.
This should simplify updates to the preload registers.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40643>
genX_cmd_compute.c has two places with code very similar to
anv_shader_get_scratch_surf(), but we could not use this function without
changing its parameters.
Now it takes the shader stage and total_scratch instead of an anv_shader, because
cmd_buffer_trace_rays() doesn't have a shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We will need to call get_scratch_surf() from other files, so remove the
static qualifier and declare it in anv_private.h.
No changes in behavior are expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We are already at our limit of 31 texture opcodes, and cannot add any
more without expanding the opcode hashing in nir_instr_set. Thankfully,
it's at 29 bits, so adding one here is still possible.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
New on Xe2, this instruction enables faster 32x32 integer multiply at the cost
of extra accumulator usage. Add it to the opcode list for future use.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
A set is large and expensive to iterate.
This is faster (overall fossilize-replay difference):
Difference at 95.0% confidence
-250 +/- 28.9257
-2.04849% +/- 0.235211%
(Student's t, pooled s = 34.1626)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>