Once we implement packing support for v15, the encoding will drastically
change. Therefore, make the packing tests explicitly use v10 to allow
for adding v15 support later.
With v15, we get support for 128 registers in any multiple of 16 (vs
previously having the choice between 32 or 64 register mode).
To support this, shader register count is passed in a different way from
v15, requiring some updates to how we encode the ShaderProgramDescriptor
and the ShaderProgramPointer.
Note that this currently does not change the compiler behavior of
running in either 32 or 64 register mode, just how this is passed to the
GPU.
This allows specifying multiple modifiers for fields in the xml which
will be applied in the written order when packing and inverse-applies in
the reverse order when unpacking.
thread_max_workgroup_size has been replaced with
thread_num_active_granularity in v15, which requires updated handling
for calculating the max number of threads in a workgroup
Since v15, gpu_ids are 64 bit, so they need to be handled differently.
To ease this, a compat value of 0xF is found in what previously used to
be ARCH_MAJOR, which we can use to decide whether to read information
from the full 64 bits.
Since we now cannot pass gpu_id directly as deviceID, align with the DDK
on what fields to expose.
A couple of preloads were missed when implementing the preload register
abstraction.
This fix is not required prior to v15, but marking it as a bug fix for
consistency.
Fixes: 1f0370616a ("pan: Centralize preload registers")
Set the FBD size/alignment correctly and emit the fragment staging
registers before issuing fragment commands.
Also, move some temporary registers to non-conflicting ones.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
On v14+, flat source formats are no longer supported by LD_VAR_BUF and
LD_VAR_BUF_IMM opcodes. This patch makes the compiler emit the
dedicated LD_VAR_BUF_FLAT* opcodes instead.
Add the ISA definitions, handle the new opcodes, and add packing tests
for both immediate and indirect forms.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Encoding for LdVarBufImmF16 on v11 changed compared to v10. Updated the
test to check for the right encoding.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
The provoking vertex bit in RUN_FRAGMENT2 is located in a register
instead of a descriptor stored in memory. That means we don't need to
patch memory, resulting in a much leaner implementation compared to
RUN_FRAGMENT.
Also, implement the simultaneous reuse copy path with the corresponding
tiler pointer patching.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Set the FBD size/alignment correctly and emit the fragment staging
registers before issuing fragment commands.
Also, move some temporary registers to non-conflicting ones.
Incremental rendering is left as TODO for later.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Enable building libpanfrost for v14. Also, modify format mappings to
account for the new architecture specification.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
v14+ no longer uses specific AFRC compression formats for YUV. Instead,
generic R8/R8G8 and R10/R10G10 formats are used.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Map the multiplane and special internal formats to the new v14+ YUV
formats. Note v14+ has a much simplified list of formats.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reuses the same structure that is used by pan_emit_fb_desc.
Also, modify pan_emit_fbd's signature to take a pan_ptr to the
framebuffer memory instead of the CPU-mapped pointer.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Add a new structure that is used to store per-layer RUN_FRAGMENT2 state.
Any other state will be emitted directly to registers.
Also, modify pan_emit_fb_desc's signature to take a pan_ptr to the
framebuffer memory instead of the CPU-mapped pointer.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Add support for emitting and decoding RUN_FRAGMENT2 instructions.
Some existing decoding logic from decode.c is modified to be reusable
by the new RUN_FRAGMENT2 decoding logic.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
This is just a copy of v13.xml to help spot any missing changes while
working on v14.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Note block-linear interleaved clump orderings are not supported on all
v10 architectures.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
The vertex input state can be NULL if rasterization is disabled with
dynamic vertex inputs.
The input assembly state can be NULL if rasterization is disabled
and both states are dynamic (primive topology and primitive restart
enable).
This fixes a segfault with gpu-ratemeter vk_dyn.prim
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41335>