The number of IBs per submit isn't infinite, it depends on the IP type
(ie. some initial setup needed for a submit) and the packet size.
It can be calculated according to the kernel source code as:
(ring->max_dw - emit_frame_size) / emit_ib_size
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22354>
For stream!=0, this align_mul=4 is not true. Not observe any
problem yet, just for correctness.
Fixes: 60ac5dda82 ("ac: Add NIR lowering for NGG GS.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22304>
If vertex does not complete a primitive, it should not set the odd
flag which miss lead liveness check when culling is enabled.
For example, if odd flag is set regardless of complete flag, when
culling is enabled, 3 vertices of a triangle's init prim flag:
[0x00 0x04 0x01]
then after culling, this triangle has been culled, their prim flag:
[0x00 0x04 0x00]
the second vertex is miss treat as live because its odd flag (code
check prim_flag!=0 for liveness).
Fixes: 1bdeb961bd ("ac/nir/ngg: add gs culling")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8725
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22304>
set the crop region and enable the feature if requested
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22158>
update the register version and select appropriate registers during decoder create
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22158>
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22158>
AMD recommends doing this to speed up the CP when it processes
the draw ring entries. LLPC also does this.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22211>
When writing the draw ready bit, don't write the high 24 bits
of DWORD3, because that is used by the HW for something else
according to LLPC.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22211>
Inspired by Nicolai Hähnle's commit in LLPC.
Instead of using a SALU instruction to add to the scalar
offset, rely on the buffer swizzling and use constant offset.
Fossil DB stats on GFX1100:
Totals from 47910 (35.51% of 134913) affected shaders:
CodeSize: 87927612 -> 86968136 (-1.09%)
Instrs: 17584007 -> 17440094 (-0.82%)
Latency: 97232173 -> 97126311 (-0.11%)
InvThroughput: 9904586 -> 9905288 (+0.01%); split: -0.02%, +0.02%
VClause: 544430 -> 542566 (-0.34%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22227>
Update the metadata format. For gfx8- chips nothing change.
For gfx9 chips:
* for textures without a valid modifier a dw is added at index=10
containing the stride
* for textures with a valid modifier the modifier is stored at
index 10 and 11. Then the number of planes is stored at 12.
Then for each plane the offset and the stride are stored.
The goal here is to be able to create textures from dmabuf from
umr - without these changes this is impossible because these
values can't be guessed.
The new layout is compatible with version=1 so old/new UMD can
be used together without issues and isn't used by default.
For radeonsi, it will be possible to use it with a AMD_DEBUG=...
option.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21984>
Use more specific verbs to avoid confusion:
set -> apply
get -> compute
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21984>
We never implemented it, and having broken mipmaps works out better
for applications and CTS. Actually implementing stencil adjust is
going to be a major pain due to stuff like the GENERAL layout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21869>
Add some set macro defines for mesh shading packets.
The naming convention is:
S_(packet opcode)(dword index)_FIELD_NAME
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21409>
This should be handled separately from the other repacked
variables, because it doesn't use a dword.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21995>
These mostly existed because of the long name of the state variable
and are not really necessary.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21995>
We are planning to reuse more than just uniforms later,
hence let's clarify the name of these.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21995>
These were meant to explain the LDS layout, but
the actual LDS usage is better explained by:
ngg_nogs_get_culling_pervertex_lds_size().
Also add some comments there.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21995>
Can you spot the problem?
exp param0 v6, v5, v5, v5
exp param1 v7, off, off, off
exp param1 v7, off, off, off
radeonsi uses ac_nir_optimize_outputs to eliminate output stores with
identical SSA defs (i.e. duplicated), which then causes 2 outputs to
map to the same parameter export.
This is a regression. The old LLVM code was correctly emitting each
export only once. vs_output_param_mask was supposed to be used for
this instead of vs_output_param_offset.
Fixes: 80506be31b - ac/nir/ngg,radv,radeonsi: nogs use ac_nir_export_(position|parameter)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21920>
tcs_tess_lvl_(in|out)_loc may be not set if user miss tess
factor output.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21437>
Radeonsi is going to use nir tess factor write, so need to
remap tess factor location.
RADV set tess factor driver location to be 0 and 1 in
get_linked_variable_location(). While radeonsi also set them
to be 0 and 1 in st->map_io aka. si_shader_io_get_unique_index_patch().
We could just set them to be 0 and 1 at the beginning of
ac_nir_lower_hs_outputs_to_mem(), but in order to keep the
location map at the same place, we still do this in
lower_hs_output_store().
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21437>
It will be shared by other nir lowering too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21437>
Block compressed + linear format is not supported in addrlib. But these
surface can be used as transfer resource. RADEON_SURF_NO_TEXTURE flag
indicates not to set flags.texture flag in gfx9_compute_surface().
This will help to fix the vkCreateImage() crash where block
compressed + linear format image is requested.
v2: combine RADEON_SURF_NO_TEXTURE to below line (Marek Olšák)
v1: add RADEON_SURF_NO_TEXTURE flag (Marek Olšák)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21422>