It's possible to get a display list with no vertex buffers if the linker
eliminates all VS inputs or if the list was built with glArrayElement with
no enabled attribs.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21860>
The hardware uses the register to premultiply GS vertex indices
in input VGPRs.
This changes the behavior as follows:
- VGT_ESGS_RING_ITEMSIZE is always 1 on gfx9-11, set in the preamble.
- The value is passed to the shader via current_gs_state (vs_state_bits).
- The shader does the multiplication.
The reason is that VGT_ESGS_RING_ITEMSIZE will be removed in the future.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
gfx11 renamed PRIM_GRP_SIZE to VERTS_PER_SUBGRP but another change was
was missed.
Update our code based on PAL's UniversalCmdBuffer::CalcGeCntl function
(especially useVgtOnchipCntlForTess being false for gfx11).
Fixes: 25a66477d0 ("radeonsi/gfx11: register changes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20728>
We don't need to load screen->debug_flags because sctx->thread_trace
is already telling us if sqtt is enabled.
Furthermore we can perform this check only for GFX9 because sqtt
isn't supported currently on older chips.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
RGP expects a pipeline's shaders to be all stored sequentially, eg:
[vs][ps][gs]
As such, it assumes a single bo is dumped to the .rgp file, with
the following info:
* va of the bo
* offset to each shader inside the bo
For radeonsi, the shaders are stored individually, so we may have
a big gap between the shaders forming a pipeline => we can produce
very large file because the layout in the file must match the one
in memory (see the warning in ac_rgp_file_write_elf_text).
This commit implements a workaround: gfx shaders are re-exported as a
pipeline.
To update the shader address, a new state is created (sqtt_pipeline),
which will overwrite the needed _PGM_LO_* registers.
This reduces DeuxEX rgp captures from 150GB+ to less than 100MB.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
Multi-mode multi-draws will make it more complicated, so let's start with
simpler code.
I changed the order a little: I put the VBO update next to emit_draw_packets.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
This moves setting those registers from an unconditional place in draw_vbo
into si_set_rasterized_prim (for draw_vbo), si_update_rasterized_prim
(for bind_xx_shader), and si_bind_rs_state.
It's a little more complicated than expected.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
This removes the parameter from si_emit_derived_tess_state and uses
si_context to pass it. This rework is needed for multi-mode draws
where num_patches will be needed much later.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
tcs_out_lds_offsets is not sure to be 16 byte aligned, it's
calculated like this:
num_patches * patch_vertices * lshs_vertex_stride
num_patches and patch_vertices are not sure to be any value aligned,
lshs_vertex_stride is added one extra dword, so it's only 4 byte
aligned.
This may cause problem even before we switch to nir tess output
lower when write tess factor before read tail of input. But it's
more likely to cause problem after we switch to nir tess output
lower because the main body won't eliminate the low 4bit offset
but epilog will, so they use different offset to read/write tess
factor.
Fixes: 7598bfd768 ("radeonsi: replace llvm tcs output with nir lower pass")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7083
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18174>
We were writing descriptors into si_context and then copying them into
the command buffer. Just write them into the command buffer directly.
Also set the pointer to VBO descriptors right after them.
When we start a new command buffer or we finish blitting, we no longer
restore precomputed VBO descriptors. Instead, we just reupload them again.
It's a compromise to have the common path simpler and faster (maybe).
This removes a lot of stuff. Now the VBO descriptor upload path looks
very similar to the display list path.
There was an accidental hidden optimization that is now documented as
"last_const_upload_buffer".
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
While this is nice to have, it doesn't include VBO descriptors in user
SGPRs, and we need to remove it, so that we can simplify the VBO code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
This is more straightforward. Also, radeon_add_to_buffer_list makes
writing VBO descriptors into the command buffer slower after that code
is reordered in following commits. This seems to be the only way that
isn't slower.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
Create nir passthrough shader with explicit input/output and vertex
output count so that it can be handled by compiler same as user tcs.
The drawback is we create more si_shader_selector with different
input/output and vertex output count which was handled by compiler
backend before.
As fixed function tcs can be handled like user tcs, we don't need
the dedicated fixed_func_tcs_shader state either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
We need more GS/NGG bits, so we need to add current_gs_state for that.
This simplifies the logic in the draw code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
Statistics only work in non-NGG mode. If screen->use_ngg is true, we can't
know if the draw will actually use NGG or not, so this commit switch
to a shader based implementation of this counter.
To avoid modifying si_query, the shader implementation behaves like the hw
one: it uses the same buffer size and offset.
The emulation path activation in the shader is controlled by vs_state_bit[31].
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
DISABLE_INSTANCE_PACKING needs to be enabled when stats queries are
active to fix incorrect results.
We need to emit this for indexed and non-indexed draws.
Based on PAL's waDisableInstancePacking.
This fixes:
KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
This uses the common helper code to implement the tess ring sizing.
One question is if radeonsi should be using tess_offchip_ring_offset
in some places it's using tess_factor_ring_size?
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16415>