Commit graph

197 commits

Author SHA1 Message Date
Marek Olšák
999b956ebc radeonsi: correct an assertion if we get a display list with no vertex buffers
It's possible to get a display list with no vertex buffers if the linker
eliminates all VS inputs or if the list was built with glArrayElement with
no enabled attribs.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21860>
2023-03-15 13:16:34 +00:00
Marek Olšák
c2f3339783 radeonsi: remove unused TCS/TES SGPR fields
We stopped using them when we switched to ac_nir_lower_hs_outputs_to_mem.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21860>
2023-03-15 13:16:34 +00:00
Marek Olšák
ddded6fbb5 radeonsi: emulate VGT_ESGS_RING_ITEMSIZE in the shader on gfx9-11
The hardware uses the register to premultiply GS vertex indices
in input VGPRs.

This changes the behavior as follows:
- VGT_ESGS_RING_ITEMSIZE is always 1 on gfx9-11, set in the preamble.
- The value is passed to the shader via current_gs_state (vs_state_bits).
- The shader does the multiplication.

The reason is that VGT_ESGS_RING_ITEMSIZE will be removed in the future.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
2023-03-08 07:29:09 +00:00
Marek Olšák
3e8bd05020 radeonsi: don't set PACKET_TO_ONE_PA for line stippling
A hw guy told me this.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:24 +00:00
Marek Olšák
b9c6ef7f51 radeonsi: remove unused VS_STATE_LS_OUT_PATCH_SIZE
This became unused when we switched to nir_lower_hs_inputs_to_mem.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:24 +00:00
Marek Olšák
98eee7dee3 amd: replace SI_BIG_ENDIAN with UTIL_ARCH_BIG_ENDIAN
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:24 +00:00
Marek Olšák
429f43f088 radeonsi: use SPI_SHADER_USER_DATA_HS_0 definition instead of LS_0
The value is the same, but LS_0 is for gfx9 only, and HS_0 is for everything
except gfx9.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:23 +00:00
Marek Olšák
742c9f411b radeonsi: change si_shader::ctx_reg to a nameless union for better readability
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:23 +00:00
Marek Olšák
3e9863f496 radeonsi: move a few DB_SHADER_CONTROL states into si_shader_ps
They can be set si_shader_ps.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
2023-02-24 21:27:23 +00:00
Pierre-Eric Pelloux-Prayer
f73cdda983 radeonsi/gfx11: fix ge_cntl programming
gfx11 renamed PRIM_GRP_SIZE to VERTS_PER_SUBGRP but another change was
was missed.

Update our code based on PAL's UniversalCmdBuffer::CalcGeCntl function
(especially useVgtOnchipCntlForTess being false for gfx11).

Fixes: 25a66477d0 ("radeonsi/gfx11: register changes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20728>
2023-01-25 08:09:13 +00:00
Pierre-Eric Pelloux-Prayer
df16fa43ff radeonsi: handle sqtt pipeline in shader prefetch
When sqtt is enabled, the shader code lives in the pipeline bo,
not in the shader bo.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
2022-10-25 11:58:07 +00:00
Pierre-Eric Pelloux-Prayer
6189af1ddb radeonsi: store the shader gpu adress in si_shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
2022-10-25 11:58:07 +00:00
Pierre-Eric Pelloux-Prayer
619f009ff2 radeonsi/sqtt: simplify condition to determine if sqtt is on
We don't need to load screen->debug_flags because sctx->thread_trace
is already telling us if sqtt is enabled.
Furthermore we can perform this check only for GFX9 because sqtt
isn't supported  currently on older chips.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
2022-10-25 11:58:07 +00:00
Pierre-Eric Pelloux-Prayer
cc5dd491ec radeonsi: simplify si_prefetch_shaders
Since 93cd96b523 the only used value of si_L2_prefetch_mode
was PREFETCH_ALL so we can remove some dead code in si_prefetch_shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
2022-10-25 11:58:07 +00:00
Pierre-Eric Pelloux-Prayer
8034a71430 radeonsi/sqtt: re-export shaders in a single bo
RGP expects a pipeline's shaders to be all stored sequentially, eg:

  [vs][ps][gs]

As such, it assumes a single bo is dumped to the .rgp file, with
the following info:
  * va of the bo
  * offset to each shader inside the bo

For radeonsi, the shaders are stored individually, so we may have
a big gap between the shaders forming a pipeline => we can produce
very large file because the layout in the file must match the one
in memory (see the warning in ac_rgp_file_write_elf_text).

This commit implements a workaround: gfx shaders are re-exported as a
pipeline.

To update the shader address, a new state is created (sqtt_pipeline),
which will overwrite the needed _PGM_LO_* registers.

This reduces DeuxEX rgp captures from 150GB+ to less than 100MB.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
2022-10-25 11:58:07 +00:00
Pierre-Eric Pelloux-Prayer
4fdf10fdaf radeonsi/gfx11: don't set VERTS_PER_SUBGRP to 0
It seems slower.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: 25a66477d0 ("radeonsi/gfx11: register changes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18758>
2022-09-23 13:29:18 +00:00
Pierre-Eric Pelloux-Prayer
adad285fc9 radeonsi: use LOAD_CONTEXT_REG_INDEX for VGT_STRMOUT_DRAW_OPAQUE
Based on PAL's UniversalCmdBuffer::CmdDrawOpaque.

We don't need to use PFP_SYNC_ME because it's done in emit_cache_flush.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18129>
2022-08-30 10:51:43 +00:00
Marek Olšák
01d351a491 radeonsi: move patch_vertices-related tessellation updates out of si_draw
This only depends on the patch_vertices and the TCS.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
93cd96b523 radeonsi: remove 1 draw packet order codepath, keep the first one
Multi-mode multi-draws will make it more complicated, so let's start with
simpler code.

I changed the order a little: I put the VBO update next to emit_draw_packets.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
808893ee69 radeonsi: cosmetic changes in si_emit_rasterizer_prim_state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
dcd80d31cf radeonsi: set GS_STATE_OUTPRIM and PROVOKING_VTX_INDEX only when they change
This moves setting those registers from an unconditional place in draw_vbo
into si_set_rasterized_prim (for draw_vbo), si_update_rasterized_prim
(for bind_xx_shader), and si_bind_rs_state.

It's a little more complicated than expected.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
a070a09d00 radeonsi: precompute GS_OUT_PRIM in advance
We don't have to do it every draw now if the rasterized prim type
doesn't change.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
7144621e94 radeonsi: unify the logic that sets rast_prim
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
58539e976b radeonsi: move fixing ngg_culling into si_update_shaders
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
e5a9203159 radeonsi: remove the prim_restart_tri_strips_only option
Not used enough, no difference in performance for Dirt Rally on 6800.
Move the variable down.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
d8125427cd radeonsi: move *rs to its only use in si_draw
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
e19363a44e radeonsi: make the primitive type constant with tessellation
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Marek Olšák
89640f32e0 radeonsi: don't pass num_patches via derived_tess_state, pass it via si_context
This removes the parameter from si_emit_derived_tess_state and uses
si_context to pass it. This rework is needed for multi-mode draws
where num_patches will be needed much later.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18195>
2022-08-30 04:57:43 +00:00
Qiang Yu
ff7c59672f radeonsi: fix tcs_out_lds_offsets arg alignment
tcs_out_lds_offsets is not sure to be 16 byte aligned, it's
calculated like this:

  num_patches * patch_vertices * lshs_vertex_stride

num_patches and patch_vertices are not sure to be any value aligned,
lshs_vertex_stride is added one extra dword, so it's only 4 byte
aligned.

This may cause problem even before we switch to nir tess output
lower when write tess factor before read tail of input. But it's
more likely to cause problem after we switch to nir tess output
lower because the main body won't eliminate the low 4bit offset
but epilog will, so they use different offset to read/write tess
factor.

Fixes: 7598bfd768 ("radeonsi: replace llvm tcs output with nir lower pass")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7083
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18174>
2022-08-24 02:04:15 +00:00
Marek Olšák
a60181e8f2 radeonsi: use do..while loops and other cosmetic changes in display list path
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
e9a0cae1a1 radeonsi: use si_cp_dma_prefetch_inline for prefetching VBO descriptors
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
0e574c801c radeonsi: remove temporary si_context::vb_descriptor_user_sgprs
We were writing descriptors into si_context and then copying them into
the command buffer. Just write them into the command buffer directly.
Also set the pointer to VBO descriptors right after them.

When we start a new command buffer or we finish blitting, we no longer
restore precomputed VBO descriptors. Instead, we just reupload them again.
It's a compromise to have the common path simpler and faster (maybe).

This removes a lot of stuff. Now the VBO descriptor upload path looks
very similar to the display list path.

There was an accidental hidden optimization that is now documented as
"last_const_upload_buffer".

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
a5d37e161d radeonsi: remove vb_descriptors_gpu_list only used for debugging
While this is nice to have, it doesn't include VBO descriptors in user
SGPRs, and we need to remove it, so that we can simplify the VBO code.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
b4cef2487b radeonsi: add vertex buffers into the BO list in set_vertex_buffers
This is more straightforward. Also, radeon_add_to_buffer_list makes
writing VBO descriptors into the command buffer slower after that code
is reordered in following commits. This seems to be the only way that
isn't slower.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
c4ffac8a17 radeonsi: merge both fail paths in si_set_vb_descriptor
I removed the assertion because apps are allowed to set an offset greater
than the size.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
f129db911b radeonsi/gfx11: use a better workaround for the export conflict bug
This is recommended for better performance.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17864>
2022-08-03 00:57:16 +00:00
Marek Olšák
a791e7f37f radeonsi/gfx11: skip code in si_update_shaders that has no effect
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17864>
2022-08-03 00:57:16 +00:00
Marek Olšák
34196148c1 radeonsi/gfx11: use better PRIM_GRP_SIZE_GFX11 setting
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17864>
2022-08-03 00:57:16 +00:00
Pierre-Eric Pelloux-Prayer
af7c2ff842 radeonsi: check last_dirty_buf_counter and dirty_tex_counter
Check both counters in draw and compute, otherwise compute dispatches may
miss buffers invalidation.
This fixes the test case from https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/702
(both with and without GALLIUM_THREAD=0).

cc: mesa-stable

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17394>
2022-07-23 18:03:22 +00:00
Qiang Yu
a1763ad4b3 radeonsi: replace llvm based fixed tcs with nir
Create nir passthrough shader with explicit input/output and vertex
output count so that it can be handled by compiler same as user tcs.

The drawback is we create more si_shader_selector with different
input/output and vertex output count which was handled by compiler
backend before.

As fixed function tcs can be handled like user tcs, we don't need
the dedicated fixed_func_tcs_shader state either.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
2022-06-27 02:38:21 +00:00
Marek Olšák
a9f7744cfe radeonsi: rework how vs_state_bits is set and unpacked
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
2022-06-11 11:14:16 +00:00
Marek Olšák
c9c7dcb619 radeonsi: rename and regroup VS_STATE definitions
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
2022-06-11 11:14:16 +00:00
Marek Olšák
091617002f radeonsi: rework how VS_STATE_BITS are set for VS, TES, and GS
We need more GS/NGG bits, so we need to add current_gs_state for that.
This simplifies the logic in the draw code.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
2022-06-11 11:14:16 +00:00
Marek Olšák
4f3c74ddfb radeonsi: determine DB_SHADER_CONTROL in si_shader_ps
This is cleaner and more flexible.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
2022-06-11 11:14:16 +00:00
Marek Olšák
39800f0fa3 amd: change chip_class naming to "enum amd_gfx_level gfx_level"
This aligns the naming with PAL.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pellou-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16469>
2022-05-13 14:56:22 -04:00
Pierre-Eric Pelloux-Prayer
38e8a73e14 radeonsi: implement GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB in shaders
Statistics only work in non-NGG mode. If screen->use_ngg is true, we can't
know if the draw will actually use NGG or not, so this commit switch
to a shader based implementation of this counter.

To avoid modifying si_query, the shader implementation behaves like the hw
one: it uses the same buffer size and offset.

The emulation path activation in the shader is controlled by vs_state_bit[31].

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
2022-05-12 09:16:11 +02:00
Pierre-Eric Pelloux-Prayer
d3a5f411a3 radeonsi: implement pipeline stats workaround
DISABLE_INSTANCE_PACKING needs to be enabled when stats queries are
active to fix incorrect results.

We need to emit this for indexed and non-indexed draws.

Based on PAL's waDisableInstancePacking.

This fixes:
  KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
2022-05-12 09:16:10 +02:00
Dave Airlie
14b1ed1ce1 radeonsi: port tess ring calcs to the common helper.
This uses the common helper code to implement the tess ring sizing.

One question is if radeonsi should be using tess_offchip_ring_offset
in some places it's using tess_factor_ring_size?

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16415>
2022-05-11 02:08:08 +00:00
Marek Olšák
e3b4e1fe85 radeonsi: inline si_cp_dma_prefetch in si_draw_vbo for lower overhead
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16328>
2022-05-10 04:29:55 +00:00
Marek Olšák
9fecac091f radeonsi/gfx11: scattered register deltas
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16328>
2022-05-10 04:29:55 +00:00