Commit graph

668 commits

Author SHA1 Message Date
Marek Olšák
66f254b4e6 radeonsi,radv: fix a late alloc deadlock with <= 6 CUs per SA
We should always prevent 1 CU from executing VS and GS waves
to prevent a deadlock.

Fixes: c377f45c18 "radeonsi/gfx10: rewrite late alloc computation"

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11754>
2021-07-08 18:37:41 +00:00
Marek Olšák
786678a017 radeonsi: restructure si_get_vs_vgpr_comp_cnt for readability
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11102>
2021-06-21 19:03:29 +00:00
Mike Blumenkrantz
a3a6611e96 util/queue: add a global data pointer for the queue object
this better enables object-specific (e.g., context) queues where the owner
of the queue will always be needed and various pointers will be passed in
for tasks

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11312>
2021-06-16 15:10:09 -04:00
Marek Olšák
a0fcd37731 radeonsi: remove a twice duplicated workaround for VERT_GRP_SIZE
This enables better lane occupancy.

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
2021-05-25 16:15:44 +00:00
Marek Olšák
c8e8979d6b radeonsi: fix the fast launch vert/prim thread counts if they are trimmed
This fixes the case when the counts were out of sync because one of them
was decreased.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
2021-05-25 16:15:44 +00:00
Marek Olšák
0e8100bf58 radeonsi: simplify the NGG culling vertex count heuristic
This removes another chip-specific switch.
It enables a lower threshold on Navi1x, which should be fine.

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10878>
2021-05-24 17:41:34 +00:00
Samuel Pitoiset
726cb2d6f6 ac: ac_gpu_info::has_vgt_flush_ngg_legacy_bug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10911>
2021-05-21 19:46:56 +00:00
Marek Olšák
c53f25b668 radeonsi: kill 16-bit VS outputs if PS doesn't use them or doing Z-only draw
The kill_outputs logic uses our internal IO indices. Just add indices for
16-bit varyings. We don't have enough free indices to use, but we can reuse
the indices that GLES doesn't have. Those are all the legacy desktop GL
varyings.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9051>
2021-04-13 21:10:43 -04:00
Marek Olšák
7db43960f6 radeonsi: implement 16-bit VS->PS varyings
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9051>
2021-04-13 21:10:43 -04:00
Pierre-Eric Pelloux-Prayer
8c6a64c9b0 radeonsi/rgp: export compute shader programs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10105>
2021-04-12 14:27:29 +02:00
Axel Davy
ff6f11acdc radeonsi: fix leak when the in-memory cache is full
When the hw_binary is not put in the in-memory
cache it must be freed.

Fixes: 8283ed65cf ("radeonsi: Limit the size of the in-memory shader cache")

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9587>
2021-03-17 21:05:06 +00:00
Axel Davy
8283ed65cf radeonsi: Limit the size of the in-memory shader cache
The in-memory shader cache can get significantly
huge in some rare cases.
Limit its size to 64MB on 32 bits, and 1GB else.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9578>
2021-03-13 21:51:38 +00:00
Dave Airlie
8027a7ba8a shader_info: convert textures_used to a bitset.
For now keep it a bitset of 1 32-bit dword.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9456>
2021-03-10 06:16:09 +10:00
Pierre-Eric Pelloux-Prayer
c276bde34a radeonsi/sqtt: export shader code to RGP
With these changes the shader code is visible in RGP.

Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
0e97d817f5 radeonsi: properly set SPI_SHADER_PGM_HI_ES
When not using S_00B324_MEM_BASE the value isn't properly truncated.

Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Marek Olšák
c97ebe1461 radeonsi: don't index si_context::shaders with enum gl_shader_stage
Fixes: a8373b3d38 "radeonsi: store si_context::xxx_shader members in union"

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9313>
2021-03-02 01:14:44 +00:00
Marek Olšák
8288882965 radeonsi: set MEM_ORDERED optimally
It must be 1 only if both sampler and non-sampler VMEM instructions
that return something are used. BVH counts as a sampler instruction.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
2021-02-17 04:49:24 -05:00
Pierre-Eric Pelloux-Prayer
a8373b3d38 radeonsi: store si_context::xxx_shader members in union
This allows to access them individually (sctx->shader.ps) or
using array indexing (sctx->shaders[PIPE_SHADER_FRAGMENT]).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8869>
2021-02-17 09:11:46 +00:00
Marek Olšák
61fd8fc10b radeonsi: skip s_sendmsg(gs_alloc_req) for NGG passthrough on new chips
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8892>
2021-02-13 04:56:05 +00:00
Marek Olšák
34114e1dcb radeonsi: tune NGG shader culling vertex threshold for each chip
These are based on my testing and estimation.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8434>
2021-02-02 05:42:32 +00:00
Marek Olšák
ffbf3a5f8b radeonsi: simplify the NGG culling condition in si_draw_vbo
Changes:

- disallow NGG culling for GS, fast launch for tess using template args
  (GS can't do NGG culling, tess can't do fast launch)

- skip checking current_rast_prim with tessellation
  (bake the condition into ngg_cull_vert_threshold)

- use only 1 vertex count threshold for enabling NGG shader culling
  to simplify it. I think it doesn't have a big impact. The threshold
  computation depends on more parameters than just fast launch.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8434>
2021-02-02 05:42:32 +00:00
Marek Olšák
7581743510 radeonsi: set current_rast_prim at bind time for tess and GS
It doesn't have to be done in draw_vbo.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8434>
2021-02-02 05:42:32 +00:00
Marek Olšák
11293d71f2 radeonsi: delete si_pm4_delete_state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
2021-01-30 15:41:23 -05:00
Marek Olšák
1a2dde8f86 radeonsi: add internal blitter_running flag
to skip the indirection in si_decompress_textures

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8653>
2021-01-22 16:45:30 +00:00
Marek Olšák
a0978fffb8 radeonsi: add new possibly faster command submission helpers
This decreases the release libgallium_dri.so size without debug symbols
by 16384 bytes. The CPU time spent in si_emit_draw_packets decreased
from 4.5% to 4.1% in viewperf13/catia/plane01.

The previous code did:
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;

The new code does:
    unsigned num = cs->current.cdw;
    uint32_t *buf = cs->current.buf;
    buf[num++] = ...;
    buf[num++] = ...;
    buf[num++] = ...;
    buf[num++] = ...;
    cs->current.cdw = num;

The code is the same (radeon_emit is redefined as a macro) except that
all set and emit functions must be surrounded by radeon_begin(cs) and
radeon_end().

radeon_packets_added() returns whether there has been any new packets added
since radeon_begin.

radeon_end_update_context_roll(sctx) sets sctx->context_roll = true
if there has been any new packets added since radeon_begin.

For now, the "cs" parameter is intentionally unused in radeon_emit and
radeon_emit_array.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8653>
2021-01-22 16:45:29 +00:00
Marek Olšák
76d6351dab radeonsi: don't validate inlinable uniforms at draw time
Let's trust the state tracker that it sets inlinable uniforms only
when shaders can use them.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8600>
2021-01-20 21:53:13 +00:00
Marek Olšák
c5d3341b6e radeonsi: inline the last use of si_get_vs_state
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8600>
2021-01-20 21:53:13 +00:00
Marek Olšák
f1e34f125d radeonsi: don't use si_get_vs_state in most places
It's incorrect because si_get_vs_state returns gs_copy_shader for legacy
GS. It was harmless, but let's use si_get_vs, which is simpler.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
2021-01-18 01:17:19 +00:00
Marek Olšák
f2a5148701 radeonsi: make sctx->vertex_elements always non-NULL
Bind a state with 0 vertex elements there.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
2021-01-18 01:17:19 +00:00
Marek Olšák
62703b79a5 radeonsi: remove si_gs_prolog_bits::gfx9_prev_is_vs
It didn't do anything useful. GS doesn't use the other user SGPRs.
If we decrease the number of user SGPRs we declare for the GS prolog,
we can remove gfx9_prev_is_vs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8344>
2021-01-06 23:28:04 -05:00
Marek Olšák
b6b6d1ff3c radeonsi: fix hang caused by for loop with exec=0 in LS and ES
LLVM expects that exec != 0 when entering loops and generates this code
that becomes an infinite loop if exec == 0:

BB5_1:
    vcc_lo = (inverted terminating condition)
    s_and_b32 vcc_lo, exec_lo, vcc_lo
    s_cbranch_vccnz BB5_3    // jump if vcc != 0 (break statement)
    // ... loop body ...
    s_branch BB5_1
BB5_3:

For non-monolithic VS before TCS, VS before GS, and TES before GS,
we set exec = (thread enabledmask), which sets 0 for HS-only and GS-only
waves, causing the infinite loop condition above.

Fix it as follows:
- set exec = ~0 at the beginning
- wrap the whole shader (LS and ES) in a conditional block, so that HS-only
  and GS-only waves jump over it and never enter such a loop

The TES before GS hang can be reproduced by gfxbench:
    testfw_app --gfx egl -w 1920 -h 1080 --gl_api gles -t gl_tess

Fixes: 68d6d097f1 - radeonsi/gfx9: add GFX9 and VEGA10 enums

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8344>
2021-01-06 23:28:01 -05:00
Yogesh mohan marimuthu
8a22fc9502 radeonsi: enable vrs2x2 coarse shading if flat shading (v9)
Enable vrs2x2 coarse shading if flat shading as per
idea and guidance given by Marek.

is_flat_shading variable in struct si_shader_info is set
based on the data from gather_intrinsic_info() function
and struct si_state_rasterizer. If is_flat_shading_variable
is set, then in function si_emit_db_render_state() vrs2x2
shading is enabled in hardware.

v2: Fix review comments from Pierre-Eric. Code optimizations.
v3: Fix indentation style issue.
v4: Fix review comments from Marek. Fixed logical issue pointed
    by Marek where info->is_flat_shading variable can be corrupted
    and other code cleanup.
v5: Make the code compact as suggested by Pierre-Eric.
v6: Fix new review comments from Marek.
v7: use info->uses_interp_color variable fix from Marek.
v8: Fix coding style comment from Marek.
v9: Add uses_fbfetch_output check as suggested by Marek.

Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8161>
2021-01-06 10:12:10 +05:30
Marek Olšák
76eb3478cf radeonsi: take color interpolation into account for shader variants
Fixes:
- Sample shading now uses per-sample interpolation for colors if colors
  are the only inputs. (this is the only case that was broken)

Optimizations:
- BC_OPTIMIZE (barycentric optimization) is now enabled with MSAA if colors
  are qualified with both center and centroid. (BC_OPTIMIZE means that
  the hardware skips initializing centroid (i,j) if they are equal to
  center (i,j))
- If MSAA is disabled and at least 2 out of (center, centroid, sample) are
  used by all inputs now including colors, center is forced for all inputs.
- If INTERP_MODE_COLOR is not used and the legacy GL shade model is flat,
  the shader variant for flat shading is not generated.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>
2021-01-05 02:43:55 +00:00
Marek Olšák
dffc27e5e1 radeonsi: fix small primitive culling with MSAA force-disabled and smoothing
The problem was that the shader constants were based on the framebuffer
sample count and ignored the multisample enable state and the line/polygon
smoothing state, which uses MSAA rasterization that only sets SampleMaskIn
to get the coverage for alpha-blended smoothing (the PS epilog computes
the alpha channel from SampleMaskIn and blending generates the AA results).

- This is a complete rework that adds a new state for NGG cull constants.
- It fixes the same thing for the prim discard compute shader.
- It documents how VS_STATE.SMALL_PRIM_PRECISION is encoded.

It fixes blue corruption in Unigine Heaven with MSAA and Medium details
or better.

Fixes: 7648060dc0 - radeonsi: enable NGG culling by default on gfx10.3 dGPUs

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8022>
2020-12-16 00:43:45 -05:00
Marek Olšák
2b09bde1f5 radeonsi: use a C++ template to decrease draw_vbo overhead by 13 %
With GALLIUM_THREAD=0 to disable draw merging.

Before:
   1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change,                 8736

After:
   1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change,                10059

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7807>
2020-12-09 16:01:32 -05:00
Marek Olšák
3bd9db5be3 r300,r600,radeonsi: inline struct radeon_cmdbuf to remove dereferences
It's straightforward except that the amdgpu winsys had to be cleaned up
to allow this.

radeon_cmdbuf is inlined and optionally the winsys can save the pointer
to it. radeon_cmdbuf::priv points to the winsys cs structure.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7907>
2020-12-05 10:52:17 -05:00
Marek Olšák
c7470c1760 radeonsi: don't set DrawID and StartInstance if they are unused
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Marek Olšák
69c927debe radeonsi: disable WGP mode on gfx10.3 to prevent hangs
I think that reducing the CU mask to 1 disabled CU per SA broke the WGP mode
on VanGogh, causing a hang. To be sure, disable it on all chips.

Fixes: 9538b9a68e - radeonsi: add support for Sienna Cichlid

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Marek Olšák
bbad432e96 radeonsi: eliminate shader code for disabled or masked color outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Marek Olšák
509142876b radeonsi: add AMD_DEBUG=nofastlaunch for debugging
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Marek Olšák
de799b2270 radeonsi: enable NGG and NGG culling on gfx10.3 APUs by default
VanGogh benefits.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Pierre-Eric Pelloux-Prayer
0b3bd7c516 radeonsi/gfx10: flush gfx cs on ngg -> legacy transition
with a sequence like this:

  glClear(STENCIL)
  glBeginTransformFeedback()
  ...
  glEndTransformFeedback()
  glClear(STENCIL)

The second clear sometimes may produce an unexpected result.

Calling si_flush_gfx_cs() when doing ngg -> legacy transition seems to be a
valid workaround (both for the synthetic reproducer and the real Blender bug).

Using flush flags or events (BOTTOM_OF_PIPE_TS, RESET_TO_LOWEST_VGT) didn't help.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2941
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7750>
2020-11-26 10:19:26 +01:00
Marek Olšák
61fe66a2e4 radeonsi: pass VS->TCS IO via VGPRs if VS and TCS have the same thread count
It can only be done if a TCS input is accessed without indirect indexing and
with gl_InvocationID as the vertex index, and the number of VS and TCS threads
is the same.

This eliminates LDS stores and loads for VS->TCS IO, reducing shader lifetime
and LDS traffic.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
2020-11-23 02:22:21 +00:00
Marek Olšák
1190808eca radeonsi: if VS and TCS have the same number of threads, merge the conditonals
Instead of:
    if (VS) {
	VS;
    }
    if (TCS) {
	TCS;
    }

Do this if the number of threads is the same in VS and TCS:
    exec = enabled_threads;
    VS;
    TCS;

Skipping declare_vb_descriptor_input_sgprs is needed to match the VS return
values.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
2020-11-23 02:22:21 +00:00
Marek Olšák
be905b74f7 radeonsi: don't add num_vbos_in_user_sgprs to the shader cache key for non-VS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7542>
2020-11-18 06:19:59 +00:00
Marek Olšák
025bc9e50e radeonsi: add options.inline_uniforms to the shader cache key
It affects how shaders are finalized before caching.

Fixes: b7501184b9 ("radeonsi: implement inlinable uniforms")

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7542>
2020-11-18 06:19:59 +00:00
Marek Olšák
c4ebdf9ee7 radeonsi: do VGT_FLUSH when switching NGG -> legacy on Sienna Cichlid
Other chips don't need this.

Fixes: 9538b9a68e - radeonsi: add support for Sienna Cichlid

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7542>
2020-11-18 06:19:58 +00:00
Marek Olšák
c3432ad852 radeonsi: add an option to enable 2x2 coarse shading for non-GUI elements
This is for experiments with VRS.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7646>
2020-11-17 22:16:19 +00:00
Marek Olšák
baa5807e36 nir: rename needs_helper_invocations to needs_quad_helper_invocations
This indicates that only quad operations use helper invocations.
Also handle quad_swizzle_amd.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7586>
2020-11-12 21:02:05 +00:00
Marek Olšák
b7501184b9 radeonsi: implement inlinable uniforms
This improves performance for uber shaders.

It must be enabled using the new driconf option.

The driver compiles the specialized shaders in another thread without stalls,
same as all other optimizations.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7057>
2020-10-30 11:07:22 +00:00