Changes:
- disallow NGG culling for GS and fast launch for tess using template args
(GS can't do NGG culling, tess can't do fast launch)
- skip checking current_rast_prim with tessellation
(bake the condition into ngg_cull_vert_threshold)
- use only 1 vertex count threshold for enabling NGG shader culling
to simplify it. I think it doesn't have a big impact. The threshold
computation depends on more parameters than just fast launch.
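The following is a minimal C++17 sketch of the idea. It assumes a hypothetical draw_state_sketch struct and hypothetical helpers (want_ngg_culling, want_fast_launch), not radeonsi's real code. Because HAS_GS, NGG and HAS_TESS are template arguments, the GS and tess restrictions are resolved at compile time. The only runtime check left is one comparison against ngg_cull_vert_threshold, into which other disqualifying conditions are assumed to be baked ahead of time.

```cpp
#include <cassert>
#include <climits>

// Hypothetical per-context state. The field name mirrors the commit message;
// the struct itself is only an illustration.
struct draw_state_sketch {
   // Single vertex-count threshold for NGG culling. Conditions that rule
   // culling out (e.g. the rasterized primitive type) are assumed to be baked
   // in ahead of time by setting this to UINT_MAX, so the draw path doesn't
   // check them again.
   unsigned ngg_cull_vert_threshold;
};

// For GS variants, the instantiated body is just "return false".
template <bool HAS_GS, bool NGG>
bool want_ngg_culling([[maybe_unused]] const draw_state_sketch &st,
                      [[maybe_unused]] unsigned vertex_count) {
   if constexpr (!NGG || HAS_GS)
      return false; // GS can't do NGG culling
   else
      return vertex_count >= st.ngg_cull_vert_threshold;
}

// For tess variants, the instantiated body is just "return false".
template <bool HAS_TESS>
bool want_fast_launch([[maybe_unused]] bool requested) {
   if constexpr (HAS_TESS)
      return false; // tess can't do fast launch
   else
      return requested;
}
```

Using UINT_MAX as the "never cull" threshold keeps the per-draw test to a single unsigned comparison.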
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8434>
so that we don't have to enter the state emit loop and invoke the more
complicated function si_emit_graphics_shader_pointers.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
Move statements that use the least number of local variables as close
to the beginning as possible. Also move local variables closer to their use.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
When the prefetch was done with VS_ONLY=true followed by VS_ONLY=false,
the VS_ONLY=false call tested the VS_ONLY bits in the mask even though
they were always 0 by then. It's also useless to clear the prefetch mask
when VS_ONLY=true.
This commit skips those tests by splitting the function properly using
BEFORE_DRAW and AFTER_DRAW template parameters.
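The following is a rough C++17 sketch of such a split. It assumes hypothetical mask bits and a simplified context; the real radeonsi prefetch code covers more stages and issues CP DMA packets. In this sketch, the BEFORE_DRAW instance looks only at the VS bits. The AFTER_DRAW instance looks only at the remaining bits and is the only one that clears the mask. An instance with both parameters set does both.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical prefetch bits; the real names and stage list differ.
enum : uint32_t {
   PREFETCH_VS = 1u << 0, // needed before the draw (the "VS_ONLY" part)
   PREFETCH_PS = 1u << 1, // can be issued after the draw
};

// Hypothetical, simplified context.
struct prefetch_ctx_sketch {
   uint32_t prefetch_mask; // stages still waiting to be prefetched
   uint32_t issued;        // stands in for the emitted prefetch packets
};

// Callers are assumed to run the BEFORE_DRAW phase before the AFTER_DRAW one.
template <bool BEFORE_DRAW, bool AFTER_DRAW>
void prefetch_shaders(prefetch_ctx_sketch &ctx) {
   static_assert(BEFORE_DRAW || AFTER_DRAW, "instantiate at least one phase");

   if constexpr (BEFORE_DRAW) {
      if (ctx.prefetch_mask & PREFETCH_VS)
         ctx.issued |= PREFETCH_VS;
      // No mask clearing here; the AFTER_DRAW phase clears it once.
   }

   if constexpr (AFTER_DRAW) {
      // The VS bits are not tested here; the BEFORE_DRAW phase owns them.
      if (ctx.prefetch_mask & PREFETCH_PS)
         ctx.issued |= PREFETCH_PS;
      ctx.prefetch_mask = 0;
   }
}
```

Because the phase is a template argument, each instantiation contains only the tests it needs, and no runtime VS_ONLY flag is checked.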
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
Using event identifiers adds a bit more context to the RGP trace.
Without them, all draw calls are identified as vkCmdDraw.
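The following is a minimal sketch of the mechanism, and every name in it is hypothetical (event_type, cmd_buffer_sketch, describe_draw): each draw entry point records its own identifier before calling the shared draw path. The shared path then tags the trace marker with that identifier instead of always reporting vkCmdDraw. A vector stands in for the RGP trace.

```cpp
#include <cassert>
#include <vector>

// Hypothetical identifiers modeled on the Vulkan draw entry points.
enum class event_type { CmdDraw, CmdDrawIndexed, CmdDrawIndirect, CmdDrawIndexedIndirect };

// Hypothetical command buffer; "markers" stands in for the RGP trace.
struct cmd_buffer_sketch {
   event_type current_event_type = event_type::CmdDraw;
   std::vector<event_type> markers;
};

// Shared draw path: tags the marker with whatever the entry point set.
static void describe_draw(cmd_buffer_sketch &cmd) {
   cmd.markers.push_back(cmd.current_event_type);
}

static void cmd_draw(cmd_buffer_sketch &cmd) {
   cmd.current_event_type = event_type::CmdDraw;
   describe_draw(cmd);
}

static void cmd_draw_indexed(cmd_buffer_sketch &cmd) {
   cmd.current_event_type = event_type::CmdDrawIndexed;
   describe_draw(cmd);
}
```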
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8746>
This looks unnecessary, but the next commit will build upon it and add
more stuff into the function.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8653>
This decreases the release libgallium_dri.so size without debug symbols
by 16384 bytes. The CPU time spent in si_emit_draw_packets decreased
from 4.5% to 4.1% in viewperf13/catia/plane01.
The previous code did:
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;
    cs->current.buf[cs->current.cdw++] = ...;
The new code does:
    unsigned num = cs->current.cdw;
    uint32_t *buf = cs->current.buf;
    buf[num++] = ...;
    buf[num++] = ...;
    buf[num++] = ...;
    buf[num++] = ...;
    cs->current.cdw = num;
The code is the same (radeon_emit is redefined as a macro) except that
all set and emit functions must be surrounded by radeon_begin(cs) and
radeon_end().
radeon_packets_added() returns whether any new packets have been added
since radeon_begin.
radeon_end_update_context_roll(sctx) sets sctx->context_roll = true
if any new packets have been added since radeon_begin.
For now, the "cs" parameter is intentionally unused in radeon_emit and
radeon_emit_array.
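The following is a simplified C++ sketch of how macros like these can work. It is not the exact Mesa definition: it assumes a stripped-down command-buffer struct (radeon_cmdbuf_sketch), a hypothetical ctx_sketch, and illustrative local-variable names (rb_cs, rb_num, rb_buf). radeon_begin caches cdw and buf in locals, radeon_emit writes through them, and radeon_end stores the count back once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified command buffer (not the real radeon_cmdbuf).
struct radeon_cmdbuf_sketch {
   struct {
      uint32_t *buf;
      unsigned cdw;    // dwords written so far
      unsigned max_dw; // capacity in dwords
   } current;
};

struct ctx_sketch { bool context_roll; };

// Declares locals, so it can be used at most once per scope.
#define radeon_begin(cs)                                \
   radeon_cmdbuf_sketch *rb_cs = (cs);                  \
   unsigned rb_num = rb_cs->current.cdw;                \
   [[maybe_unused]] const unsigned rb_num_initial = rb_num; \
   uint32_t *rb_buf = rb_cs->current.buf

// "cs" is intentionally unused, matching the commit message.
#define radeon_emit(cs, value) rb_buf[rb_num++] = (value)

#define radeon_packets_added() (rb_num != rb_num_initial)

// Write the cached dword count back once.
#define radeon_end()                                    \
   do {                                                 \
      assert(rb_num <= rb_cs->current.max_dw);          \
      rb_cs->current.cdw = rb_num;                      \
   } while (0)

#define radeon_end_update_context_roll(sctx)            \
   do {                                                 \
      radeon_end();                                     \
      if (radeon_packets_added())                       \
         (sctx)->context_roll = true;                   \
   } while (0)

// Example user: emits three illustrative dwords only when asked to.
static void emit_three(radeon_cmdbuf_sketch *cs, ctx_sketch *sctx, bool do_emit) {
   radeon_begin(cs);
   if (do_emit) {
      radeon_emit(cs, 0x1u);
      radeon_emit(cs, 0x2u);
      radeon_emit(cs, 0x3u);
   }
   radeon_end_update_context_roll(sctx);
}
```

Keeping the count and pointer in locals lets the compiler hold them in registers between emits instead of reloading cs->current on every write.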
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8653>
for lower register pressure, though I haven't measured this.
si_draw_vbo will be handled in a future commit.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8600>
It's incorrect because si_get_vs_state returns gs_copy_shader for legacy
GS. It was harmless, but let's use si_get_vs, which is simpler.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
This radically simplifies the code to decrease CPU overhead in si_draw_vbo.
The generic CP DMA copy function is too complicated.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
This is a great candidate for a template. There are a lot of conditions
that are already templated in si_draw_vbo.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
It's probably not needed and we also have draw merging on gfx10,
so we should be able to use total_driver_count in theory.
(I may be wrong, but I don't know if having avg_direct_count really
improves anything)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>