That shouldn't change anything for VS as LS (or as ES) and for
TES as ES because radv_vs_output_info is only used by the last
vertex stage. So, if we have TES+GS, radv_vs_output_info for TES
will be overwritten by GS. This allows to decouple the shader info
pass from other stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18210>
Task shaders always use a ring, so this field was useless somehow.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18210>
This is probably a leftover when task shader has been reworked, but it
has no effect.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18210>
radv_nir_shader_info_pass() should run on individual shaders only, and
"linked" shader info should be done separately for better design.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18210>
This structure isn't really useful and it contains only one field.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18210>
Size is in bytes, not bits.
Fixes plenty of crashes in CI, like
dEQP-VK.synchronization.op.single_queue.event.write_image_fragment_read_image_tess_eval.image_128_r32_uint.
Fixes: 46f6e2ddbb ("aco: Implement storage image A16.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18266>
The following sequence would be broken if we don't re-emit viewports.
vkCmdSetViewport()
VkCmdBindPipeline(negative_one_to_one = false)
vkCmdDraw()
VkCmdBindPipeline(negative_one_to_one = true)
vkCmdDraw()
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18245>
A better explanation for SP_HS_WAVE_INPUT_SIZE is that it is the size
of local memory to allocate per wave (which can be more than one
patch), in 256B units.
Then the maximum of 64 makes sense because only 16KB of local memory
is reserved for VS<->HS linkage.
The resulting formula matches the blob behaviour, even when
patch_control_points and tcs_vertices_out have different values,
while the past formula gave wrong answers on gen3+.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Suggested-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17957>
Mirrors 31835ac3b8 change in freedreno.
Together with "tu: Fix HS input size formula for gen3+" fixes following
tests from GL CTS running via Zink:
dEQP-GLES31.functional.tessellation.invariance.inner_triangle_set.quads_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.invariance.inner_triangle_set.triangles_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.invariance.primitive_set.triangles_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.invariance.primitive_set.triangles_fractional_odd_spacing_cw
dEQP-GLES31.functional.tessellation.invariance.triangle_set.triangles_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.primitive_discard.quads_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.primitive_discard.quads_fractional_odd_spacing_cw
dEQP-GLES31.functional.tessellation.primitive_discard.triangles_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.primitive_discard.triangles_fractional_odd_spacing_cw
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17957>
It can be useful not just to create functions, but also being able to
call them. This adds the spirv_builder-helper for this.
Cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18244>
This type will be reused later on, so let's have the name describe what
is *is*, not what it's *used for*.
Cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18244>
When SSBO instructions use constant address values the address loading
is immediately ready, scheduling the address loads early increases
the register pressure, so force a new instruction block to work around
this problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6975
Fixes: 79ca456b48
r600/sfn: rewrite NIR backend
v2: do handling in shader block to be thread save (hinted to by Filip)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18212>
Limit the number of tested instructions and the number of
ready instructions that might be taken into account.
This reduces the time needed to run the scheduler significantly.
Fixes: 79ca456b48
r600/sfn: rewrite NIR backend
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18212>
Export instructions allow burst writes, so it makes send to try
to allocate consecutive registers, but for ring writes we don't
schedule the outputs correctly to exploit this, so for now
don't mark these instructions as export to let the RA restart
picking colors.
When the scheduler starts to emit the ring writes in the right order
to allow for bust writes we might revisit this.
This fixes
spec@glsl-1.50@execution@variable-indexing@gs-output-array-vec4-index-wr
Fixes: 79ca456b48
r600/sfn: rewrite NIR backend
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6975
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18212>
Linear PE has already shown to have some rough corner cases in the hardware
and also has performance implications. Add a debug option to allow to disable
the feature, so users can more easily check if some issue is caused by this
feature.
CC: mesa-stable #22.2
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18232>
When linear rendering is used together with TS, the color tiles must be fully
contained in a single row of pixels. When wrapping around to the next row
TS gets confused and records wrong tile status information, leading to visual
corruption when the surface is resolved/decompressed.
The corruption can be fixed by increasing the stride alignment for linear
render targets, but that would break some existing use-cases, as some display
engines used together with Vivante GPUs currently don't support strides that
don't match the horizontal display resolution.
For now only enable linear PE rendering when the surface is properly aligned
already. This allows to use the optimization in a lot of common use-cases, but
falls back to the proven tiled rendering with subsequent resolve into linear
for the problematic cases.
CC: mesa-stable #22.2
Fixes: 53445284a4 ("etnaviv: add linear PE support")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18232>
The decision whether to use fast clear aka TS currently checks for two
feature bits: FAST_CEAR and MC20. We check for MC20, as TS on MC1.0 bypasses
the memory offset and we don't have any way to fixup the GPU address to
account for that. It could be done with some support of the kernel driver,
but then GPUs with MC1.0 are very rare to find these days, so not sure if we
are ever going to bother with that.
Instead of checking two separate feature bits to determine if TS can be used,
mask out the FAST_CLEAR bit from the features when MC20 isn't present. This
way we only have to check for a single feature bit.
CC: mesa-stable #22.2
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18232>
RMW context registers have been removed in RadeonSI a while ago
because they don't seem good for performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18234>
Some dynamic states always need to be emitted when the first pipeline
is emitted, some others depend on pipeline state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18234>
These fields aren't set at pipeline creation, so clearing them is
just useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18234>
This is useless because logic op is a dynamic state and it's already
emitted from the cmdbuf.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18234>
Port from radeonsi.
Besides vertex position based primitive culling, clipdist
attribute can also be used to cull a primitive. Normally
it's used by fixed-pipeline, but when NGG we can treate it
as a culling condition to filter out invisible primitive
before fixed-pipeline.
There are two kinds of clipdist:
1. user define a clip plane explicitly by glClipPlane(),
fixed-pipeline calculate with vertex position to get
clipdist, then cull. This is the legacy way.
2. Now GLSL define gl_ClipDistance/gl_CullDiatance so that
user can calculate clipdist in any way he like.
This implementation support both way.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Port from radeonsi.
Cull primitive after GS thread and before final vertex/primitive
export. GS culling is like VS/TES culling which read out saved
vertex positions of a primitive from LDS then call the primitive
culling algorithm to check whether it's visiable or not, only
passed primitives will be exported.
Unlike the VS/TES culling that read vertex index of a primitive
from VGPRs as shader args, GS will set a primitive complete flag
for each last vertex of a primitive in LDS, so that vertex thread
know the previous 1/2/3 vertex can form a primitive and do primitive
culling.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
radeonsi has different driver_location and io location.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
radeonsi does not have io nir variables, so need to save output
bit size when lower store_output intrinsic.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
driver_location and io location are different for radeonsi,
and radeonsi llvm rely on the correct driver_location to
index output variables.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Simplify: 64bit IO has been lowered by nir_lower_io with
nir_lower_io_lower_64bit_to_32, so no need to handle in the
ngg lower.
Fix: we need to increase io_sem.location by base_offset for
correct gs_output_info.
radeonsi has different driver_location and io location, so
also change the output variable index to io location.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Make accept_func optional, and return accpect result for caller
react when primitive is rejected.
This is for GS culling.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>