This will be used by SQTT to implement a workaround on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20529>
This allows ACO to combine the addition into the store without checking
for wraparound.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20296>
Instead let each driver call it.
radeonsi ignores the error because it doesn't require correct
pci-bus info to work properly.
radv keeps the existing behavior and fails if the pci-bus infos
is missing.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20645>
And add a validity flag because there's no way to
tell if they're valid, unless for the caller of
drmGetDevice2.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20645>
Gather outputs in advance to save both output data and type. Output data
is used for streamout and gfx11 param export. Output type is used for
streamout latter.
The output info will also be used for nir vertex export in the future.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
When slot has no param offset, we should not emit store output for
them on gfx11.
Fixes: abe2e99e9e ("ac/nir/ngg: gs support 16bit outputs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Follow the spec, all outputs even not this stream need to be
reset after emit vertex.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Because we called nir_lower_io_to_temporaries which ensure the
store output and emit vertex in the same block.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
This is for VARYING_SLOT_VARx_16BIT slots varying streamout.
OpenGL ES will store 16bit medium precision varying to these slots.
Vulkan is not allowed to streamout varying less than 32bit.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20157>
GFX11 can have more than 4 SEs. I think it would be better to allocate
an array but that's for later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20337>
A bit count of es_accepted works for both when ngg is and isn't
dynamically enabled. Unlike the other sequence, this should only be a
single SALU instruction.
fossil-db (gfx1100, nggc):
Totals from 41388 (30.75% of 134574) affected shaders:
Instrs: 25783544 -> 25432959 (-1.36%); split: -1.36%, +0.00%
CodeSize: 127281160 -> 125878820 (-1.10%); split: -1.10%, +0.00%
Latency: 92849566 -> 92723047 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 9542194 -> 9485012 (-0.60%); split: -0.60%, +0.00%
Copies: 2031074 -> 1928796 (-5.04%); split: -5.04%, +0.00%
Branches: 642407 -> 642409 (+0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20321>
No shader-db or fossil-db changes on any Intel platform.
v2: Add missed i2b1 in ir3_nir_opt_preamble.c.
v3: Add missed i2b1 in ac_nir_lower_ngg.c.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
For legacy (non-NGG) GS to lower outputs to memory stores and add
shader query when required.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20158>
To be shared by NGG GS and legacy GS. Legacy GS need this when
GFX10 which mix use NGG and legacy GS. For example when streamout
is enabled, it uses legacy GS, otherwise uses NGG GS. So legacy
GS also need to update query emulation which is a sum of NGG and
legacy GS results.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20158>
Before this commit each stream will emit a query block, now
we merge them to a single block.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20074>
More precise type info, can be used for 16bit output streamout
to convert 16bit int/uint/float to 32bit one later.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19697>
Max slot number for 16bit output is 16, so no need to use
64 array size for them.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19697>
pipe_stream_output_info is built from nir_xfb_info, why not just use
nir_xfb_info directly.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20015>
Device memory scope is necessary because we need to ensure there is
always a waitcnt_vscnt instruction in order to avoid a race condition
between payload stores and their loads after mesh shaders launch.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19967>
radeonsi use packed location base while radv use un-packed location.
So we adjust instance_rate_inputs in each driver to hide the difference.
Note the attribute slot number is less than 16, so we can shift
instance_rate_inputs in radv by VERT_ATTRIB_GENERIC0 which is 16.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19868>
Add negative value is possible to wrap around. I haven't seen this
"nuw" causes any problem yet, but let's remove it for safe.
Fixes: 60ac5dda82 ("ac: Add NIR lowering for NGG GS.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19718>
We should not use "nuw" here as negative add positive may wrap
around (negative is 0xffffff??).
This problem can be observed with LLVM15 (I can't see when LLVM14):
%.neg = mul nsw i32 %31, -4
%163 = add nuw nsw i32 %.neg, 16
%164 = lshr i32 257, %.neg
%165 = lshr i32 %164, %163
LLVM just assume %.neg is possitive, so pre-shift 0x01010101 by 16.
This get wrong value because we can't get back the shifted bits with
a negative shift right.
Fixes: 75dbb40439 ("ac/nir: Remove byte permute from prefix sum of the repack sequence.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19718>
Outputs are always considered 32-bits.
Found by inspection.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19612>
XFB queries need to be enabled with NGG streamout and VS/TES.
Previously, the NGG lowering code relied on has_prim_query for XFB.
This fixes failures with RADV_PERFTEST=ngg_streamout on GFX10.3 with
the vkd3d-proton testsuite. Vulkan CTS is missing TES tests with XFB
queries apparently.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19493>