The primitive indices have to be swapped to follow the drawing
order.
This fixes corruption with Overwatch when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Do not need to declare it for other stages because this is for
streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This exposes what's required for DX and this is what we already
configure. The driver flushes denorms for FP32 and preserves them
for FP16/FP64. Note that we can't allow both preserving and
flushing denorms because this won't work for merged shaders. This
will require LLVM to update the float mode register to make it work.
Only enabled on GFX8+ with the LLVM path because it's untested on
previous chips and ACO doesn't support it.
This extension is required for SPIRV 1.4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.
This fixes a bunch of CTS failures when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to store/load more components that needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.
This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Seems to fix a hang with excessive vertex emissions when NGG is used for
GS.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We have to load 2 32-bit integer and to cast correctly.
This fixes crashes with gs-double-interpolator.vk_shader_test.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111734
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f8d0337299 ('radv: add multiple streams support for the GS copy shader')
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's still disabled by default because transform feedback randomly
hangs and it seems like it's related to GDS (cf. RadeonSI).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Streamout outputs are stored in the ESGS ring.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It needs more space for multiple streams.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS as RadeonSI.
That option will be turned on once all issues are resolved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
streamout_buffers is assigned after that function, so the previous
fix was completely wrong. This probably fix something when streamout
buffers and push constants are used/inlined in the same shader.
Fixes: 378e2d2414 ("radv: fix computing number of user SGPRs for streamout buffers")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes some interactions when NGG GS is enabled. It fixes:
- dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.*geom*
- dEQP-VK.tessellation.geometry_interaction.passthrough.*
For some reasons, using the computed ESGS ring size randomly hangs
with CTS. For now, just use the maximum LDS size for ESGS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This shouldn't be in NIR->LLVM because ACO also needs the shader
info. This will also help for computing some NGG values that are
necessary for declaring LDS symbols.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only the pipeline layout and the shader keys are needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This doesn't fix anything known but it's correct now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It shouldn't matter much because output varyings should have been
compacted during NIR shader linking but it mirrors what the driver
does when emitting NGG GS vertex parameters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the fragment shader needs the layer index, we have to allocate
one more dword in the NGG GS storage. Found by inspection. This
doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Having two different structs is useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The codegen handles it and it adds the correct casts. This fixes
a bunch of LLVM validation errors when enabling Wave32 for compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This avoids multiple copies for nothing and it's more elegant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The workaround was entirely in common code, and it's needed in radeonsi
too so just always do it when necessary. Fixes
KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9
with LLVM 8.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Now that LLVM 9 will be released soon, we will only support
LLVM 8, 9 and master (10).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of having the three values everywhere. This is also more
future proof if we want the driver to make those decisions eventually.
Reviewed-by: Dave Airlie <airlied@redhat.com>
When the application does not ask for robust buffer access.
Only implemented the check in radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It can be enabled with RADV_PERFTEST=gewave32.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>