The view index user sgpr wasn't being accounted for properly,
this refactors out the code to decide if it's required and then
uses that info to account for it.
Fixes: 180c1b924e (ac/nir: Add shader support for multiviews.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This shares more code and calls the new shared load_tess_varyings()
abi so that the radeonsi nir path now supports tcs output loads.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The code to load outputs is essentially the same as load inputs
so we make the interface more generic to maximise code sharing.
We will make use of the new support in the following patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is an optimisation that is recommended by Matt Arsenault,
and used by RadeonSI, but it's not compatible with Vulkan.
Note that AC_FLOAT_MODE_UNSAFE_FP_MATH includes the no signed
zeros flag in LLVM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When that debug option is not used, we use the default float mode
because the no signed zeros optimisation is not Vulkan compatible.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This also replaces llvm.AMDGPU.kilp by llvm.AMDGPU.kill with
LLVM < 6. Similar to RadeonSI codepath.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can't work for two reasons:
- TESSINNER/TESSOUTER are shader input values, so never translated
to the intrinsic ops
- the shader info pass scans the current stage but we want to know
in TCS, if TES reads the tess factors.
This fixes 6 regressions related to
deqp-vk/tessellation/shader_input_output/tess_level_{inner,outer}_XXX_tes
This reverts commit 5ba1a61648.
InstanceID is in VGPR2, not 1.
One more failure that CTS didn't catch up...
Reported-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This shouldn't be scanned in the pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes a number of int64 piglit tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/fs-sign-i64vec2.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
V2: just zero-extend the 32-bit value.
Fixes a number of int64 piglet tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/conversion/frag-conversion-explicit-bool-int64_t.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Running dota2 since the below commit crashes with an llvm assert.
Trim the vector like the other user. This possible could also be
avoided by not padding inside the load vec3->vec4.
Fixes: 41c36c4549 (amd/common: use ac_build_buffer_load() for emitting UBO loads)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes the follow test for radeonsi nir:
tests/spec/arb_tessellation_shader/execution/quads.shader_test
Also stops 8 other tests from crashing, they now just fail e.g.
tcs-output-array-float-index-rd-after-barrier.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If they were promoted from inputs/outputs, they could have a
non-zero value left over, which messed with our store handling.
Fixes: 06f05040eb "radv: Link shaders."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It makes more sense to rely on nir_intrinsic_load_push_constant
instead of the pipeline layout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
nir_to_llvm_context will always be NULL for radeonsi so we need
work around this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes the following piglit tests in radeonsi:
vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test
vs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tes-tessinner-tessouter-inputs-tris.shader_test
v2: make use of si_shader_io_get_unique_index_patch()
via the helper in the previous patch rather than
shader_io_get_unique_index()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Similar to RadeonSI.
This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.
Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It makes more sense to move all scan stuff in the same place.
Also, we don't really need to duplicate the uses_primid field
for each stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... to set_vs_specific_input_locs().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The idea is to clean up the add arguments logic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>