This makes u_vbuf_get_minmax_index_mapped return min = 0 / max = 0
when info->count == 0.
That should never happen anyway, but this commit makes it at least
return a sane value that callers expect, and also allows us - and
GCC - to assume count != 0 for optimization purposes.
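A minimal sketch of the guard described above, assuming 16-bit indices. The
helper name minmax_u16 and its signature are hypothetical and are not the
actual u_vbuf code; the sketch only shows the early return for an empty draw
and why the rest of the function can then rely on count != 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the real Mesa function: scan 16-bit indices. */
static void
minmax_u16(const uint16_t *indices, unsigned count,
           unsigned *out_min, unsigned *out_max)
{
   /* Empty draw: report a well-defined range instead of garbage. */
   if (count == 0) {
      *out_min = 0;
      *out_max = 0;
      return;
   }

   assert(indices != NULL);

   /* From here on count != 0, so seeding from element 0 is safe. */
   unsigned min = indices[0];
   unsigned max = indices[0];

   for (unsigned i = 1; i < count; i++) {
      if (indices[i] < min)
         min = indices[i];
      if (indices[i] > max)
         max = indices[i];
   }

   *out_min = min;
   *out_max = max;
}
```

Because the empty case returns before the loop, the compiler is also free to
treat the loop as running on a non-empty buffer.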
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
With this patch, GCC generates vectorized code that does the comparisons
without converting the indices to 32-bit first.
This optimization makes the function almost twice as fast on ARM NEON,
and should speed up vectorized code on other platforms as well.
Without vectorization, the function is still a percent or two faster,
but slightly larger.
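A rough illustration of the idea, assuming 16-bit indices. The function name
minmax_u16_narrow is hypothetical and this is not the code from the patch.
The running min/max are kept in the index's own type, so the comparisons in
the loop do not require widening each element to 32 bits. Whether a given
compiler actually vectorizes this loop depends on the target and flags.

```c
#include <stdint.h>

/* Hypothetical sketch: accumulate min/max in uint16_t, widen only once. */
static void
minmax_u16_narrow(const uint16_t *indices, unsigned count,
                  unsigned *out_min, unsigned *out_max)
{
   if (count == 0) {
      *out_min = 0;
      *out_max = 0;
      return;
   }

   /* Accumulators share the element type of the index buffer. */
   uint16_t min = indices[0];
   uint16_t max = indices[0];

   for (unsigned i = 1; i < count; i++) {
      uint16_t v = indices[i];
      if (v < min)
         min = v;
      if (v > max)
         max = v;
   }

   /* Single widening conversion at the end, outside the hot loop. */
   *out_min = min;
   *out_max = max;
}
```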
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
Based on the GL driver:
- Compute needs a different opcode (this fixes a GPU hang problem)
- REG_A6XX_SP_IBO_LO/REG_A6XX_SP_CS_IBO_LO were swapped
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>
The current tu6_emit_border_color doesn't work for compute and there's no
example from the GL driver to base it on, so replace it with a finishme.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>
We can execute it unconditionally and the values computed for disabled
threads won't be used anyway.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
This fixes a bug with NGG that is probably harmless.
Basically, !is_monolithic makes the VS prolog emit
llvm.amdgcn.init.exec.from.input, which sets the EXEC mask to only enable
ES threads. In the NGG non-GS case, the number of GS threads is <= the
number of ES threads, so it was never an issue.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
GCC generates exceptionally bad code for panfrost_pack_work_groups_fused
otherwise, although that routine is somehow still hot.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3067>
lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.
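To illustrate why the ordering matters, here is a sketch in plain C (not NIR)
of a 32x32->64 multiply split into a low 32-bit multiply plus a "mul_high"
step, with the mul_high step itself expanded into 16-bit partial products.
The function names are hypothetical and the expansion is only one possible
lowering; it is not taken from nir_lower_alu.

```c
#include <stdint.h>

/* High 32 bits of a 32x32 unsigned product, using only 32-bit ops. */
static uint32_t
umul_high32(uint32_t a, uint32_t b)
{
   uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
   uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

   uint32_t lo_lo = a_lo * b_lo;
   uint32_t hi_lo = a_hi * b_lo;
   uint32_t lo_hi = a_lo * b_hi;
   uint32_t hi_hi = a_hi * b_hi;

   /* Sum the middle terms; this cannot overflow 32 bits. */
   uint32_t cross = (lo_lo >> 16) + (hi_lo & 0xffff) + lo_hi;

   return hi_hi + (hi_lo >> 16) + (cross >> 16);
}

/* 64-bit product built from a low multiply and a mul_high. */
static uint64_t
umul_2x32_64(uint32_t a, uint32_t b)
{
   uint32_t lo = a * b;              /* wraps to the low 32 bits */
   uint32_t hi = umul_high32(a, b);  /* the part that needs lowering */

   return ((uint64_t)hi << 32) | lo;
}
```

If the pass that expands the high multiply ran before the pass that
introduces it, the high multiply would be left unlowered, which is the
situation the reordering avoids.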
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
If for some reason the fence associated with an image doesn't signal,
we're likely in a device lost scenario and should report that error.
We can't really wait for a given amount of time because we could get a
timeout and that is not a valid error to report for vkQueuePresentKHR,
so just wait forever.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/830
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I was hoping this would uncover potential issues, but it found nothing.
It is still probably a good idea.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turn fixes
RA issues for many shaders using MRT.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is a branch, like discard, so we need a barrier to make it safe.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We have to key the blend shader for the render target number due to
writeout silliness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>