For instruction sequences that change the address register with every load
the current limit to bail out of the scheduler and reject the optimisation
was too tight, i.e. it was expected that at least one pending instruction
would be scheduled each time.
Give the scheduler more margin to sort out these load sequences by allowing
a number of rounds where no instruction is scheduled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106163
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This patch is based on
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185805.html
Dave Airlie:
"A bunch of CTS tests led me to write
tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
which r600/sb always fell over on.
GCM seems to move some of the copies into other basic blocks,
if we don't allow this to happen then it doesn't seem to schedule
them badly.
Everything I've read on SSA/phi copies say they have to happen
in parallel, so keeping them in the same basic block seems like
a good way to keep some of that property."
This patch differs from the one proposed by Dave in that it only adds
the NF_DONT_MOVE flag to copy_move instructions that are created by split_phi*
and that are located in loops.
Fixes piglit: tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
(no regressions in the shader set). It also fixes all failing tests from
dEQP-GLES3.functional.shaders.loops.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This was mistakenly exposed, even though we want atomic counters to be
lowered to atomic ops on an SSBO like nearly every other GPU. Which
somehow recently started getting segfaults due to calling a null
pipe->set_hw_atomic_buffers().
Fixes a crash in stk, and probably other things.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This completely reworks the pass to support deref instructions and
delete support for old deref chains
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that it's rewritten for deref instructions, we can turn it back on.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only used by the ir3 stand-alone compiler and Rob said we could
delete it.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vc4+vc5 is not really effected by the deref chain to deref instr
conversion, so it no longer needs this pass. For others, now that
all the passes mesa/st uses are using deref instructions, push the
lowering to deref chains back into driver.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To simplify the transition, and make things bisectable, split out a
legacy copy or lower_samplers. This way the i965 and gallium drivers
can independently switch over to deref instructions.
Since the lower_samplers_as_deref pass is only used by gallium drivers,
it can be converted in lock-step with moving the lower_deref_instrs
pass, and so does not need a corresponding _legacy clone.
This legacy pass will be removed in a future commit.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass doesn't handle deref instructions yet. Making it handle both
legacy derefs and deref instructions would be painful. Since it's not
important for correctness, just disable it for now.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This inserts a call to nir_lower_deref_instrs at every call site of
glsl_to_nir, spirv_to_nir, and prog_to_nir.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fix spilling regression introduced by 5428066f5e
this is just a minor mistake done while moving the code out into a new
function. The function contained a loop which might have been terminated
earlier and skipped setting noSpill to 1. After the refactoring it was always
set.
Fixes: 5428066f5e
("nv50/ir: make a copy of tex src if it's referenced multiple times")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
the format of the CLEAR_COLOR register doesn't depend on the target format
this fixes clear color when rendering to 32-bit RGBA and 16-bit targets
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
this patch adds support for a20x, which has some differences with a220:
-no VGT_MAX_VTX_INDX register
-no CLEAR_COLOR register
-set RB_BC_CONTROL in restore (hangs without)
-different CP_DRAW_INDX format
tested with kmscube and glmark2 scenes, on par with a220
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The offset field is 22 bit large.
11 bits are necessary because MaxVertexAttribRelativeOffset = 2047
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is reproducible on Stoney, but other chips may be affected too.
Cc 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Similar to the combined limit for VS+FS, there is an upper limit for
shader size to run from internel memory.
Signed-off-by: Rob Clark <robdclark@gmail.com>
As the function says - only the visual is changed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
RADV now requires LLVM 5.0 or greater, and thus we can't build dist
tarball because swr requires LLVM 4.0.
Let's bump required LLVM to 5.0 in swr too.
Fixes: f9eb1ef870 ("amd: remove support for LLVM 4.0")
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Bruce Cherniak <bruce.cherniak@intel.com>
This allows to avoid having to see garbage in Dying Light loading screen
at least, which probably expects Windows/NV behavior of all allocations
being zeroed by default.
Analogous to radv flag with the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Avoids a branch and reduces code size a tiny bit:
text data bss dec hex filename
10804563 398653 2070368 13273584 ca89f0 /tmp/radeonsi_dri.so.old
10804499 398653 2070368 13273520 ca89b0 /tmp/radeonsi_dri.so
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The sampleMaskIn workaround (b936f4d1ca)
tries to figure out if the shader is running at per-sample frequency, but
there's a typo bug so it will only recognize per-sample linar inputs,
not per-sample perspective ones.
Spotted by Eric Engestrom <eric.engestrom@intel.com>
Fixes: b936f4d1ca0d2ab1e828a "r600: partly fix sampleMaskIn value"
Otherwise, a blit from separate stencil may fail to flush the job that
initialized it, or new drawing could fail to flush a blit reading from
stencil.
Fixes:
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8