Cc: 10.6 10.5 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christain.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The first argument to UCMP needs to be compared against 0, but the
latter arguments are treated as float and need to be able to properly
apply neg/abs arguments. Adjust the inferSrcType function accordingly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Same problem and fix as for nouveau's ZaphodHeads trouble.
See patch ...
"nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads."
... for reference.
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If an instruction using address register value gets eliminated, we need
to remove it from the indirects list, otherwise it causes mayhem in
sched for scheduling address register usage.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A handful of fixes and cleanups:
1) If we split addr/pred, we need the newly created instruction to
end up in the unscheduled_list
2) Avoid scheduling a write to the address register if there is no
instruction using the address register that is otherwise ready
to schedule. Note that I currently don't bother with the same
logic for predicate register, since the only instructions using
predicate (br/kill) don't take any other src registers, so this
situation should not arise.
3) few other cosmetic cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to distinguish a shader that has separate writes to each MRT
from one which is supposed to write the data from MRT 0 to all the MRTs.
In TGSI this is done with a property. NIR doesn't have that, so encode
it as a funny location and decode on the other end.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the immediate form, src2 == dst, so it does not need to be emitted.
Otherwise it overlaps with the immediate value's low bits.
Fixes: 09ee907266 (nv50/ir: Fold IMM into MAD)
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Prefer blit-based texture transfers only if the chip has dedicated VRAM
since it would translate to a copy into the same memory on shared-memory
chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is required on non-coherent architectures to ensure the value of
the fence is correct at all times. Failure to do this results in the
display freezing for a few seconds every now and then on Tegra.
The NOUVEAU_BO_COHERENT is a no-op for coherent architectures, so behavior
on x86 should not be affected by this patch.
Also bump the required libdrm version to 2.4.62, which introduced this
flag.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
The current implementation only moves the joinAt when splitting after
the given instruction, not before it. So if you have a BB with
foo
instr
bar
joinat
and thus with joinAt set, we end up first splitting before instr, at
which point the instr's bb is updated to the new bb. Since that bb
doesn't have a joinAt set (despite containing one), when splitting after
the instr, there is nothing to copy over. Since the joinat will be in
the "split" bb irrespective of whether we're splitting before or after
the instruction, move it over in either case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This adds support for ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit to
llvmpipe.
Two things that don't mix well are SoA and doubles, see
emit_fetch_double, and emit_store_double_chan in this.
I've also had to split emit_data.chan, to add src_chan,
which can be different for doubles.
It handles indirect double fetches from temps, inputs, constants
and immediates. It doesn't handle double stores to indirects,
however it appears the mesa/st doesn't currently emit these,
it always does UARL/MOV combos, which will work fine.
tested with piglit, no regressions, all the fp64 tests seem to pass.
v2:
switch to using shuffles for fetch/store (Roland)
assert on indirect double stores - mesa/st never emits these (it uses MOV)
fix indirect temp/input/constant/immediates (Roland)
typos/formatting fixes (Roland)
v2.1:
cleanup some long lines, emit_store_double_chan cleanups.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously we were unconditionally doing ttn_get_src() even for
instructions with no src's. Which created a lot of unnecessary
load_const instructions. These were mostly harmless since NIR opt
passes would strip them back out. But for an ENDIF following a
BRK, it would result in load_const instructions created after the
NIR break instruction. Which nir_validate dislikes.
But we can actually just dtrt by using NumSrcRegs instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It isn't quite yet practical to enable TGSI_ANY_INOUT_DECL_RANGE shader
cap yet, at least not in drivers that need lower_to_scalar pass (which
right now is all of the ttn users), since the register arrays do not get
converted to SSA, which angers nir_lower_alu_to_scalar.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions). Cache this information so we only
figure it out once.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The fanin source could be grouped, for example with shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL SVIEW[0], 2D, FLOAT
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 2D
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
The second arg to the isaml is IN[1].w, so we need to look at the fanin
source to get the correct offset.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split out most of dump_info() from ir3_cmdline compiler into a function
that can be used both by cmdline compiler and also for the disasm debug
option. This way, for FD_MESA_DEBUG=disasm we also get to see intput/
output registers, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some piglit tests, like arb_fragment_program-sparse-samplers, result in
having a null samp#0 but valid samp#1.
TODO: a3xx probably needs similar fix
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We can't rely on what we get from the assembler if we have indirect
addressing of constant file, since the assembler doesn't know the array
index. This got lost in the transition to NIR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A clear will do a partial validate, which will in turn reference all the
buffers in the bufctx again. However the fragprog last validated might
have already been deleted. So reset the bufctx when updating state.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>