Note that this has no effect yet. A case where can_sample_z/s can be false
in radeonsi will be added in a later patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Height should be aligned with 2 macroblocks, thus making safer
for tiled mode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Otherwise things will fail to build, if the builder is using another
version of LLVM.
v2: annotate all the dependencies of builder_gen.h
v3: clean the generated files as needed
v4: comment cleanups (Tim)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com> (v2)
Reported-by: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Considering how hard/annoying it was for many peoples' workflow to
properly generate the macro, it will be demoted to conditionally
available with follow-up commits.
v2: Kill off gracious blank line (Vedran).
Cc: mesa-stable@lists.freedesktop.org
Cc: Vedran Miletić <vedran@miletic.net>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Vedran Miletić <vedran@miletic.net>
While we are at it, fix a typo inside the comment which describes
what those constants are for.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did. This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
Now that we're about to start generating control flow in our NIR, we want
this in place. It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.
total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs: 1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.
No change on shader-db.
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects. Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm. Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
Noticed by code inspection. This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes. MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.
This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.
If an app creates more shaders, at most 4 threads will be used to compile
them.
Debug output disables this for shader stats to be printed in the correct
order.
(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
to allow multiple shaders to be compiled simultaneously.
ALso, shader-db can again use all 4 cores.
v2: Remove the pipe_mutex_unlock call in the error path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Getting LLVM IRs of hanging shaders have never been easier.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be needed after some LLVM changes that haven't landed yet.
v2: - use LLVMIsConstant to fix an LLVM assertion failure.
LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: James Harvey <lothmordor@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Devices with smaller GMEM size need more tiles. On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen. Lets bump it up to
512. (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.
So add driconf param to force uninitialized variables to default to zero.
This issue was observed with rust, from steam store. But has surfaced
elsewhere in the past.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>