Commit graph

127 commits

Author SHA1 Message Date
Eric Anholt
248a7fb392 v3d: Do uniform pretty-printing in the QPU dump.
If you're trying to trace what's going on in a QPU dump, this will
definitely help you find your way.
2018-12-14 17:48:01 -08:00
Eric Anholt
532b6c5671 v3d: Move uniform pretty-printing to its own helper function.
I want to reuse it in the QPU dump.
2018-12-14 17:48:01 -08:00
Eric Anholt
a7e15a5086 v3d: Avoid assertion failures when removing end-of-shader instructions.
After generating VIR, we leave c->cursor pointing at the end of the
shader.  If the shader had dead code at the end (for example from preamble
instructions in a shader with no side effects), we would assertion fail
that we were leaving the cursor pointing at freed memory.  Since anything
following DCE should be setting up a new cursor anyway, just clear the
cursor at the start.
2018-12-14 17:48:01 -08:00
Eric Anholt
3f9bcf9136 v3d: Make sure that a thrsw doesn't split a multop from its umul24.
The thrsw will invalidate rtop, just like accumulators and flags.  Caught
by simulator assertions in CS imulextended/umulextended tests.

Fixes: 90269ba353 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
2018-12-14 17:48:01 -08:00
Eric Anholt
f1d98204c3 v3d: Fix a leak of the disassembled instruction string during debug dumps.
Fixes: ade416d023 ("broadcom: Add VC5 NIR compiler.")
2018-12-07 16:48:23 -08:00
Eric Anholt
bad95bb13c v3d: Add VIR dumping of TMU config p0/p1.
I had a bit of it for V3D 3.x, but didn't update it for 4.x.
2018-12-07 16:48:23 -08:00
Eric Anholt
1fc78ff3f1 v3d: Simplify VIR uniform dumping using a temporary. 2018-12-07 16:48:23 -08:00
Eric Anholt
5932575299 v3d: Garbage collect unused uniforms code. 2018-12-07 16:48:23 -08:00
Eric Anholt
acecee4c2d v3d: Return the right gl_SampleMaskIn[] value.
It's supposed to be the dispatched sample mask for this pixel, not the GL
state's sample mask.
2018-12-07 16:48:23 -08:00
Eric Anholt
6870111051 v3d: Fix a comment typo 2018-12-07 16:48:23 -08:00
Eric Anholt
ca0e4ae4bc v3d: Convert to using nir_src_as_uint() from const_value derefs.
Follows 16870de8a0 ("nir: Use nir_src_is_const and nir_src_as_* in core
code") to clean up v3d.
2018-12-07 16:48:23 -08:00
Eric Anholt
42652ea51e v3d: Use combined input/output segments.
The HW apparently has some issues (or at least a much more complicated VCM
calculation) with non-combined segments, and the closed source driver also
uses combined I/O.  Until I get the last CTS failure resolved (which does
look plausibly like some VPM stomping), let's use combined I/O too.
2018-12-07 16:48:23 -08:00
Jason Ekstrand
dca6cd9ce6 nir: Make boolean conversions sized just like the others
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64.  This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:07 -06:00
Kenneth Graunke
5b682143da nir: Make nir_lower_clip_vs optionally work with variables.
The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.

In i965 and iris, I'd rather do this lowering early and work with
variables.  v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating.  For now, handle
both methods.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-11-19 14:33:16 -08:00
Eric Anholt
538bca78e2 v3d: Don't try to set PF flags on a LDTMU operation
We need an ALU op in order to set PF.  Fixes a recent assertion failure in
dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex
2018-11-15 11:12:54 -08:00
Eric Anholt
4e1b163eed v3d: Update the TLB config for depth writes on V3D 4.2.
Fixes 311 piglit cases on the simulator.
2018-11-01 13:56:30 -07:00
Eric Anholt
cc54e1acf9 v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE
We were doing this late after nir_lower_io, but we can just reuse the core
code.  By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
2018-10-30 10:46:52 -07:00
Eric Anholt
c152c79d5e v3d: Only add output slot tracking for the current varying slot.
We always emit 4 slots per slot because things like color output and
position processing in the epilogue will potentially look up more values
than the variable declaration had.  However, when we get a .location_frac
!= 0, we don't want to overwrite components of the following
.driver_location.
2018-10-30 10:46:52 -07:00
Eric Anholt
17c8198952 v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components.
This lets us trim unused trailing components in the vertex attributes,
reducing the size of our VPM allocations.
2018-10-30 10:46:52 -07:00
Eric Anholt
fc85f7cfdc v3d: Don't rely on sorting input vars for VPM read setup.
For supporting scalar VPM i/o at the NIR level, we need to do a pass over
the vars to figure out how big each attribute is after DCE.  Once we've
done that, we can just walk over c->vattr_sizes[] instead of bothering
with vars.
2018-10-30 10:46:52 -07:00
Eric Anholt
cc78676030 v3d: Split out NIR input setup between FS and VPM.
They don't share much code, and I'm about to rewrite the remaining shared
code for the VPM case.
2018-10-30 10:46:52 -07:00
Eric Engestrom
bb84fa146f util: use C99 declaration in the for-loop hash_table_foreach() macro
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-25 12:43:18 +01:00
Eric Anholt
8ec83dc51e v3d: Add support for hardware pack/unpack of half floats.
Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test
in half.
2018-10-15 17:16:44 -07:00
Eric Anholt
a91b158bd9 v3d: Fix setup of the VCM cache size.
There were two bugs working together to make things mostly work: I wasn't
dividing the VPM output size available by the size of a batch (vertex),
but I also had the size of the VPM reduced by a factor of 8.

Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it
seems also my intermittent varying failures.

Fixes: 1561e4984e ("v3d: Emit the VCM_CACHE_SIZE packet.")
2018-09-07 08:11:38 -07:00
Eric Anholt
1561e4984e v3d: Emit the VCM_CACHE_SIZE packet.
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
50a8713d4f v3d: Avoid spilling that breaks the r5 usage after a ldvary.
Fixes bad rendering when forcing 2 spills in glxgears.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
f2c0d310d6 v3d: Make sure that QPU instruction-has-a-dest matches VIR.
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment.  In
fact, we were likely to to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
3f9cb2eb05 v3d: Wait for TMU writes to complete before continuing after a spill.
The simulator complained that we had write responses outstanding at shader
end.  It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
ccbe33af5b v3d: Make sure we don't emit a thrsw before the last one finished.
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
f9d54dc3cf v3d: Add some debug code for forcing register spilling.
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
2018-08-06 13:03:23 -07:00
Eric Anholt
3471ce9985 v3d: Add support for the TMUWT instruction.
This instruction is used to ensure that TMU stores have been processed
before moving on.  In particular, you need any TMU ops to be done by the
time the shader ends.
2018-07-31 16:05:04 -07:00
Eric Anholt
27f1bfe471 vc4: Fix meson build when enabled without v3d.
Reported-by: Rob Clark <robdclark@gmail.com>
Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
2018-07-29 19:13:29 -07:00
Eric Anholt
d934d3206e nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-26 11:00:34 -07:00
Eric Anholt
6b73a97f84 v3d: Implement a small immediates optimization, based on VC4's.
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
2018-07-23 10:21:43 -07:00
Eric Anholt
79e0f042bc v3d: Return an invalid src number if asked for a missing implicit uniform.
Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.
2018-07-23 10:21:43 -07:00
Eric Anholt
f2ea936f48 v3d: Skip emitting texture config parameter 2 if it's just the defaults.
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)
2018-07-23 10:21:43 -07:00
Eric Anholt
421e99d777 v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. 2018-07-23 10:21:43 -07:00
Eric Anholt
e7ae900341 v3d: Switch to using the new SFU instructions on V3D 4.x.
These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)
2018-07-23 10:21:43 -07:00
Eric Anholt
cdfa99657d v3d: Fix the name of the "flpop" operation.
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
2018-07-23 10:21:43 -07:00
Eric Anholt
a1beb333d8 v3d: Drop unused vir_SAT() operation.
We lower saturates in NIR.
2018-07-23 10:21:42 -07:00
Eric Anholt
8dfc6ee317 v3d: Rotate through registers to improve post-RA scheduling options.
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)
2018-07-23 10:21:42 -07:00
Eric Anholt
1fb31819ae v3d: Allow reading from physical regs written in the previous instruction.
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)
2018-07-23 10:21:23 -07:00
Eric Anholt
229836fb37 v3d: Disable shader-db cycle estimates until we sort out TMU estimates.
I keep having to ignore these shader-db changes since I don't trust them,
so just disable the reports entirely.
2018-07-16 14:39:59 -07:00
Eric Anholt
2baab6bf2a v3d: Emit the lowered uniform just before its first use in a block.
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs:     27571 -> 27112 (-1.66%)

and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
2018-07-16 14:39:59 -07:00
Eric Anholt
26f830d9fc v3d: Add an assert that we don't provide an invalid texture return words.
The docs had an update noting this restriction, so reflect it in the code.
2018-07-16 14:39:59 -07:00
Eric Anholt
d661d78464 v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader.
This doesn't affect us yet since we're not doing TMUWTs, but I think we
will for GLES 3.1.
2018-07-16 14:39:59 -07:00
Eric Anholt
beeb94402f v3d: Implement noperspective varyings on V3D 4.x.
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
2018-07-09 11:48:32 -07:00
Eric Anholt
5601ab3981 v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.
Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
2018-07-05 12:39:36 -07:00
Eric Anholt
7b63371420 v3d: Respect swap_color_rb for the f32_color_rb case.
We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.
2018-07-05 12:39:36 -07:00
Eric Anholt
f49d112a01 v3d: Implement ALPHA_TO_COVERAGE.
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4.  This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
2018-06-20 09:30:46 -07:00