Commit graph

3596 commits

Author SHA1 Message Date
Eric Anholt
3f9cb2eb05 v3d: Wait for TMU writes to complete before continuing after a spill.
The simulator complained that we had write responses outstanding at shader
end.  It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
ccbe33af5b v3d: Make sure we don't emit a thrsw before the last one finished.
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
f9d54dc3cf v3d: Add some debug code for forcing register spilling.
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
2018-08-06 13:03:23 -07:00
Eric Anholt
c2eab33b08 v3d: Actually put the "%s" in the snprintf.
I missed an important part when porting the change over, fixing my
compiler warning but breaking -Werror=format-security.

Fixes: e6ff5ac446 ("v3d: use snprintf(..., "%s", ...) instead of strncpy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107443
2018-08-01 11:39:19 -07:00
Eric Anholt
e6ff5ac446 v3d: use snprintf(..., "%s", ...) instead of strncpy
Fixes a compiler warning about terminator NUL, based on f836d799f9
("intel/decoder: use snprintf(..., "%s", ...) instead of strncpy")
2018-07-31 16:42:11 -07:00
Eric Anholt
3471ce9985 v3d: Add support for the TMUWT instruction.
This instruction is used to ensure that TMU stores have been processed
before moving on.  In particular, you need any TMU ops to be done by the
time the shader ends.
2018-07-31 16:05:04 -07:00
Eric Anholt
d934492ff9 v3d: Dump the contents off all the buffers in CLIF mode.
A V3D_DEBUG=clif file from a non-texturing .shader_test can now be
successfully run through the CLIF runner in the simulator.  Now I need to
build an open source CLIF runner against the v3d DRM module.
2018-07-30 14:29:01 -07:00
Eric Anholt
99a5ac250b v3d: Split walking the CLs to generate relocs from walking CLs to dump.
We need to dump each buffer's contents in order for a CLIF file, so we
need to collect all of the relocs into a buffer (such as the indirect CL
full of both uniforms and GL shader states) before we start dumping.
2018-07-30 14:29:01 -07:00
Eric Anholt
2df6f1a3df v3d: Include commands to run the BCL and RCL in CLIF dumps. 2018-07-30 14:29:01 -07:00
Eric Anholt
c6449e33e3 v3d: Use a short, underscored name for packets in CLIF/CL dumping.
These will match the names that the CLIF parser expects to see.  I may in
the future decide to change more of the other names so that I match the
names the HW/closed SW team uses for their packets, rather than the names
in the spec (which only they and I can read anyway).
2018-07-30 14:29:01 -07:00
Eric Anholt
b56f8c475e v3d: Rename "configuration" and "config" in the XML to "cfg"
This matches what CLIF parsing expects, and makes
TILE_BINNING_MODE_CONFIGURATION_COMMON_CONFIGURATION into a much more
legible TILE_BINNING_MODE_CFG_COMMON.
2018-07-30 14:29:01 -07:00
Eric Anholt
300e609feb v3d: s/colour/color in the XML.
The CLIF format expects american english spelling, and the rest of Mesa is
too.  I was previously adhering to the spec's spelling, which is
counterproductive.
2018-07-30 14:29:01 -07:00
Eric Anholt
3a8550ad06 v3d: Rename primitives to prims in the XML to match CLIF names.
This makes us match up with the V3D HW team's names a bit more.
2018-07-30 14:29:01 -07:00
Eric Anholt
6237c64049 v3d: Print CLIF fixed-point values as just their decimal value.
The parser doesn't handle float input, so we have to dump the raw value.
2018-07-30 14:29:01 -07:00
Eric Anholt
8da47b7648 v3d: When not doing terminal pretty-printing, comment struct field names.
The struct field names aren't part of the CLIF ABI, just the order of
fields within the struct.  The comments are there for human readability.
2018-07-30 14:29:01 -07:00
Eric Anholt
103f21b13d v3d: Add a separate flag for CLIF ABI output versus human-readable CLs.
A few of the upcoming changes would make the V3D_DEBUG=cl output less
readable, so let's make proper CLIF file production be under a separate
V3D_DEBUG=clif flag.
2018-07-30 14:29:01 -07:00
Eric Anholt
89ac6fa403 v3d: Add pack header support for f187 values.
V3D only has one of these (the top 16 bits of a float32) left in its CLs,
but VC4 had many more.  This gets us proper pretty-printing of the values
instead of a large uint.
2018-07-30 14:29:01 -07:00
Eric Anholt
27f1bfe471 vc4: Fix meson build when enabled without v3d.
Reported-by: Rob Clark <robdclark@gmail.com>
Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
2018-07-29 19:13:29 -07:00
Eric Anholt
942456f646 v3d: Skip printing sub-id or pad fields in CLIF dumping.
The parser doesn't expect them, so our fields would end up mismatched.
They're not really useful in console output, either.
2018-07-27 18:00:48 -07:00
Eric Anholt
3ee0ab599e v3d: Emit commands to switch CLIF parser to CL/shader/attr input mode.
By default after saying you are emitting a buffer, it'll expect a buffer
size.  Once you set a format, it'll keep parsing that format until you
announce something else.
2018-07-27 18:00:46 -07:00
Eric Anholt
a57770aa37 v3d: Dump fields in CLIF output in increasing offset order.
Previously, we emitted in XML order, which I happen to type in the
decreasing offset order of the specifications.  However, the CLIF parser
wants increasing offsets.
2018-07-27 17:56:55 -07:00
Eric Anholt
95bafeeabf v3d: Print addresses in CLIFs as references to buffers.
With CLIFs, the parser will choose an address for the buffer being
created, so we need to use effectively relocations to buffers instead of
the addresses that the driver uses.  This is also a whole lot more
intelligible for console output than raw addresses!
2018-07-27 17:56:36 -07:00
Eric Anholt
3c02838d29 v3d: Stop doing pretty-printed colorful booleans in CLIF output.
The parser wants to see a 1 or 0.  We can put "true" and "false" in a
comment to clarify that it's a boolean and the parser will skip it.
2018-07-27 17:55:57 -07:00
Eric Anholt
422910d2e7 v3d: Move clif dumping to a separate step from noting where the CLs are.
Now all the printing happens from the same worklist processing.
2018-07-27 17:08:35 -07:00
Eric Anholt
01b4952773 v3d: Move clif dump BO lookup into the clif dumper.
The clif dumper is going to need information about all of our BOs if we're
going to dump them for replay purposes.
2018-07-27 17:08:35 -07:00
Eric Anholt
e92959c4e0 v3d: Pass the whole clif_dump structure to v3d_print_group().
To generate CLIF files that the v3dv3 simulator can parse, we're going to
need to decode addresses, and for that we'll need the vaddr lookup
function from the clif structure from within v3d_decoder.
2018-07-27 17:08:35 -07:00
Eric Anholt
9bf9a6d6a1 v3d: Drop the VG support from the XML.
This reflects a change on the HW/closed SW side to drop this unused HW.
With it dropped on their side, the CLIF parser no longer expects to find
VG fields.
2018-07-27 12:56:36 -07:00
Eric Anholt
5a1cc3861c v3d: Use /* */ instead of () for enum names in CLIF output.
This lets the comments be ignored by the CLIF parser.
2018-07-27 12:56:36 -07:00
Eric Anholt
95a0f99825 v3d: CLIF-dump the "Vec size" field as 0 == maximum value.
That's what a user should want to see, and what the CLIF parser wants.
This should maybe be generalized.
2018-07-27 12:56:36 -07:00
Eric Anholt
d934d3206e nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-26 11:00:34 -07:00
Mathieu Bridon
9ebd8372b9 python: Use range() instead of xrange()
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.

Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().

As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-07-24 11:07:04 -07:00
Eric Anholt
6b73a97f84 v3d: Implement a small immediates optimization, based on VC4's.
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
2018-07-23 10:21:43 -07:00
Eric Anholt
79e0f042bc v3d: Return an invalid src number if asked for a missing implicit uniform.
Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.
2018-07-23 10:21:43 -07:00
Eric Anholt
f2ea936f48 v3d: Skip emitting texture config parameter 2 if it's just the defaults.
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)
2018-07-23 10:21:43 -07:00
Eric Anholt
421e99d777 v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. 2018-07-23 10:21:43 -07:00
Eric Anholt
e7ae900341 v3d: Switch to using the new SFU instructions on V3D 4.x.
These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)
2018-07-23 10:21:43 -07:00
Eric Anholt
58c1d3860f v3d: Add QPU pack/unpack for the new SFU instructions.
These instructions allow writing the result to any register, instead of a
special writeback to r4.
2018-07-23 10:21:43 -07:00
Eric Anholt
cdfa99657d v3d: Fix the name of the "flpop" operation.
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
2018-07-23 10:21:43 -07:00
Eric Anholt
91e24e5718 v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.
If we fail initial disassembly, it's good to know what instruction it was
that failed.
2018-07-23 10:21:42 -07:00
Eric Anholt
a1beb333d8 v3d: Drop unused vir_SAT() operation.
We lower saturates in NIR.
2018-07-23 10:21:42 -07:00
Eric Anholt
8dfc6ee317 v3d: Rotate through registers to improve post-RA scheduling options.
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)
2018-07-23 10:21:42 -07:00
Eric Anholt
1fb31819ae v3d: Allow reading from physical regs written in the previous instruction.
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)
2018-07-23 10:21:23 -07:00
Eric Anholt
229836fb37 v3d: Disable shader-db cycle estimates until we sort out TMU estimates.
I keep having to ignore these shader-db changes since I don't trust them,
so just disable the reports entirely.
2018-07-16 14:39:59 -07:00
Eric Anholt
2baab6bf2a v3d: Emit the lowered uniform just before its first use in a block.
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs:     27571 -> 27112 (-1.66%)

and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
2018-07-16 14:39:59 -07:00
Eric Anholt
26f830d9fc v3d: Add an assert that we don't provide an invalid texture return words.
The docs had an update noting this restriction, so reflect it in the code.
2018-07-16 14:39:59 -07:00
Eric Anholt
d661d78464 v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader.
This doesn't affect us yet since we're not doing TMUWTs, but I think we
will for GLES 3.1.
2018-07-16 14:39:59 -07:00
Eric Anholt
beeb94402f v3d: Implement noperspective varyings on V3D 4.x.
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
2018-07-09 11:48:32 -07:00
Eric Anholt
93f437d128 v3d: Fix typo in dither mode offset.
We weren't using the field yet, so it didn't affect anything.

Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
2018-07-09 11:48:32 -07:00
Eric Anholt
5601ab3981 v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.
Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
2018-07-05 12:39:36 -07:00
Eric Anholt
7b63371420 v3d: Respect swap_color_rb for the f32_color_rb case.
We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.
2018-07-05 12:39:36 -07:00