By default after saying you are emitting a buffer, it'll expect a buffer
size. Once you set a format, it'll keep parsing that format until you
announce something else.
Previously, we emitted in XML order, which I happen to type in the
decreasing offset order of the specifications. However, the CLIF parser
wants increasing offsets.
With CLIFs, the parser will choose an address for the buffer being
created, so we need to use effectively relocations to buffers instead of
the addresses that the driver uses. This is also a whole lot more
intelligible for console output than raw addresses!
To generate CLIF files that the v3dv3 simulator can parse, we're going to
need to decode addresses, and for that we'll need the vaddr lookup
function from the clif structure from within v3d_decoder.
This reflects a change on the HW/closed SW side to drop this unused HW.
With it dropped on their side, the CLIF parser no longer expects to find
VG fields.
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.
Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().
As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
Sometimes when iterating over sources, we might want to check if it's the
implicit one. We wouldn't want to match on a non-implicit src using this
function.
These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs: 27571 -> 27112 (-1.66%)
and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
We weren't using the field yet, so it didn't affect anything.
Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
The XML ends up noisier if you're only looking at one version, but from
the diffstat there's obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now. Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one. This is the start of
changing it so that one XML can generate headers for multiple versions.
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be. However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.
(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
For a meson -Db_ndebug=true release build on x86_64, reduces text size of
libv3d.a from 53.0k to 51.6k. Inspired by 0d5329d626 ("anv: Disable
__gen_validate_value if NDEBUG is set.")
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4. This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage