This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.
Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().
As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
Sometimes when iterating over sources, we might want to check if it's the
implicit one. We wouldn't want to match on a non-implicit src using this
function.
These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs: 27571 -> 27112 (-1.66%)
and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
We weren't using the field yet, so it didn't affect anything.
Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
The XML ends up noisier if you're only looking at one version, but from
the diffstat there's obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now. Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one. This is the start of
changing it so that one XML can generate headers for multiple versions.
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be. However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.
(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
For a meson -Db_ndebug=true release build on x86_64, reduces text size of
libv3d.a from 53.0k to 51.6k. Inspired by 0d5329d626 ("anv: Disable
__gen_validate_value if NDEBUG is set.")
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4. This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"
Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.
v2: Fix commit message typo.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir.
nir was fine at that, but automake didn't have it.
Bugzilla: https://github.com/anholt/mesa/issues/104
This allows the driver to load against the merged kernel DRM driver. In
the process, rename most of the build system variables and gallium
plumbing functions.