Commit graph

175 commits

Author SHA1 Message Date
Eric Anholt
3c08ecf147 v3d: Whitespace consistency fix. 2019-02-05 15:46:42 -08:00
Eric Anholt
940501a446 v3d: Fix copy-propagation of input unpacks.
I had a single function for "does this do float input unpacking" with two
major flaws: It was missing the most common thing to try to copy propagate
a f32 input nunpack to (the VFPACK to an FP16 render target) along with
several other ALU ops, and also would try to propagate an f32 unpack into
a VFMUL which only does f16 unpacks.

instructions in affected programs: 659232 -> 655895 (-0.51%)
uniforms in affected programs: 132613 -> 135336 (2.05%)

and a couple of programs increase their thread counts.

The uniforms hit appears to be a pattern in generated code of doing (-a >=
a) comparisons, which when a is abs(b) can result in the abs instruction
being copy propagated once but not fully DCEed.
2019-02-05 15:46:04 -08:00
Eric Anholt
d0fdbd4211 v3d: Fix dumping of shaders with alpha test.
We were trying to print a NULL entry from the table.
2019-02-05 15:42:14 -08:00
Eric Anholt
bdef17b052 v3d: Store the actual mask of color buffers present in the key.
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
2019-02-05 15:42:04 -08:00
Eric Anholt
ab4d5775b0 v3d: Fix image_load_store clamping of signed integer stores.
This was copy-and-paste fail, that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
2019-01-31 08:39:40 -08:00
Eric Anholt
f7769b5121 v3d: Fix the autotools build.
Noticed while looking at the gitlab-CI MR.
2019-01-29 14:00:27 -08:00
Eric Anholt
3e743d8cd8 v3d: Avoid duplicating limits defines between gallium and v3d core.
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
2019-01-27 08:30:03 -08:00
Eric Anholt
fe6a21c867 v3d: Fix overly-large vattr_sizes structs.
We want one vector size per vector, not per component.
2019-01-27 08:30:03 -08:00
Eric Anholt
f72820c851 v3d: Add support for CS barrier() intrinsics. 2019-01-14 15:40:55 -08:00
Eric Anholt
9b45b06d7c v3d: Add support for CS shared variable load/store/atomics.
CS shared variables are handled effectively as SSBO access to a temporary
buffer that will be allocated at CS dispatch time.
2019-01-14 15:40:55 -08:00
Eric Anholt
01d913cf90 v3d: Add support for CS workgroup/invocation id intrinsics.
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn into the global invocation id
and the local invocation id ivec3s.
2019-01-14 15:40:55 -08:00
Eric Anholt
6281f26f06 v3d: Add support for shader_image_load_store.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
2019-01-14 15:40:55 -08:00
Eric Anholt
5932c2f0b9 v3d: Add SSBO/atomic counters support.
So far I assume that all the buffers get written.  If they weren't, you'd
probably be using UBOs instead.
2019-01-14 15:40:55 -08:00
Eric Anholt
1a63227ea0 v3d: Add support for matrix inputs to the FS.
We've been relying on linking splitting up our varying matrices into
separate vectors, but with SSO that doesn't happen.  Supporting matrix
inputs isn't too hard, though.
2019-01-14 13:18:02 -08:00
Eric Anholt
3790ee07e6 v3d: Fix txf_ms 2D_ARRAY array index.
We need to pass the array index through our coordinate transform
unchanged.  Fixes
dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array
2019-01-14 13:18:02 -08:00
Eric Anholt
051a41d3d5 v3d: Add support for the early_fragment_tests flag.
If this flag hasn't been set by the shader and it has some visible side
effects, then we need to disable EZ.
2019-01-14 13:18:02 -08:00
Eric Anholt
6051c11d17 nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results.
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle.  Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 13:03:41 -08:00
Eric Anholt
8847370424 v3d: Use the core tex lowering.
Even without any clever optimization on the unpack operations, this gives
us a useful value for the channels read field, which we can use to avoid
ldtmu instructions to the no-op register.

instructions in affected programs: 890712 -> 881974 (-0.98%)
2019-01-04 15:59:59 -08:00
Eric Anholt
81b9361b68 v3d: Stop scalarizing our uniform loads.
We can pull a whole vector in a single indirect load.  This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.

instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)

Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
2019-01-04 15:41:23 -08:00
Eric Anholt
f8a8de8b9a v3d: Do UBO loads a vector at a time.
In the process of adding support for SSBOs and CS shared vars, I ended up
needing a helper function for doing TMU general ops.  This helper can be
that starting point, and saves us a bunch of round-trips to the TMU by
loading a vector at a time.
2019-01-04 15:41:23 -08:00
Eric Anholt
b0e0086257 v3d: Remove dead switch cases and comments from v3d_nir_lower_io.
Moving things to NIR left this mess around.  All we lower now is uniforms.
2019-01-04 15:41:23 -08:00
Eric Anholt
e1385e879d v3d: Reinstate the new shader-db output after v3d_compile() refactor.
I misplaced it in the rebase conflicts.
2019-01-04 15:26:19 -08:00
Eric Anholt
d2b899c0ec v3d: Refactor compiler entrypoints.
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
2019-01-02 14:12:29 -08:00
Eric Anholt
0805060573 v3d: Handle dynamically uniform IF statements with uniform control flow.
Loops will be trickier, since we need some analysis to figure out if the
breaks/continues inside are uniform.  Until we get that in NIR, this gets
us some quick wins.

total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
instructions in affected programs: 487781 -> 469099 (-3.83%)
2019-01-02 14:12:29 -08:00
Eric Anholt
5e9ee6e841 v3d: Fold comparisons for IF conditions into the flags for the IF.
total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
instructions in affected programs: 800373 -> 799407 (-0.12%)
2019-01-02 14:12:29 -08:00
Eric Anholt
078dc176bc v3d: Don't try to fold non-SSA-src comparisons into bcsels.
There could have been a write of a src in between the comparison and the
bcsel that would invalidate the comparison.
2019-01-02 14:12:29 -08:00
Eric Anholt
2e0433b687 v3d: Move the "Find the ALU instruction generating our bool" out of bcsel.
This will be reused for if statements.
2019-01-02 14:12:29 -08:00
Eric Anholt
c3ae0aa264 v3d: Simplify the emission of comparisons for the bcsel optimization.
I wanted to reuse the comparison stuff for nir_ifs, but for that I just
want the flags and no destination value.  Splitting the conditions from
the destinations ended up cleaning the existing code up, anyway.
2019-01-02 14:12:29 -08:00
Eric Anholt
ad1e59cf8d v3d: Add support for gl_HelperInvocation.
We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation.  Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
2018-12-30 08:05:11 -08:00
Eric Anholt
20021e3473 v3d: Add support for textureSize() on MSAA textures.
Fixes failures in
dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d
in the GLES3.1 suite.
2018-12-30 08:05:11 -08:00
Eric Anholt
906fca1b4b v3d: Add support for non-constant texture offsets.
Fixes
dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat
and others.
2018-12-30 08:05:11 -08:00
Eric Anholt
47caefc7b4 v3d: Force sampling from base level for tg4.
This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW.  Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
2018-12-30 08:05:11 -08:00
Eric Anholt
f9bdce9966 v3d: Add a note for a potential performance win on multop/umul24.
Noticed while debugging a testcase.
2018-12-30 08:05:11 -08:00
Eric Anholt
b36757448d v3d: Dead-code eliminate unused flags updates.
The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.

total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)
2018-12-30 08:05:11 -08:00
Eric Anholt
20e3526298 v3d: Don't generate temps for comparisons.
This was just generated work for vir_opt_dead_code and cluttered up the
dumps.
2018-12-30 08:04:54 -08:00
Eric Anholt
ebde5afb93 v3d: Move "does this instruction have flags" from sched to generic helpers.
I wanted to reuse it for DCE of flags updates.
2018-12-30 08:03:51 -08:00
Eric Anholt
39b1112189 v3d: Drop incorrect dependency for flpop.
It is just shifting probably-means-flags bits out of a value, it doesn't
actually update the flags on its own.
2018-12-30 08:03:51 -08:00
Eric Anholt
a7c9fd7573 v3d: Drop unused count_nir_instrs() helper.
This was for shader-db, but I haven't cared about NIR instruction counts
in a long time.
2018-12-30 08:03:51 -08:00
Eric Anholt
696f63f1b4 v3d: Hook up some shader-db output to GL_ARB_debug_output.
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general.  I formatted it more like Intel's so I can mostly reuse their
report script.
2018-12-30 08:03:51 -08:00
Eric Anholt
9ec6a3d621 v3d: Fix uniform pretty printing assertion failure with branches.
Fixes: 248a7fb392 ("v3d: Do uniform pretty-printing in the QPU dump.")
2018-12-29 13:52:09 -08:00
Eric Anholt
d80761b8f3 v3d: Drop shadow comparison state from shader variant key.
The shadow state is now in the sampler.
2018-12-20 11:29:30 -08:00
Ian Romanick
378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Eric Anholt
00e2cbc049 v3d: Fix the argument type for vir_BRANCH().
Apparently this has been spewing warnings for Jason's clang, but not my
gcc.
2018-12-17 09:52:23 -08:00
Jason Ekstrand
11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Eric Anholt
2977c77758 v3d: Use the original bit size when scalarizing uniform loads.
Prevents a regression in jekstrand's 1-bit series.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 21:03:01 +00:00
Eric Anholt
29927e7524 v3d: Drop in a bunch of notes about performance improvement opportunities.
These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.
2018-12-14 17:48:01 -08:00
Eric Anholt
248a7fb392 v3d: Do uniform pretty-printing in the QPU dump.
If you're trying to trace what's going on in a QPU dump, this will
definitely help you find your way.
2018-12-14 17:48:01 -08:00
Eric Anholt
532b6c5671 v3d: Move uniform pretty-printing to its own helper function.
I want to reuse it in the QPU dump.
2018-12-14 17:48:01 -08:00