Commit graph

571 commits

Author SHA1 Message Date
Eric Anholt
060979a380 v3d: Fix temporary leaks of temp_registers and the graph when spilling.
On each iteration of successfully spilling a reg, we'd allocate another
copy of temp_registers, and when decrementing thread count we'd allocate
another copy of the graph.  These all got cleaned up on freeing the
compile.
2019-03-05 12:57:39 -08:00
Eric Anholt
2780a99ff8 v3d: Move the stores for fixed function VS output reads into NIR.
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them.  Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
2019-03-05 10:59:40 -08:00
Eric Anholt
a9dd227a47 v3d: Translate f2i(fround_even) as FTOIN.
This appears to be just what the opcode does.  Needed for equivalence when
moving FF VPM stores into NIR.
2019-03-05 10:59:40 -08:00
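As a rough illustration of the semantics being matched in a9dd227a47 (not of the hardware opcode itself, whose behavior the message only infers), f2i(fround_even(x)) rounds to the nearest integer with ties going to the even neighbor. Python's built-in round() uses the same tie rule, so a minimal sketch with a hypothetical helper name might look like:

```python
def f2i_fround_even(x):
    """Round to nearest with ties to even, then convert to int.

    Hypothetical helper that models f2i(fround_even(x)); it is not driver code.
    """
    return int(round(x))


# Ties go to the even neighbor rather than away from zero.
examples = [f2i_fround_even(v) for v in (2.5, 3.5, 1.2, -1.5)]
```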
Eric Anholt
fd1d22b92e v3d: Stop treating exec masking specially.
In our backend, the successor edges from the blocks only point to where
QPU control flow goes, not where the notional control flow goes from a
"break" or "continue" modifying the execution mask to resume writing to
some channels later.  As a result, this attempt at restricting live ranges
ended up missing the live range of a value where a conditional
break/continue was present in a loop before the later def of a variable.
The previous commit ended up fixing the problem that the flag tried to
solve.

Fixes glsl-vs-loop-continue.shader_test and/or
glsl-vs-loop-redundant-condition.shader_test based on register allocation
results.
2019-03-05 07:36:24 -08:00
Eric Anholt
c6ae666cf5 v3d: Restrict live intervals to the blocks reachable from any def.
In the backend, we often have condition codes on writes to variables, such
that there's no screening def anywhere and the previous live ranges
algorithm would conclude that the start of the range extends to the start
of the program.  However, we do know that the live range can only extend
as early as you can reach from all blocks writing to the variable.

The motivation was that, while we have a couple of hacks to try to promote
conditional writes up to being a def within the block, the exec_mask one
was broken and needed a replacement.

Based on c3c1aa5aeb ("intel/fs: Restrict live intervals to the subset
possibly reachable from any definition.").
2019-03-05 07:36:24 -08:00
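The reachability rule from c6ae666cf5 can be sketched on a toy control-flow graph. This is only an illustration under assumptions: the dictionary-based CFG and the name blocks_reachable_from_defs are hypothetical and do not come from the Mesa source.

```python
from collections import deque


def blocks_reachable_from_defs(successors, def_blocks):
    """Return the blocks reachable from any block that writes the variable.

    successors: dict mapping a block id to a list of successor block ids
                (hypothetical CFG representation).
    def_blocks: block ids containing a write, conditional or not.
    """
    reachable = set(def_blocks)
    worklist = deque(def_blocks)
    while worklist:
        block = worklist.popleft()
        for succ in successors.get(block, []):
            if succ not in reachable:
                reachable.add(succ)
                worklist.append(succ)
    return reachable


# Block 0 falls into a loop made of blocks 1 and 2, which exits to block 3.
cfg = {0: [1], 1: [2], 2: [1, 3], 3: []}

# A conditional write in block 2 cannot make the variable live in block 0,
# even though no unconditional def screens it off.
live_candidates = blocks_reachable_from_defs(cfg, [2])
```

In this sketch a live interval computed by the older approach would be clipped to the returned set, so it can no longer stretch back to the start of the program.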
Eric Anholt
97566efe5c v3d: Rematerialize MOVs of uniforms instead of spilling them.
If we have a MOV of a uniform value available to spill, that's one of our
best choices.  We can just not spill the value, and emit a new load of the
uniform as the fill.  This saves bothering the TMU and the thrsw, and is
the same cost in uniforms (since the spill offset is a uniform anyway).

This doesn't have a huge impact on shader-db, since there aren't a whole
lot of spills and we usually copy-prop the uniforms at the VIR level such
that the only uniform MOVs are from vir_lower_uniforms:

total instructions in shared programs: 6430292 -> 6430279 (<.01%)
total uniforms in shared programs: 2386023 -> 2385787 (<.01%)
total spills in shared programs: 4961 -> 4960 (-0.02%)
total fills in shared programs: 6352 -> 6350 (-0.03%)

However, I'm interested in dropping the uniforms copy-prop in the backend,
since it would be cheaper to not load repeated uniforms if we have the
registers to spare.  This also saves many spills on
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what
motivated a bunch of my recent backend work in the first place:

before: 46 spills, 106 fills, 3062 instructions
after: 0 spills, 0 fills, 2611 instructions
2019-02-25 21:33:47 -08:00
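The idea in 97566efe5c can be sketched with a toy instruction list. Everything here is an assumption made for illustration: the (op, dst, src) tuples, the spill_store and spill_load pseudo-ops, and the function name spill_temp are hypothetical and are not VIR or the driver's spilling code.

```python
def spill_temp(instructions, temp):
    """Rewrite a toy program so that `temp` no longer holds a register for long.

    Each instruction is an (op, dst, src) tuple. A uniform source is written
    as ("uniform", index). Hypothetical toy IR for illustration only.
    """
    defs = [inst for inst in instructions if inst[1] == temp]
    is_uniform_mov = (
        len(defs) == 1
        and defs[0][0] == "mov"
        and isinstance(defs[0][2], tuple)
        and defs[0][2][0] == "uniform"
    )

    out = []
    for op, dst, src in instructions:
        if dst == temp:
            if is_uniform_mov:
                # Rematerialize: drop the original def, no spill store needed.
                continue
            out.append((op, dst, src))
            out.append(("spill_store", None, temp))
        elif src == temp:
            if is_uniform_mov:
                # The "fill" is just a fresh load of the same uniform.
                out.append(("mov", temp, defs[0][2]))
            else:
                out.append(("spill_load", temp, None))
            out.append((op, dst, src))
        else:
            out.append((op, dst, src))
    return out


program = [
    ("mov", "t0", ("uniform", 3)),
    ("fadd", "t1", "t2"),
    ("fmul", "t3", "t0"),
]
rematerialized = spill_temp(program, "t0")
```

In the uniform case the rewritten program contains no store or load pseudo-ops at all, which mirrors the point about not bothering the TMU.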
Eric Anholt
e0fada983d v3d: Dump the VIR after register spilling if we were forced to.
Spilling is unusual, but one often has to debug it when it happens, so
dump it.
2019-02-25 21:26:24 -08:00
Eric Anholt
2786d2161a v3d: Fix vir_is_raw_mov() for input unpacks.
There are no users at the moment, but I wanted to start using this in
register spilling.
2019-02-25 21:26:24 -08:00
Eric Anholt
dbe3af67a4 v3d: Move i2b and f2b support into emit_comparison.
This lets us save a resolve to NIR true/false for ifs and discard_if.  No
change in shader-db.
2019-02-18 18:18:37 -08:00
Eric Anholt
0bba9c8489 v3d: Emit a simpler negate for the iabs implementation.
One program affected in my shader-db.

instructions in affected programs: 110 -> 108 (-1.82%)
2019-02-18 18:13:09 -08:00
Eric Anholt
1a775d43c9 v3d: Delay emitting ldvpm on V3D 4.x until it's actually used.
For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need
to do VPM setup when the load_inputs are out of order.  For V3D 4.x, we
can reduce register pressure by delaying our loads until they're actually
needed.  This also avoids a bunch of silly MOVs in the pre-opt VIR dump.

total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
total threads in shared programs: 153864 -> 153906 (0.03%)
2019-02-18 18:09:07 -08:00
Eric Anholt
5a84d46896 v3d: Stop tracking num_inputs for VPM loads.
It's unused in the VS (since we need vattr_sizes[] anyway), so move it to
FS prog data.
2019-02-18 18:09:07 -08:00
Eric Anholt
581eba072d v3d: Add a function to describe what the c->execute.file check means.
This is what pointed out that we were misusing the check for last_thrsw in
the previous commit.
2019-02-18 18:09:07 -08:00
Eric Anholt
441294962c v3d: Fix the check for "is the last thrsw inside control flow"
The execute.file check used to be good enough, until I stopped setting up
the execute mask for uniform ifs.

No known tests fixed, noticed while doing a refactor.

Fixes: 0805060573 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
2019-02-18 18:09:07 -08:00
Eric Anholt
07d5b5a972 v3d: Fix f2b32 behavior.
Now that we don't have the vir_PF() magic, it's obvious that we were doing
the wrong thing for f2b32 by allowing -0.0 to produce true instead of
false.
2019-02-18 18:09:07 -08:00
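The -0.0 case from 07d5b5a972 is easy to reproduce outside the driver. The sketch below only illustrates the failure mode: the function names are hypothetical, and testing the raw bits is a stand-in for an integer-style zero check, not the actual code path that was fixed.

```python
import struct


def f2b32(x):
    """Float-to-bool with a float comparison: +0.0 and -0.0 are both false."""
    return x != 0.0


def f2b32_raw_bits(x):
    """Buggy variant: treats the 32-bit pattern as an integer.

    -0.0 has only the sign bit set (0x80000000), so it is wrongly true.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits != 0


negative_zero_float_test = f2b32(-0.0)
negative_zero_bit_test = f2b32_raw_bits(-0.0)
```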
Eric Anholt
3022b4bd82 v3d: Kill off vir_PF(), which is hard to use right.
You were allowed to pass in any old temp so that you could hopefully fold
the PF up into the def of the temp.  If we couldn't find one, it
implicitly generated a MOV(nop, reg).  However, that PF could have
different behavior depending on whether the def being folded into was a
float or int opcode, which the caller doesn't necessarily control.

Due to the fragility of the function, just switch all callers over to
vir_set_pf().  This also encourages the callers to use a _dest call for
the inst they're putting the PF on, eliminating a bunch of temps in the
pre-optimization VIR.

shader-db says the change is in the noise:

total instructions in shared programs: 6226247 -> 6227184 (0.02%)
instructions in affected programs: 851068 -> 852005 (0.11%)
2019-02-18 18:09:06 -08:00
Eric Anholt
6186a8d44e v3d: Do bool-to-cond for discard_if as well.
Turns this minimal conditional discard (glsl-fs-discard-01.shader_test):

0x3de0b086c5fe9000 fcmp.pushn  -, r1, r5; mov  r2, 0
0x3dec3086bbfc001f nop                  ; mov.ifa  r2, -1
0x3c047186bbe80000 nop                  ; mov.pushz  -, r2
0x3dea3186ba837000 setmsf.ifna  -, 0    ; nop

into:

0x3c00b186c582a000 fcmp.pushn  -, r2, r5; nop
0x3de83186ba837000 setmsf.ifa  -, 0     ; nop

total instructions in shared programs: 6229820 -> 6226247 (-0.06%)
2019-02-18 18:09:06 -08:00
Eric Anholt
718eef62cb v3d: Refactor bcsel and if condition handling.
Both were doing the same thing to try to get a condition to predicate on.
Noticed when I wanted to do this for discard_if as well.

No change in shader-db.
2019-02-18 18:09:06 -08:00
Eric Anholt
4586f9f902 v3d: Add a helper function for getting a nop register.
Just a little refactor to explain what's going on with QFILE_NULL.
2019-02-18 18:09:06 -08:00
Eric Anholt
339155122b v3d: Drop our hand-lowered nir_op_ffract.
The NIR lowering works fine, though it causes some slight noise due to
what looks like choices about propagating constants up multiply chains
changing.

total instructions in shared programs: 6229671 -> 6229820 (<.01%)
total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
2019-02-18 18:09:06 -08:00
Eric Anholt
16f5085490 v3d: Drop a perf note about merging unpack_half_*, which has been implemented.
This is handled with copy-propagation now.
2019-02-18 18:09:06 -08:00
Eric Anholt
146e432b49 v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x.
Fixes some stalls in 3DMMES's main vertex shader.

total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
instructions in affected programs: 2935050 -> 2865569 (-2.37%)
2019-02-18 18:09:06 -08:00
Eric Anholt
cd5e0b2729 v3d: Use the early_fragment_tests flag for the shader's disable-EZ field.
Apparently we need disable-EZ flagged, not just "does Z writes".

Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 051a41d3d5 ("v3d: Add support for the early_fragment_tests flag.")
2019-02-18 18:09:06 -08:00
Eric Anholt
3f22b35a43 v3d: Use the NIR lowering for isign instead of rolling our own.
min/max instead of comparisons saves 2 instructions on
fs-sign-int.shader_test.
2019-02-14 00:32:30 +00:00
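For reference, an integer sign can be computed with one min and one max instead of two comparisons and a subtract. The sketch below assumes this clamp-to-[-1, 1] form is the shape of the lowering being referred to; the function names are hypothetical.

```python
def isign_minmax(x):
    """Integer sign as a clamp to [-1, 1] using one max and one min."""
    return min(max(x, -1), 1)


def isign_compare(x):
    """Integer sign from two comparisons and a subtraction."""
    return (1 if x > 0 else 0) - (1 if x < 0 else 0)


# Both forms agree on a small range of inputs.
agree = all(isign_minmax(v) == isign_compare(v) for v in range(-5, 6))
```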
Eric Anholt
3c08ecf147 v3d: Whitespace consistency fix.
2019-02-05 15:46:42 -08:00
Eric Anholt
940501a446 v3d: Fix copy-propagation of input unpacks.
I had a single function for "does this do float input unpacking" with two
major flaws: It was missing the most common thing to try to copy propagate
an f32 input unpack to (the VFPACK to an FP16 render target) along with
several other ALU ops, and also would try to propagate an f32 unpack into
a VFMUL which only does f16 unpacks.

instructions in affected programs: 659232 -> 655895 (-0.51%)
uniforms in affected programs: 132613 -> 135336 (2.05%)

and a couple of programs increase their thread counts.

The uniforms hit appears to be a pattern in generated code of doing (-a >=
a) comparisons, which when a is abs(b) can result in the abs instruction
being copy propagated once but not fully DCEed.
2019-02-05 15:46:04 -08:00
Eric Anholt
e5c6938590 v3d: Fix input packing of .l for rounding/fdx/fdy.
Avoids a regression in
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start
copy-propagating more input packs.
2019-02-05 15:45:23 -08:00
Eric Anholt
1a4170952d v3d: Fix pack/unpack of VFPACK operand unpacks.
We want to be able to copy propagate our texture unpacks into the vfpack.
2019-02-05 15:45:23 -08:00
Eric Anholt
d0fdbd4211 v3d: Fix dumping of shaders with alpha test.
We were trying to print a NULL entry from the table.
2019-02-05 15:42:14 -08:00
Eric Anholt
bdef17b052 v3d: Store the actual mask of color buffers present in the key.
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
2019-02-05 15:42:04 -08:00
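A present-buffers bitmask of the kind bdef17b052 describes can be sketched as follows. The function names and the use of None for an unbound render target are assumptions made for illustration, not the driver's key layout.

```python
def color_buffer_mask(cbufs):
    """Build a bitmask with bit i set when render target i is bound.

    cbufs: list where an unbound slot is None (hypothetical representation).
    """
    mask = 0
    for i, buf in enumerate(cbufs):
        if buf is not None:
            mask |= 1 << i
    return mask


def render_targets_to_write(mask):
    """List the render target indices that should receive a color write."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


# Only rt 1 is bound, so rt 0 must not be written.
mask = color_buffer_mask([None, "rt1"])
targets = render_targets_to_write(mask)
```

A plain count of bound buffers would not distinguish this case from one where only rt 0 is bound, which is why a mask is the more useful thing to store.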
Eric Anholt
ab4d5775b0 v3d: Fix image_load_store clamping of signed integer stores.
This was a copy-and-paste failure that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
2019-01-31 08:39:40 -08:00
Eric Anholt
6053c7bb43 v3d: Fix a release build set-but-unused compiler warning.
2019-01-29 16:02:51 -08:00
Emil Velikov
385843ac3c vc4: Declare the last cpu pointer as being modified in NEON asm.
Earlier commit addressed 7 of the 8 instances available.

v2: Rebase patch back to master (by anholt)

Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-29 16:00:25 -08:00
Dylan Baker
90a7a9c973 automake: Add include dir for nir src directory
Fixes: 6281f26f06
       ("v3d: Add support for shader_image_load_store.")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-01-29 23:24:57 +00:00
Eric Anholt
f7769b5121 v3d: Fix the autotools build.
Noticed while looking at the gitlab-CI MR.
2019-01-29 14:00:27 -08:00
Carsten Haitzler (Rasterman)
300d3ae8b1 vc4: Declare the cpu pointers as being modified in NEON asm.
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified.  Fixes
crashes on texture upload/download with current gcc.

We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.

(commit message by anholt)

Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
2019-01-28 16:45:45 -08:00
Carsten Haitzler (Rasterman)
522f688471 vc4: Use named parameters for the NEON inline asm.
This makes the asm code more intelligible and clarifies the functional
change in the next commit.

(commit message and commit squashing by anholt)
2019-01-28 16:40:46 -08:00
Eric Anholt
c496b60ed8 v3d: Create separate sampler states for the various blend formats.
The sampler border color is encoded in the TMU's blending format (half
floats, 32-bit floats, or integers) and must be clamped to the format's
unorm/snorm/int range by the driver.  Additionally, the TMU doesn't
know about how we're abusing the swizzle to support BGRA, A, and LA, so we
have to pre-swizzle the border color for those.

We don't really want to spend half a kb on sampler states in most cases,
so skip generating the variants when the border color is unused or is
0,0,0,0.
2019-01-27 08:30:03 -08:00
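The driver-side preparation described in c496b60ed8 amounts to a clamp followed by a component reorder. The sketch below is a loose illustration under assumptions: the function name, the kind strings, and the swap_rb flag are hypothetical, integer clamping is left out because its range depends on the bit width, and only the BGRA-style red/blue exchange is modeled.

```python
def prepare_border_color(rgba, kind, swap_rb=False):
    """Clamp a 4-component border color and optionally exchange red and blue.

    kind: "unorm" clamps to [0, 1] and "snorm" clamps to [-1, 1]; any other
          value passes the color through unclamped in this sketch.
    swap_rb: models a BGRA-style swizzle by swapping components 0 and 2.
    """
    ranges = {"unorm": (0.0, 1.0), "snorm": (-1.0, 1.0)}
    if kind in ranges:
        low, high = ranges[kind]
        rgba = [min(max(c, low), high) for c in rgba]
    else:
        rgba = list(rgba)
    if swap_rb:
        rgba[0], rgba[2] = rgba[2], rgba[0]
    return tuple(rgba)


unorm_color = prepare_border_color((2.0, -1.0, 0.5, 1.0), "unorm")
bgra_snorm_color = prepare_border_color((-2.0, 0.25, 3.0, 1.0), "snorm", swap_rb=True)
```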
Eric Anholt
09472006ff v3d: Use the symbolic names for wrap modes from the XML.
2019-01-27 08:30:03 -08:00
Eric Anholt
060575bea8 v3d: Drop maximum number of texture units down to 16.
This is the GLES 3.2 minmax, and also what the closed source driver does.
Avoids hitting OOMs in the CTS's
dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
2019-01-27 08:30:03 -08:00
Eric Anholt
3e743d8cd8 v3d: Avoid duplicating limits defines between gallium and v3d core.
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
2019-01-27 08:30:03 -08:00
Eric Anholt
fe6a21c867 v3d: Fix overly-large vattr_sizes structs.
We want one vector size per vector, not per component.
2019-01-27 08:30:03 -08:00
Eric Anholt
f72820c851 v3d: Add support for CS barrier() intrinsics.
2019-01-14 15:40:55 -08:00
Eric Anholt
9b45b06d7c v3d: Add support for CS shared variable load/store/atomics.
CS shared variables are handled effectively as SSBO access to a temporary
buffer that will be allocated at CS dispatch time.
2019-01-14 15:40:55 -08:00
Eric Anholt
01d913cf90 v3d: Add support for CS workgroup/invocation id intrinsics.
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn them into the global invocation id
and the local invocation id ivec3s.
2019-01-14 15:40:55 -08:00
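The derivation that the core lowering performs follows the standard GLSL compute definitions: the local invocation index is z * size_x * size_y + y * size_x + x, and the global id is workgroup_id * workgroup_size + local_id. A minimal sketch of those two steps, with hypothetical function names, is:

```python
def local_invocation_id(index, wg_size):
    """Unflatten a local invocation index into an (x, y, z) local id."""
    size_x, size_y, _size_z = wg_size
    x = index % size_x
    y = (index // size_x) % size_y
    z = index // (size_x * size_y)
    return (x, y, z)


def global_invocation_id(workgroup_id, local_id, wg_size):
    """Combine the workgroup id and local id, component by component."""
    return tuple(w * s + l for w, s, l in zip(workgroup_id, wg_size, local_id))


wg_size = (4, 2, 2)
local_id = local_invocation_id(13, wg_size)
global_id = global_invocation_id((2, 0, 1), local_id, wg_size)
```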
Eric Anholt
6281f26f06 v3d: Add support for shader_image_load_store.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
2019-01-14 15:40:55 -08:00
Eric Anholt
5932c2f0b9 v3d: Add SSBO/atomic counters support.
So far I assume that all the buffers get written.  If they weren't, you'd
probably be using UBOs instead.
2019-01-14 15:40:55 -08:00
Eric Anholt
1a63227ea0 v3d: Add support for matrix inputs to the FS.
We've been relying on linking splitting up our varying matrices into
separate vectors, but with SSO that doesn't happen.  Supporting matrix
inputs isn't too hard, though.
2019-01-14 13:18:02 -08:00
Eric Anholt
3790ee07e6 v3d: Fix txf_ms 2D_ARRAY array index.
We need to pass the array index through our coordinate transform
unchanged.  Fixes
dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array
2019-01-14 13:18:02 -08:00
Eric Anholt
051a41d3d5 v3d: Add support for the early_fragment_tests flag.
If this flag hasn't been set by the shader and it has some visible side
effects, then we need to disable EZ.
2019-01-14 13:18:02 -08:00