Commit graph

64 commits

Author SHA1 Message Date
Eric Anholt
c2b13627d9 broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.
Just like TLB without a config uniform, we don't have a register index.
2018-03-26 17:46:23 -07:00
Eric Anholt
d7a015cbc6 broadcom/vc5: Account for InstanceID/VertexID in VPM segment size.
Fixes failure in
GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
2018-03-22 15:12:21 -07:00
Eric Anholt
ba29b89dc7 broadcom/vc5: Set up a vertex position if the shader doesn't.
Our backend needs some sort of vertex position value to emit the scaled
viewport values and such.  Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
2018-03-22 15:12:21 -07:00
Eric Anholt
baeb6a4b4a broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type.  Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
2018-03-21 14:02:34 -07:00
Eric Anholt
00910e3057 broadcom/vc5: Don't annotate dumps with stale live intervals.
As you're debugging register allocation, you may have changed the
intervals and not recomputed yet.  Just skip the dump in that case.
2018-03-19 16:44:20 -07:00
Eric Anholt
facc3c6f58 broadcom/vc5: Add support for register spilling.
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).

We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.

Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
2018-03-19 16:44:06 -07:00
Eric Anholt
271fc58ba1 broadcom/vc5: Remove redundant last_inst lookup.
The point was to get the MOV, which the MOV_dest already returned.
2018-03-19 16:42:59 -07:00
Eric Anholt
34dc64f627 broadcom/vc5: On QPU pack error, dump the instruction and return cleanly.
This is nice for debugging when you've made a bad instruction.
2018-03-19 16:42:59 -07:00
Eric Anholt
d721348dcd broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.
This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.
2018-03-19 16:42:59 -07:00
Eric Anholt
c81d681742 broadcom/vc5: Move the umul macro to a header.
Anywhere we want to multiply, we probably want this.
2018-03-19 16:42:59 -07:00
Eric Anholt
9e28c18cd1 broadcom/vc5: Correct the arg count of TIDX/EIDX. 2018-03-19 16:42:59 -07:00
Eric Anholt
55bf298333 broadcom/vc5: Re-do live variables after removing thrsws.
Otherwise our start/ends ips won't line up with the actual instructions.
2018-03-19 16:42:59 -07:00
Eric Anholt
4760040c09 broadcom/vc5: Extract v3d_qpu_writes_tmu() helper.
This will be reused in register spilling.
2018-03-19 16:42:59 -07:00
Timothy Arceri
a050ea60ee nir: add lower_ldexp to nir compiler options
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-28 09:23:49 +11:00
Eric Anholt
8bb000f460 broadcom/vc5: Try to merge more than 2 QPU instructions together.
Obviously it would be good to have an ADD and a MUL and a signal together,
but we can even potentially have multiple signals merged, as well.

total instructions in shared programs: 100423 -> 97874 (-2.54%)
instructions in affected programs:     78812 -> 76263 (-3.23%)
2018-02-05 09:29:37 +00:00
Eric Anholt
dc78643ace broadcom/vc5: Remove no-op MOVs after register allocation.
We emit some MOVs to track lifetimes of payload registers, but we don't
need there to be actual MOV instructions for them.

total instructions in shared programs: 101045 -> 100423 (-0.62%)
instructions in affected programs:     37083 -> 36461 (-1.68%)
2018-02-05 09:29:37 +00:00
Eric Anholt
f3978a7380 broadcom/vc5: Add missing shader-db instruction counting.
I must have misplaced it in the instruction packing rework.
2018-02-05 09:29:37 +00:00
Eric Anholt
353b42ccc7 broadcom/vc5: Fix a segfault on mix of booleans.
We don't have a src1 to look up if the compare instruction is "i2b".
2018-02-01 11:02:29 -08:00
Timothy Arceri
9a2e085680 nir: add lower_all_io_to_temps flag
This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Eric Anholt
91f899cbc1 broadcom/vc5: Update the compiler for V3D 4.2. 2018-01-27 19:04:21 +11:00
Eric Anholt
5bc0b63799 broadcom/vc5: Use MSF to ignore discards/non-dispatched channels in loops.
Prevents potential infinite loops when a non-dispatched or discarded
channel never triggers the loop break condition.
2018-01-12 21:58:24 -08:00
Eric Anholt
762dd52951 broadcom/vc5: Use XOR instead of SUB for execute flags comparisons.
I think this should be equivalent other than power, and it's the kind of
comparison we use for nir_op_ieq.
2018-01-12 21:58:18 -08:00
Eric Anholt
8e4cba9d92 broadcom/vc5: Also check the update flags for avoiding DCE.
I was trying to do a NULL-destination UF, and it got removed.
2018-01-12 21:58:11 -08:00
Eric Anholt
368bab43fd broadcom/vc5: Add support for loading varyings in V3D 4.1.
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
2018-01-12 21:57:21 -08:00
Eric Anholt
5aaea3c4a0 broadcom/vc5: Add compiler support for V3D 4.x texturing. 2018-01-12 21:56:57 -08:00
Eric Anholt
42a35da96d broadcom/vc5: Move V3D 3.3 texturing to a separate file.
V3D 4.x texturing changes enough that #ifdefs would just make a mess of
it.
2018-01-12 21:56:37 -08:00
Eric Anholt
acf30e4916 broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.
For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to
stop including V3.3 XML.
2018-01-12 21:56:24 -08:00
Eric Anholt
90269ba353 broadcom/vc5: Use THRSW to enable multi-threaded shaders.
This is a major performance boost on all of V3D, but is required on V3D
4.x where shaders are always either 2- or 4-threaded.
2018-01-12 21:55:30 -08:00
Eric Anholt
86a12b4d5a broadcom/vc5: Properly schedule the thread-end THRSW.
This fills in the delay slots of thread end as much as we can (other than
being cautious about potential TLBZ writes).

In the process, I moved the thread end THRSW instruction creation to the
scheduler.  Once we start emitting THRSWs in the shader, we need to
schedule the thread-end one differently from other THRSWs, so having it in
there makes that easy.
2018-01-12 21:55:23 -08:00
Eric Anholt
a075bb6726 broadcom/vc5: Implement GFXH-1684 workaround.
Apparently the VPM writes need to be flushed out before we end the shader.
2018-01-12 21:55:15 -08:00
Eric Anholt
edbd817c30 broadcom/vc5: Use a physical-reg-only register class for LDVPM.
This is needed for LDVPM on V3D 4.x, but will also be needed for keeping
values out of the accumulators across THRSW.
2018-01-12 21:54:42 -08:00
Eric Anholt
22a02f3e34 broadcom/vc5: Use the new LDVPM/STVPM opcodes on V3D 4.1.
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.

VPM loads also gain the ability to be reordered by packing the row into
the A argument.  They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
2018-01-12 21:54:33 -08:00
Eric Anholt
dfee62eed3 broadcom/vc5: Add support for V3Dv4 signal bits.
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions.  LDVPM disappears in favor of an ALU op.  LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
2018-01-12 21:53:45 -08:00
Dylan Baker
2083a14179 meson: Use dependencies for nir
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.

This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Eric Anholt
e60e3a56a2 broadcom/vc5: Fix discard_if during control flow.
I want to do the SETMSF.IFA to discard only if execute == 0 and cond, so
our dest of the PUSHZ needs to be nonzero if execute or !cond are nonzero.

Fixes dEQP-GLES3.functional.shaders.discard.dynamic_loop_dynamic.
2018-01-03 14:31:36 -08:00
Eric Anholt
635131a238 broadcom/vc5: Don't emit component 3/4 F16 TLB writes for float/vec2.
Fixes a simulator assertion failure on
dEQP-GLES3.functional.fragment_out.array.fixed.r8_highp_float.
2018-01-03 14:31:28 -08:00
Eric Anholt
8e5a0ed953 broadcom/vc5: Emit flat shade flags for varying components > 24.
This means that with no flatshading we'll emit the single-byte
ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to
get all the bits we need set.

There's a _SET enum in the packet we could use to possibly set entire
ranges of the bitfield without using another packet, but this at least
fixes the conformance failure.
2018-01-03 14:25:23 -08:00
Eric Anholt
2056e4a777 broadcom/vc5: Emit proper flatshading code for glShadeModel(GL_FLAT).
In updating the simulator, behavior changed slightly so that our old code
wasn't getting glxgears's flatshading interpolated right.  Emit flat
shading code just like we would for a normal flat-shaded varying, by
passing a flag in the shader key for glShadeModel(GL_FLAT) state and
customizing the color inputs based on that.
2018-01-03 14:25:23 -08:00
Eric Anholt
ba965084b6 broadcom/vc5: Move texture return channel setup into the compiler.
The compiler decides how many LDTMUs we're going to emit, and that must
match the P1 flags.  This brings the return channel counting to a single
place (so all that's passed into the compiler is "how many return channels
you may request from this texture's format), and was a necessary step for
shadow samplers once we stop using OVRTMUOUT=0.
2018-01-03 14:25:23 -08:00
Eric Anholt
1171f1749d broadcom/vc5: Enable NIR txd lowering on all txd instructions.
Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-14 14:36:17 -08:00
Eric Anholt
52f024b052 broadcom/vc5: Fix shader input/outputs for gallium's new NIR linking. 2017-12-14 14:36:17 -08:00
Eric Anholt
6a78416dab broadcom/vc5: Fix BASE_LEVEL handling with txl.
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.

Fixes piglit tex-miplevel-selection *Lod 2D.
2017-11-22 10:56:31 -08:00
Eric Anholt
87391e23cf broadcom/vc5: Ensure that there is always a TLB write.
This should fix some GPU hangs in our (currently always single-threaded)
fragment shaders, and definitely fixes assertion failures in simulation.
2017-11-17 16:09:55 -08:00
Andreas Boll
4f29ed38f3 broadcom/vc5: Remove unused v3d_compiler.c
Unused since original import of VC5.

Fixes: ade416d023 ("broadcom: Add VC5 NIR compiler.")

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-08 18:30:47 +00:00
Eric Anholt
50906e4583 broadcom/vc5: Do 16-bit unpacking of integer texture returns properly.
We were doing f16 unpacks, which trashed "1" values.  Fixes many piglit
texwrap GL_EXT_texture_integer cases.
2017-11-07 12:58:03 -08:00
Eric Anholt
dfff9ce45e broadcom/vc5: Fix scheduling for a non-SFU R4 write after a dead R4 write.
The v3d_qpu_writes_r*() were only checking for fixed-function accumulator
writes, not normal ALU writes to those regs.

Fixes fs-discard-exit-2 on simulation (but not HW).
2017-11-07 12:57:49 -08:00
Eric Anholt
4d2619a6b3 broadcom/vc5: Stop lowering negates to subs.
In the case of fneg(0.0), we were getting back 0.0 instead of -0.0.  We
were also needing an immediate 0 value for ineg, when there's an opcode to
do the job properly.

Fixes fs-floatBitsToInt-neg.shader_test.
2017-10-30 13:31:28 -07:00
Eric Anholt
e717e3e7cd broadcom/vc5: Add lowering for txf_ms to a txf on a 2x2-scaled texture.
The HW has no native sampler support for multisample textures, but since
we only need to support txf_ms and the layout is UIF, we just need to
scale up the texcoords and then add in the sample.

This drops the old TEXTURE_MSAA_ADDR special uniform, since we're treating
MSAA textures as textures, rather than basically texbos like VC4 had to.
2017-10-30 13:31:27 -07:00
Eric Anholt
125f2a751e broadcom/vc5: Lower unpack_*_4x8 to normal math.
We only have 2x16 unpacking in our ALUs.  To enable this, we also need
lower_fdiv for its new instructions, which had been handled at a higher
level previously.
2017-10-30 13:31:16 -07:00
Eric Anholt
eecdbaa985 broadcom/vc5: Add PIPE_TEX_WRAP_CLAMP support for linear-filtered textures.
I already had the texture's wrapping set up to use different behavior for
nearest or linear, so we just needed to saturate the coordinates in linear
mode to get the "proper" blend between the edge and border values.
2017-10-30 13:31:16 -07:00