Commit graph

22773 commits

Author SHA1 Message Date
Rob Clark
876550ff97 freedreno/ir3: handle "holes" in inputs
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components.  The hardware can't write in.xy and in.w to discontiguous
registers.  To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.

This fixes a problem noticed with firefox OMTC.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-13 08:17:18 -05:00
Marek Olšák
bed6f20f28 r600g: fix build failure when building the driver without LLVM 2015-01-12 23:20:26 +01:00
Eric Anholt
ff1948a1be vc4: Clamp the inputs to the blend equation to [0, 1].
Fixes the remaining ARB_color_buffer_float rendering tests.
2015-01-11 17:17:20 +13:00
Eric Anholt
1519a1928a vc4: Add a little helper for clamping to [0,1]. 2015-01-11 17:17:20 +13:00
Eric Anholt
1a328120d3 vc4: Fix up statechange management for uncompiled/compiled FS/VS.
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).

Fixes es3conform's blend test.
2015-01-11 17:17:20 +13:00
Eric Anholt
c122662984 vc4: Fix clear color setup for RGB565.
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too.  Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
2015-01-11 17:17:19 +13:00
Eric Anholt
355156d2f7 vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31.
Turns out this was harmful in code quality:

total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs:     22522 -> 21880 (-2.85%)

This costs us yet another register, which is painful since it means more
programs might fail to compile).  However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
2015-01-11 08:57:24 +13:00
Eric Anholt
a8e14c293b vc4: Allow dead code elimination of VPM reads.
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.

total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs:     4721 -> 4455 (-5.63%)
2015-01-10 20:55:37 +13:00
Eric Anholt
b920ecf793 vc4: Cook up the draw-time VPM setup info during shader compile.
This will give the compiler the chance to dead-code eliminate unused VPM
reads.  This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
2015-01-10 15:24:56 +13:00
Eric Anholt
c772c92153 vc4: Split two notions of instructions having side effects.
Some ops can't be DCEd, while some of the ops that are just important due
to the args they have can be.
2015-01-10 15:24:46 +13:00
Eric Anholt
a58ae83882 vc4: Redo VPM reads as a read file.
This will let us do copy propagation of the VPM reads.
2015-01-10 14:35:24 +13:00
Eric Anholt
06b6a72a3e vc4: Fix miscalculation of the VPM space.
We pass in a byte offset, not dword.  I'm rather scared that this actually
managed to pass piglit, but it does fix gears.
2015-01-10 14:35:06 +13:00
Eric Anholt
92a0b0bd70 vc4: Pack VPM attr contents according to just the size of the attribute.
total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs:     20871 -> 19664 (-5.78%)
2015-01-10 13:54:12 +13:00
Eric Anholt
72cb6619cb vc4: Restructure color packing as a series of channel replacements.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4.  But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:

total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs:     4381 -> 4314 (-1.53%)
2015-01-10 13:54:12 +13:00
Eric Anholt
3093bfacf0 vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check.
Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're
ssa.
2015-01-10 13:54:12 +13:00
Eric Anholt
1d04432677 vc4: Move global seqno short-circuiting to vc4_wait_seqno().
Any other caller would want it, too.
2015-01-10 13:54:12 +13:00
José Fonseca
6c9b695a9c st/wgl: Ignore ulVersion in DrvValidateVersion.
We never used ulVersion for proper version checks.

Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver
started using a different version number, so the handy trick of renaming
Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for
quick testing of Mesa software renderers stopped working.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-01-08 18:57:04 +00:00
Rob Clark
e7026ac486 freedreno/ir3: fix pos_regid > max_reg
We can't (or don't know how to) turn this off.  But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.

Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg.  Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline.  Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
1e5c207dba freedreno/ir3: start on indirect gpr reads
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.

NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node.  Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
63e5b72da8 freedreno/ir3: make reg array dynamic
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers.  Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
9a9f2a893b freedreno/ir3: simplify RA
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers.  This lets us simplify RA by
working in terms of neighbor groups.

NOTE: has the slight problem that it can't optimize out mov's for things
like:

  MOV OUT[n], IN[m]

To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them).  This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
dddfe6c21e freedreno/ir3: regmask support for relative addr
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask.  So for this case use a simple size
field rather than a bitmask.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
9bb865b3cf freedreno/ir3: split up ssa_src
Slight bit of refactoring that will be needed for indirect gpr
addressing (TEMP[ADDR[]]).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
d15db9e7c0 freedreno/ir3: drop instr_clone() stuff
Unnecessary and overly complicated.  And gets in the way for temp arrays
(TEMP[ADDR[]]).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
212b909643 freedreno/ir3: runtime enable RA debug for DEBUG builds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
8c3952051e freedreno/ir3: handle relative addr in ir3_dump
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
56370b9feb freedreno/ir3: legalize vs unused sam dst components
We probably could be more clever elsewhere and mask out components that
are not used.  But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
063e2ef76a freedreno/ir3: hack for old compiler
Old compiler doesn't have ir3_block's.. so we need a special path.  This
hack can be dropped when ir3_compiler_old is retired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
18899d1b80 tgsi: track max array per file
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them.  So we implicitly assume that ArrayID==0 always
exists for each file.  This is why array_max[file] is never less than
zero.

You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
49b4a6331f tgsi: keep track of read vs written indirects
At least temporarily, I need to fallback to old compiler still for
relative dest (for freedreno), but I can do relative src temp.  Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-01-07 19:37:28 -05:00
Marek Olšák
d7cd9bfc7f Revert "radeonsi: reduce the size of si_pm4_state"
This reverts commit 9141d88555.

It broke OpenCL.
2015-01-08 00:10:36 +01:00
Tom Stellard
e28f9d0e60 radeonsi: Fix crash when destroying si_screen
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object.  This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.

https://bugs.freedesktop.org/show_bug.cgi?id=88170
2015-01-07 16:28:40 -05:00
Marek Olšák
1829f9c928 radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders
v2: complete rewrite

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-07 18:27:54 +01:00
Marek Olšák
d8185aa9a8 radeonsi: emit SURFACE_SYNC last
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
7c9ec6ca7e radeonsi: flush all CB/DB caches unconditionally when changing the framebuffer
This is easier to read and will work better with shader image stores.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
a1bbccf521 radeonsi: change TC cache flushing strategy for textures
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
ca9c5b2be5 radeonsi: improve and fix streamout flushing
- we don't usually need to flush TC L2
- we should flush KCACHE
  (not really an issue now since we always flush KCACHE when updating
   descriptors, but it could be a problem if we used CE, which doesn't
   require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
18a30c9778 radeonsi: use TC L2 for CP DMA operations with shader resources on CIK
So that TC L2 doesn't need to be flushed.

The only problem is with index buffers, which don't use TC.
A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
11b76369f5 radeonsi: use TC L2 for updating descriptors on CIK
This allows not flushing TC L2 on CIK later.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
02ba7334d3 radeonsi: don't use TC L2 for updating descriptors on SI
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA
when updating the same memory.

The solution for SI is to use uncached access here, because CP DMA doesn't
support cached access.

CIK will be handled in the next patch.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
edf18da85d radeonsi: only flush the right set of caches for CP DMA operations
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
73c2b0d18c radeonsi: implement separate ICACHE and KCACHE flush for SI
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
0aecf9e2d1 radeonsi: add a combined flag for flushing a framebuffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
2bfe9d4538 radeonsi: rename flush flags, split the TC flag into L1 and L2
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
d217819e78 r600g,radeonsi: separate cache flush flags
I will rename them for radeonsi.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
d14f2ab4ad r600g: move r6xx-specific streamout flush flagging into r600g
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
0543630d0b radeonsi: only set BC_OPTIMIZE_DISABLE when necessary
SPI_PS_IN_CONTROL is moved into the SPI mapping state.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
5d8e838dae radeonsi: do not define FACE as an ordinary PS input
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
15a7fff69a radeonsi: remove flatshade from the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák
13de9475fc radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegen
It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00