Commit graph

22789 commits

Author SHA1 Message Date
Tom Stellard
dfdaf3eb7e radeon: Teach radeon_elf_read() how to parse reloc information v3
v2:
  - Use strdup for copying reloc names.
  - Free reloc memory.

v3:
  - Add free_relocs parameter to radeon_shader_binary_free_members()
2015-01-20 09:55:43 -05:00
Tom Stellard
5667aa58c4 radeon: Add a helper function for freeing members of radeon_shader_binary 2015-01-20 09:55:43 -05:00
Marek Olšák
ccc5b60b06 winsys/radeon: increase the size of buffer cache
This should fix this performance regression:
https://bugs.freedesktop.org/show_bug.cgi?id=88227

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-19 20:15:27 +01:00
Eric Anholt
84ef2d4156 vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL. 2015-01-15 22:21:29 +13:00
Eric Anholt
1b241c59e8 vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet.
I wanted to read it, so I wrote parsing.
2015-01-15 22:19:25 +13:00
Eric Anholt
d0d6d24723 vc4: Fix CL dumping trying to dump too far.
Execution will end at the cl->next, because that's what ct0ea/ct1ea get
programmed to.
2015-01-15 22:19:25 +13:00
Eric Anholt
0471f72755 vc4: Fix texture type masking.
Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
didn't use any of those formats.
2015-01-15 22:19:25 +13:00
Eric Anholt
6313a2c8f0 vc4: Colormask should apply after all other fragment ops (like logic op).
Theoretically it should apply after dithering as well, but ditehring for
565 happens in fixed function in the TLB store.
2015-01-15 22:19:25 +13:00
Eric Anholt
0289a26201 vc4: No turning unpack arguments into small immediates.
Since unpack only happens on things read from the A register file, we have
to leave them as something that can be allocated to A (temp or uniform).
2015-01-15 22:19:25 +13:00
Eric Anholt
772c47aefe vc4: Move the tests for src needing to be an A register to vc4_qir.c.
I want it from another location.
2015-01-15 22:19:25 +13:00
Eric Anholt
8f2fb68026 vc4: Don't swap the raddr on instructions doing unpacks.
It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).
2015-01-15 22:19:25 +13:00
Eric Anholt
5d5707707f vc4: Don't let pairing happen with badly mismatched unpack flags.
No difference on shader-db, but prevents definite regressions in the
blending changes.
2015-01-15 22:19:25 +13:00
Eric Anholt
3820866e40 vc4: Don't let pairing happen with badly mismatched pack flags.
No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.
2015-01-15 22:19:25 +13:00
Eric Anholt
d1f2fc834d vc4: Fix early Z behavior on hardware.
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z.  The result was
that you'd get big chunks of your rendering missing.
2015-01-15 22:19:25 +13:00
Michel Dänzer
82b7ee62fc Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary"
This reverts commit 0543630d0b.

It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.

We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.

Sorry I didn't remember this when reviewing the reverted change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-15 15:09:48 +09:00
Michel Dänzer
a6a75f1286 st/clover: Adapt to TargetLibraryInfo.h move in LLVM SVN r226078
Trivial.
2015-01-15 12:57:05 +09:00
Rob Clark
876550ff97 freedreno/ir3: handle "holes" in inputs
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components.  The hardware can't write in.xy and in.w to discontiguous
registers.  To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.

This fixes a problem noticed with firefox OMTC.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-13 08:17:18 -05:00
Marek Olšák
bed6f20f28 r600g: fix build failure when building the driver without LLVM 2015-01-12 23:20:26 +01:00
Eric Anholt
ff1948a1be vc4: Clamp the inputs to the blend equation to [0, 1].
Fixes the remaining ARB_color_buffer_float rendering tests.
2015-01-11 17:17:20 +13:00
Eric Anholt
1519a1928a vc4: Add a little helper for clamping to [0,1]. 2015-01-11 17:17:20 +13:00
Eric Anholt
1a328120d3 vc4: Fix up statechange management for uncompiled/compiled FS/VS.
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).

Fixes es3conform's blend test.
2015-01-11 17:17:20 +13:00
Eric Anholt
c122662984 vc4: Fix clear color setup for RGB565.
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too.  Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
2015-01-11 17:17:19 +13:00
Eric Anholt
355156d2f7 vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31.
Turns out this was harmful in code quality:

total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs:     22522 -> 21880 (-2.85%)

This costs us yet another register, which is painful since it means more
programs might fail to compile).  However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
2015-01-11 08:57:24 +13:00
Eric Anholt
a8e14c293b vc4: Allow dead code elimination of VPM reads.
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.

total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs:     4721 -> 4455 (-5.63%)
2015-01-10 20:55:37 +13:00
Eric Anholt
b920ecf793 vc4: Cook up the draw-time VPM setup info during shader compile.
This will give the compiler the chance to dead-code eliminate unused VPM
reads.  This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
2015-01-10 15:24:56 +13:00
Eric Anholt
c772c92153 vc4: Split two notions of instructions having side effects.
Some ops can't be DCEd, while some of the ops that are just important due
to the args they have can be.
2015-01-10 15:24:46 +13:00
Eric Anholt
a58ae83882 vc4: Redo VPM reads as a read file.
This will let us do copy propagation of the VPM reads.
2015-01-10 14:35:24 +13:00
Eric Anholt
06b6a72a3e vc4: Fix miscalculation of the VPM space.
We pass in a byte offset, not dword.  I'm rather scared that this actually
managed to pass piglit, but it does fix gears.
2015-01-10 14:35:06 +13:00
Eric Anholt
92a0b0bd70 vc4: Pack VPM attr contents according to just the size of the attribute.
total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs:     20871 -> 19664 (-5.78%)
2015-01-10 13:54:12 +13:00
Eric Anholt
72cb6619cb vc4: Restructure color packing as a series of channel replacements.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4.  But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:

total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs:     4381 -> 4314 (-1.53%)
2015-01-10 13:54:12 +13:00
Eric Anholt
3093bfacf0 vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check.
Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're
ssa.
2015-01-10 13:54:12 +13:00
Eric Anholt
1d04432677 vc4: Move global seqno short-circuiting to vc4_wait_seqno().
Any other caller would want it, too.
2015-01-10 13:54:12 +13:00
José Fonseca
6c9b695a9c st/wgl: Ignore ulVersion in DrvValidateVersion.
We never used ulVersion for proper version checks.

Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver
started using a different version number, so the handy trick of renaming
Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for
quick testing of Mesa software renderers stopped working.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-01-08 18:57:04 +00:00
Rob Clark
e7026ac486 freedreno/ir3: fix pos_regid > max_reg
We can't (or don't know how to) turn this off.  But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.

Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg.  Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline.  Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
1e5c207dba freedreno/ir3: start on indirect gpr reads
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.

NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node.  Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
63e5b72da8 freedreno/ir3: make reg array dynamic
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers.  Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
9a9f2a893b freedreno/ir3: simplify RA
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers.  This lets us simplify RA by
working in terms of neighbor groups.

NOTE: has the slight problem that it can't optimize out mov's for things
like:

  MOV OUT[n], IN[m]

To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them).  This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
dddfe6c21e freedreno/ir3: regmask support for relative addr
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask.  So for this case use a simple size
field rather than a bitmask.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
9bb865b3cf freedreno/ir3: split up ssa_src
Slight bit of refactoring that will be needed for indirect gpr
addressing (TEMP[ADDR[]]).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
d15db9e7c0 freedreno/ir3: drop instr_clone() stuff
Unnecessary and overly complicated.  And gets in the way for temp arrays
(TEMP[ADDR[]]).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
212b909643 freedreno/ir3: runtime enable RA debug for DEBUG builds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
8c3952051e freedreno/ir3: handle relative addr in ir3_dump
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
56370b9feb freedreno/ir3: legalize vs unused sam dst components
We probably could be more clever elsewhere and mask out components that
are not used.  But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
063e2ef76a freedreno/ir3: hack for old compiler
Old compiler doesn't have ir3_block's.. so we need a special path.  This
hack can be dropped when ir3_compiler_old is retired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
18899d1b80 tgsi: track max array per file
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them.  So we implicitly assume that ArrayID==0 always
exists for each file.  This is why array_max[file] is never less than
zero.

You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark
49b4a6331f tgsi: keep track of read vs written indirects
At least temporarily, I need to fallback to old compiler still for
relative dest (for freedreno), but I can do relative src temp.  Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-01-07 19:37:28 -05:00
Marek Olšák
d7cd9bfc7f Revert "radeonsi: reduce the size of si_pm4_state"
This reverts commit 9141d88555.

It broke OpenCL.
2015-01-08 00:10:36 +01:00
Tom Stellard
e28f9d0e60 radeonsi: Fix crash when destroying si_screen
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object.  This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.

https://bugs.freedesktop.org/show_bug.cgi?id=88170
2015-01-07 16:28:40 -05:00
Marek Olšák
1829f9c928 radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders
v2: complete rewrite

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-07 18:27:54 +01:00
Marek Olšák
d8185aa9a8 radeonsi: emit SURFACE_SYNC last
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00