Commit graph

13003 commits

Author SHA1 Message Date
Eric Anholt
24c5ab7bbb vc4: Drop dependency on r3 for color packing.
We can avoid it by carefully ordering the packing.  This is important as a
step in giving r3 to the register allocator.

total instructions in shared programs: 56087 -> 55957 (-0.23%)
instructions in affected programs:     18368 -> 18238 (-0.71%)
2014-12-08 16:08:13 -08:00
Eric Anholt
dfbf58c439 vc4: Add support for GL 1.0 logic ops. 2014-12-08 16:08:13 -08:00
Eric Anholt
5045d8ca42 vc4: Add support for TGSI_OPCODE_UCMP.
This is being emitted now from st_glsl_to_tgsi.cpp.
2014-12-08 16:08:13 -08:00
Tom Stellard
c16436149c radeonsi/compute: Clamp COMPUTE_TMPRING_SIZE.WAVES to: num_cu * 32
This is the maximum value allowed for this field.
2014-12-08 17:20:50 -05:00
Tom Stellard
0e1c085f17 winsys/radeon: Always report at least 1 compute unit
All uses of this require that the value be at least one, so it's
easier to report at least one than having to wrap all uses
in MAX2(max_compute_units, 1).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-12-08 17:20:50 -05:00
Tom Stellard
67dcbcd92c radeonsi: Program RASTER_CONFIG for harvested GPUs v5
Harvested GPUs have some of their render backends disabled, so
in order to prevent the hardware from trying to render things
with these disabled backends we need to correctly program
the PA_SC_RASTER_CONFIG register.

v2:
  - Write RASTER_CONFIG for all SEs.

v3:
  - Set GRBM_GFX_INDEX.INSTANCE_BROADCAST_WRITES bit.
  - Set GRBM_GFX_INFEX.SH_BROADCAST_WRITES bit when done setting
    PA_SC_RASTER_CONFIG.
  - Get num_se and num_sh_per_se from kernel.

v4:
  - Get correct value for num_se
  - Remove loop for setting PA_SC_RASTER_CONFIG
  - Only compute raster config when a backend has been disabled.

v5: Michel Dänzer
  - Fix computation for chips with multiple SEs

https://bugs.freedesktop.org/show_bug.cgi?id=60879

CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
2014-12-08 17:20:50 -05:00
Ilia Mirkin
043b79461f freedreno/a2xx: silence warning about missing DEPTH32X
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:53 -05:00
Ilia Mirkin
c416f49ebe freedreno/a3xx: handle index_bias (i.e. base_vertex)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:50 -05:00
Ilia Mirkin
b38b40d7bb freedreno/a3xx: add bgr565 texturing and rendering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:47 -05:00
Ilia Mirkin
e02ed16cb5 freedreno/a3xx: add support for SRGB render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:43 -05:00
Ilia Mirkin
39a7c049d3 freedreno/a3xx: output RGBA16_FLOAT from fs for certain outputs
Fixes R11G11B10F rendering, and is required for SRGB format support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:40 -05:00
Ilia Mirkin
3674c76edf freedreno/a3xx: re-enable rgb10_a2 render targets
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:37 -05:00
Ilia Mirkin
fc94b2c2a0 freedreno/a3xx: fix border color swizzle to match texture format desc
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:33 -05:00
Ilia Mirkin
97fef2db5c freedreno/a3xx: fix alpha-blending on RGBX formats
Expert debugging assistance provided by Chris Forbes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:20 -05:00
Roland Scheidegger
6f2cf5f3d0 llvmpipe: decrease MAX_SCENES from 2 to 1
Multiple scenes per context are meant to be used so a new scene can be built
while another one is processed in rasterization. However, quite surprisingly,
this does not actually work (and according to git log, possibly never did,
though maybe it did at some point further back (5 years+) but was buggy)
because we always wait immediately on the rasterizer to finish the scene when
contexts (and hence setup/scene) is flushed. This means when we try to get
an empty scene later, any old one is already empty again.
Thus using multiple scenes is just a waste of memory (not too bad, since the
additional scenes are guaranteed to be empty, which means their size ought to
be one data block (64kB) plus the size of some structs), without actually
really doing anything. (There is also quite some code for the whole concept of
multiple scenes which doesn't really do much in practice, but keep it hoping
the wait-on-scene-flush can be fixed some day.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-06 18:03:18 +01:00
Eric Anholt
befdff8142 vc4: Try swapping the regfile A to B to pair instructions.
total instructions in shared programs: 56995 -> 56087 (-1.59%)
instructions in affected programs:     40503 -> 39595 (-2.24%)
2014-12-05 16:27:58 -08:00
Eric Anholt
7d8b79f398 vc4: Allow pairing of some instructions that disagree about the WS bit.
No difference on shader-db because we tend to have a lot of other
conflicts going on as well (like RADDR_A disagreements)
2014-12-05 16:27:06 -08:00
Eric Anholt
6f32deb538 vc4: Add separate write-after-read dependency tracking for pairing.
If an operation is the last one to read a register, the instruction
containing it can also include the op that has the next write to that
register.

total instructions in shared programs: 57486 -> 56995 (-0.85%)
instructions in affected programs:     43004 -> 42513 (-1.14%)
2014-12-05 10:53:53 -08:00
Eric Anholt
042962df2d vc4: Fix inverted priority of instructions for QPU scheduling.
We were scheduling TLB operations as early as possible, and texture setup
as late as possible.  When I introduced prioritization, I visually
inspected that an independent operation got moved above texture results
collection, which tricked me into thinking it was working (but it was just
because texture setup was being pushed late).

total instructions in shared programs: 57651 -> 57486 (-0.29%)
instructions in affected programs:     18532 -> 18367 (-0.89%)
2014-12-05 10:43:14 -08:00
Eric Anholt
bd4057a5d7 vc4: Refuse to merge two ops that both access shared functions.
Avoids assertion failures in vc4_qpu_validate.c if we happen to find the
right set of operations available.
2014-12-05 10:43:14 -08:00
Eric Anholt
dadc32ac80 vc4: Allow dead code elimination of color reads.
This might happen if the blending functions are set up to not actually use
the destination color/alpha, for example.
2014-12-05 10:43:14 -08:00
Eric Anholt
34cf86bdc4 vc4: Add a debug flag for waiting for sync on submit.
This is nice when you're tracking down which command list is hanging the
GPU.
2014-12-05 10:43:14 -08:00
Rob Clark
4265148ac6 freedreno/a4xx: unify vertex/texture formats into a single table
Similar to the scheme that Ilia put in place for a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-04 16:01:37 -05:00
Rob Clark
e9589a8fcf freedreno/a4xx: fd4_util -> fd4_format
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-04 16:01:37 -05:00
Rob Clark
8bf69a29bb freedreno: update generated headers / a4xx fmt rename
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-04 16:01:37 -05:00
Rob Clark
c74f2db0a5 freedreno/a4xx: frag-depth fixes
Also seems to fix kill/discard.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-03 16:38:26 -05:00
Ilia Mirkin
79f9a106b9 freedreno/a3xx: implement anisotropic filtering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-12-03 09:23:46 -05:00
Rob Clark
b491d1ca6e freedreno/a4xx: rect textures
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-03 09:22:05 -05:00
Rob Clark
fbba633f2f freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-03 09:22:05 -05:00
Rob Clark
4cfe905a9b freedreno: fix signed vs unsigned lols
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-03 09:22:05 -05:00
Jan Vesely
02cc9e9f9e r600, llvm: Don't leak global symbol offsets
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-12-02 22:32:05 -05:00
Jan Vesely
ca0616f17e r600, llvm: Fix mem leak
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-12-02 11:30:13 -05:00
Eric Anholt
29c7cf2b2b vc4: Pair up QPU instructions when scheduling.
We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together.  This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based movs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.

total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs:     47361 -> 45429 (-4.08%)
2014-12-01 22:29:42 -08:00
Dave Airlie
7b0067d23a r600g/sb: fix issues cause by GLSL switching to loops for switch
Since 73dd50acf6
glsl: implement switch flow control using a loop

The SB backend was falling over in an assert or crashing.

Tracked this down to the loops having no repeats, but requiring
a working break, initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect if all the departure
nodes are empty and avoid generating an empty loop for that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-12-02 13:57:27 +10:00
Rob Clark
036f434ac2 freedreno/a4xx: alpha blend fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Rob Clark
a7d91c33c2 freedreno/a4xx: fix DRAW initiator encoding of index size
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Rob Clark
81194ac767 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Brian Paul
f54162857c svga: fix comment typo 2014-12-01 16:30:12 -07:00
Eric Anholt
3fe4d8e1e3 vc4: Introduce scheduling of QPU instructions.
This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it).  This
infrastructure will be important for doing instruction pairing, though.

shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs:     50677 -> 48386 (-4.52%)
2014-12-01 11:00:23 -08:00
Eric Anholt
6958c404ca vc4: Drop the explicit scoreboard wait.
This is actually implicitly handled by the TLB operations.
2014-12-01 11:00:23 -08:00
Eric Anholt
334036fb64 vc4: Also deal with VPM reads at thread end.
Prevents a regression with QPU scheduling, which happens to put the no-op
reads for unused VPM contents end up at the end of the program.
2014-12-01 11:00:23 -08:00
Eric Anholt
a7b1a93137 vc4: Fix assertion about SFU versus texturing.
We're supposed to be checking that nothing else writes r4, which is done
by the TMU result collection signal, not the coordinate setup.

Avoids a regression when QPU instruction scheduling is introduced.
2014-12-01 11:00:23 -08:00
Eric Anholt
2d5784c825 vc4: Add another check for invalid TLB scoreboard handling.
This was caught by an assertion in the simulator.
2014-12-01 11:00:23 -08:00
Rob Clark
bb19f2c3c4 freedreno/a4xx: invalidate cache when vbo's change
Otherwise vertex shader can see stale cache data.  This in particular
happens when the same vbo is updated and reused.  Not sure yet if vbo's
at differing addresses but bound to same vertex buffer slot could have
issues, but seems safest to flush whenever new vertex buffers are bound.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 12:02:25 -05:00
Ilia Mirkin
4907c31385 freedreno/a3xx: add missing integer formats and enable rendering
The mesa state tracker doesn't fall back on similar integer formats, so
they must all be provided. Remove the restriction against integer color
rendering.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:04:28 -05:00
Ilia Mirkin
82104c19f3 freedreno/a3xx: enable sampling from integer textures
We need to produce a u32 destination type on integer sampling
instructions, so keep that in a shader key set based on the
currently-bound textures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:04:28 -05:00
Ilia Mirkin
8e336ef55b freedreno: allow each generation to hook into sampler view setting
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:04:28 -05:00
Ilia Mirkin
618ff11457 freedreno/a3xx: don't use half precision shaders for int/float32
Integer outputs end up getting mangled due to cov.f32f16, and float32
loses precision. Use full precision shaders in both of those cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:04:28 -05:00
Ilia Mirkin
f866446e8c freedreno/a3xx: disable blending for integer formats
Also add support for the BLENDABLE bind flag, similarly predicated on
non-int formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:04:28 -05:00
Ilia Mirkin
8e147e9ec8 freedreno/a3xx: remove blend clamp enables from gmem/clears
Just pass the data through unmolested. This probably has no effect since
blending isn't actually enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-11-30 13:00:41 -05:00