Commit graph

29173 commits

Author SHA1 Message Date
Axel Davy
f0ec54ee32 st/nine: Fix some check flags
Uses the new defines introduced in previous commit.
See comment in the commit for more explanation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
39e98d351f st/nine: Unify some check flags
The new defines will be reused in a later patch.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:48 +02:00
Axel Davy
2290eac84e gallium/util: Really allow aliasing of dst for u_box_union_*
Gallium nine relies on aliasing to work with this function.
Without this patch, dirty region tracking was incorrect, which
could lead to incorrect textures or vertex buffers.
Fixes several game bugs with nine.
Fixes https://github.com/iXit/Mesa-3D/issues/234

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:48 +02:00
Axel Davy
5e7f0ebe29 softpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
softpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy
814ca96d0d llvmpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
llvmpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy
218459771a gallium/os: Fix overflow on 32 bits
On systems with more than 4GB of ram,
os_get_total_physical_memory was triggering an integer
overflow for the linux and haiku path, when on
32 bits.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 23:43:48 +02:00
Axel Davy
9904581dc6 st/nine: Memset pipe_resource templates
Fixes regression introduced by
ecd6fce261
and is more future proof than just clearing the next
field.

Other nine usages did already zero out the templates.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-10 23:43:48 +02:00
Samuel Pitoiset
d43151318a nvc0: fix valid range for shader buffers
When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-10 21:32:16 +02:00
Ilia Mirkin
5239bd5920 nvc0/ir: fix overwriting of value backing non-constant gather offset
Normally the value is an immediate, which is moved to some temporary, so
there's no problem. In the case of a non-constant offset (as allowed by
ARB_gpu_shader5), we have to take care to copy it first before using it
to build up the bits.

This fixes a compilation error observed in F1 2015.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-10 14:28:32 -04:00
Ilia Mirkin
ec05331a7b nv50/ir: only stick one preret per function
A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use functions in earnest, it could have in some cases
overflowed the call stack, in case a function had a lot of early
returns.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-10 10:45:06 -04:00
Nicolai Hähnle
1f95121626 radeonsi: make more use of si_have_tgsi_compute
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:33 +02:00
Nicolai Hähnle
38cfd5160a gallium/radeon: assign a name to LLVM output variables in debug builds
This can be helpful with R600_DEBUG=preoptir.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:30 +02:00
Nicolai Hähnle
39a29c2431 gallium/radeon: avoid redundant work with overlapping in/out arrays
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:37:50 +02:00
Nicolai Hähnle
77c81164bc radeonsi: support ARB_compute_variable_group_size
Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:36:42 +02:00
Rob Clark
495ba8884a gallium: add missing zero-init for resource templates
Mostly test code, plus one spot I noticed in r600.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 15:50:46 -04:00
Rob Clark
3ebfc44b42 freedreno: don't try to shadow layered textures
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place.  But if we did, it
wouldn't do the right thing so just bail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Rob Clark
f88f025e8c freedreno/a3xx+a4xx: fix clip-plane lowering state
If enabled clip-planes have changed, we need to mark program state
dirty.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Eric Anholt
20d91e5ce9 vc4: Don't worry about partial Z/S clear if the other is already cleared.
We have to be careful to not smash the value they're clearing to, but
other than that we're fine.  Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).

Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
2016-10-06 18:29:16 -07:00
Eric Anholt
cb328123fe vc4: Try to fix the HW-2116 workaround.
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch.  It also failed to count the statechanges from the GFXH-515
workaround.

This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more.  Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.

Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
2016-10-06 18:29:12 -07:00
Eric Anholt
bca9a58d04 vc4: Drop dead argument from vc4_start_draw(). 2016-10-06 18:09:24 -07:00
Eric Anholt
9421a6065c vc4: Fix fallback to quad clears of depth in GLX.
The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.
2016-10-06 18:09:24 -07:00
Eric Anholt
8810270d06 vc4: Add the format name in miptree_debug.
I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.
2016-10-06 18:09:24 -07:00
Eric Anholt
ee577e7fa7 vc4: Fix perf debug formatting on partial Z/S clear. 2016-10-06 18:09:24 -07:00
Eric Anholt
7c7bcbbc7d vc4: Drop destination register when it's unused.
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before.  This commit is just to make QIR
code failing more intelligible when register allocation fails.
2016-10-06 18:09:24 -07:00
Eric Anholt
d4ae5ca823 vc4: Fix live intervals analysis for screening defs in if statements.
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
2016-10-06 18:09:24 -07:00
Eric Anholt
06cc3dfda4 vc4: Fix simulator when more than one vc4_screen is opened.
We would assertion fail in setting up the simulator the second time
around.  This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
2016-10-06 18:09:24 -07:00
Eric Anholt
b30205b112 vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.
Fixes 100 piglit tests since the assertions were added to nir.h.  What's
amazing is that these tests used to pass, even when casting garbage.
2016-10-06 18:09:24 -07:00
Samuel Pitoiset
28ecd3eac2 nv50/ir: fix wrong check when optimizing MAD to SHLADD
Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced few days ago which breaks
NV50 because SHLADD is not supported there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-07 01:13:06 +02:00
Samuel Pitoiset
a198883bf7 nvc0: dump program binary only when NV50_PROG_DEBUG is set
When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 01:01:17 +02:00
Samuel Pitoiset
56a0bed2c1 nvc0: expose ARB_compute_variable_group_size
Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
11e75fffeb nv50/ir: set number of threads/block for variable local size
When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

This allows to use 64 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
07bb4513c6 gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
v3: - use a new case statement in r600_pipe_common.c
    - fix compilation of softpipe...

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Karol Herbst
f96945c5b5 nv50/ir: optimize sub(a, 0) to a
helped some ue4 demos and divinity OS shaders

total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs    : 379273 -> 379273 (0.00%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

                local        gpr       inst      bytes
    helped           0           0          33          33
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2016-10-06 19:39:51 +02:00
Jason Ekstrand
2ed17d46de nir: Make nir_foo_first/last_cf_node return a block instead
One of NIR's invariants is that control flow lists always start and end
with blocks.  There's no good reason why we should return a cf_node from
these functions since we know that it's always a block.  Making it a block
lets us remove a bunch of code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:37 -07:00
Steven Toth
e00fdd643b gallium/hud: Remove superfluous debug
No longer required.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:37:06 +01:00
Emil Velikov
b634be0e69 svga: add svga_mksstats.h to the sources list
Otherwise it won't be picked in the tarball and the build will fail.

Fixes: 0035f7f136 ("svga: add guest statistic gathering interface")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:17:09 +01:00
Emil Velikov
9b7fd4080a st/xvmc/tests: force enable assertions
Similar to the other 'tests', enable assertions in xvmc_bench.

This silences the GCC warnings about unused-variable(s), makes the
program actually useful, as the XvMC API called. Atm the function
calls are omitted, since they're called within the assert.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Samuel Pitoiset
a41cfbbf2b nvc0: dump program binary when chipset has been forced
Currently, program binaries are only dumped at upload time, but
when the chipset has been forced via NV50_PROG_CHIPSET we might
want to show the generated code, especially with shaderdb.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-05 21:15:44 +02:00
Marek Olšák
cc4a19c4ad radeonsi: fix texture border colors for compute shaders
There are VM faults without this.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák
844f8268e1 gallium/radeon/winsyses: set reasonable max_alloc_size
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.

This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák
1b37e5541c radeonsi: fix interpolateAt opcodes for .zw components
Not returning garbage in .zw seems pretty important.

This fixes:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.*

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
300a8221e9 radeonsi: add assertions to validate interpolation flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
d4a8bf89ce radeonsi: interpolate colors after interpolation weight shuffling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
faee2d6dda tgsi/scan: don't set interp flags for inputs only used by INTERP (v2)
(v1 pushed, then reverted)

This fixes 9 randomly failing tests on radeonsi:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

v2: use input_interpolate[input] (correct) instead of
    input_interpolate[index] (incorrect)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
10e5f126dd ddebug: dump most driver information with GALLIUM_DDEBUG=always
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Karol Herbst
d8bcd3ef37 nv50/ra: let simplify return an error and handle that
fixes a crash in the case simplify reports an error

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-05 19:11:42 +02:00
Nicolai Hähnle
11cc59afca virgl: Fix build regression of commit 8a943564 2016-10-05 16:27:29 +02:00
Nicolai Hähnle
b5cd7dfe3e gallium/radeon: implement set_device_reset_callback
Check for device reset on flush. It would be nicer if the kernel just
reported this as an error on the submit ioctl (and similarly for fences),
but this will do for now.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:56 +02:00
Nicolai Hähnle
07bea09c64 ddebug: add pass-through of set_device_reset_callback
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:47 +02:00
Nicolai Hähnle
1a3c75e30e gallium: add pipe_context::set_device_reset_callback
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:34 +02:00