Commit graph

28249 commits

Author SHA1 Message Date
Marek Olšák
076db67217 gallium/radeon: inline radeon_winsys::query_memory_usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
9646ae7799 gallium/radeon/winsyses: expose per-IB used_vram and used_gart to drivers
The following patches will use this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
1c8f17599e gallium/radeon/winsyses: print CS submission error number
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
0edc2e433e radeonsi: flush if constant, shader, and streamout buffers use too much memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c3efdeb8dd radeonsi: flush if sampler views and images use too much memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
d82cfab84c radeonsi: deal with high vertex buffer memory usage correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
e62caf576e radeonsi: take compute shader and dispatch indirect memory usage into account
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c56ecb68e7 radeonsi: take scratch buffer and draw indirect memory usage into account
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
ed2254d157 radeonsi: check IB memory usage of CP DMA operations
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
f4b977bf3d gallium/radeon: add r600_resource::vram_usage and gart_usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Jason Ekstrand
f29fd7897a util: Move format_r11g11b10f.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:57 -07:00
Jason Ekstrand
6c665cdfc5 util: Move format_rgb9e5.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:31 -07:00
Tim Rowley
b521083ffb swr: [rasterizer core] static analysis fixes for conservative rast
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:35 -05:00
Tim Rowley
68dc544879 swr: [rasterizer core] implement InnerConservative input coverage
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:35 -05:00
Tim Rowley
4034f48833 swr: [rasterizer core] remove CanEarlyZ function
Test is now in SetupPipeline.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
b365989875 swr: [rasterizer core] use 32x32 macrotile for openswr
Significant performance increase (up to 2x) on high geometry workloads.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
5f4bc9e85b swr: [rasterizer fetch] add support for 24bit format fetch
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
527d45c8fe swr: [rasterizer fetch] additional fetch format support
Add support for 0 pitch in fetch.

Add support for USCALE/SSCALE for 32bit integer fetches.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
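A pitch (stride) of 0 means every vertex fetches the same element, which is a common way to express a constant attribute. The following is a minimal scalar sketch of that behavior under that assumption. `fetch_u32` is a hypothetical helper, not SWR's jitted fetch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch one 32-bit attribute per vertex. With pitch == 0 the address
 * never advances, so every vertex reads the first element. */
static void fetch_u32(const uint8_t *base, size_t pitch,
                      uint32_t *out, unsigned num_vertices)
{
    assert(base != NULL && out != NULL);
    for (unsigned i = 0; i < num_vertices; i++) {
        uint32_t v;
        memcpy(&v, base + (size_t)i * pitch, sizeof v); /* alignment-safe read */
        out[i] = v;
    }
}
```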
Tim Rowley
f438b7ba81 swr: [rasterizer jitter] fix potential jit exit crash
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
57b07498d2 swr: [rasterizer core] update sync handling
Sync now uses a callback to ensure that it's called by the last
thread moving past a DC.  This will help with the new counter
handling.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
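The "last thread runs the callback" pattern described above can be sketched with a C11 atomic counter: each worker decrements it once when it moves past the draw context, and the thread that sees the previous value 1 is the last one. The struct and function names are hypothetical; SWR's actual sync and counter code is not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*sync_cb)(void *user);

/* Hypothetical draw context: counts the threads that still have to pass it. */
struct draw_context {
    atomic_uint threads_remaining;
    sync_cb on_sync;
    void *user;
};

static void dc_init(struct draw_context *dc, unsigned num_threads,
                    sync_cb cb, void *user)
{
    assert(dc != NULL && num_threads > 0);
    atomic_init(&dc->threads_remaining, num_threads);
    dc->on_sync = cb;
    dc->user = user;
}

/* Called once per worker thread after it moves past the DC. */
static void dc_retire(struct draw_context *dc)
{
    /* atomic_fetch_sub returns the previous value; 1 means we were last. */
    if (atomic_fetch_sub(&dc->threads_remaining, 1) == 1 && dc->on_sync)
        dc->on_sync(dc->user);
}

/* Example callback: counts how many times the sync fired. */
static void count_sync(void *user)
{
    (*(int *)user)++;
}
```

Because exactly one decrement observes the value 1, the callback runs exactly once, on whichever thread happens to finish last.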
Tim Rowley
191786d0f4 swr: [rasterizer core] rename variable
Avoid nested declarations of the same name within a single function.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:37 -05:00
Tim Rowley
61cc012e9a swr: [rasterizer jitter] adjust extern "C" block scope
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:31 -05:00
Tim Rowley
9f7d99fcfe swr: [rasterizer core] conservative rast degenerate handling
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:25 -05:00
Tim Rowley
f01827a469 swr: [rasterizer core] allow hexadecimal for integer knobs
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 13:52:12 -05:00
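One standard way to accept hexadecimal integer knob values is to parse with `strtoul` and base 0, which accepts decimal, `0x`-prefixed hex, and leading-`0` octal. This is a sketch under that assumption. `parse_int_knob` is hypothetical, and the log does not say how SWR's knob parser is actually written.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse an integer knob; base 0 lets "16", "0x10" and "020" all mean 16. */
static bool parse_int_knob(const char *text, uint32_t *out)
{
    assert(text != NULL && out != NULL);
    char *end;
    errno = 0;
    unsigned long v = strtoul(text, &end, 0);
    /* Reject empty input, trailing junk, and out-of-range values. */
    if (errno != 0 || end == text || *end != '\0' || v > UINT32_MAX)
        return false;
    *out = (uint32_t)v;
    return true;
}
```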
Eric Anholt
c976e164d2 vc4: Move scalarizing and some lowering to link time.
This works out to be a wash in terms of memory usage: We use more memory
to store the separate ALU instructions, but we optimize out a lot of code
as well.  The main result, though, is that we do more of our work at link
time rather than draw time.
2016-08-04 08:48:27 -07:00
Eric Anholt
2350569a78 vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.
We don't want to bake the whole array into the FS key, because of the
hashing overhead.  But we can keep a set of the arrays seen, and use a
pointer to the stored copy as the array's proxy.

Between this and the previous patch, gl-1.0-blend-func now passes on
hardware, where previously it was filling the 256MB CMA area with shaders
and OOMing.

Drops 712 shaders from shader-db.
2016-08-04 08:48:27 -07:00
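The technique described above is interning: keep one copy of each distinct array and use the pointer to that copy as the array's identity, so the shader key only has to store and hash a pointer. The sketch below uses a linear list where a driver would use a hash set, and its names are hypothetical rather than vc4's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Set of distinct input-slot arrays seen so far. */
struct slot_set {
    unsigned char **arrays;
    size_t *lengths;
    size_t count, capacity;
};

/* Return a pointer to the canonical copy of slots[0..len), creating it if
 * needed. Equal arrays always map to the same pointer. */
static const unsigned char *slot_set_intern(struct slot_set *set,
                                            const unsigned char *slots,
                                            size_t len)
{
    assert(set != NULL && slots != NULL);
    for (size_t i = 0; i < set->count; i++) {
        if (set->lengths[i] == len && memcmp(set->arrays[i], slots, len) == 0)
            return set->arrays[i]; /* seen before: reuse the stored copy */
    }
    if (set->count == set->capacity) {
        size_t cap = set->capacity ? set->capacity * 2 : 4;
        unsigned char **a = realloc(set->arrays, cap * sizeof *a);
        if (!a)
            return NULL;
        set->arrays = a;
        size_t *l = realloc(set->lengths, cap * sizeof *l);
        if (!l)
            return NULL;
        set->lengths = l;
        set->capacity = cap;
    }
    unsigned char *copy = malloc(len ? len : 1);
    if (!copy)
        return NULL;
    memcpy(copy, slots, len);
    set->arrays[set->count] = copy;
    set->lengths[set->count] = len;
    set->count++;
    return copy;
}
```

Comparing keys by pointer is then exact: two shader keys holding the same proxy pointer describe identical input arrays.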
Eric Anholt
62ea2461ed vc4: Don't recompile the CS when the FS changes.
The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but
only the VS dereferences it.

Drops 754 shaders from shader-db.
2016-08-04 08:48:27 -07:00
Eric Anholt
d577dbc201 vc4: Move FS inputs setup out to a helper function.
It's a pretty big block, and I was about to make it bigger.
2016-08-04 08:48:27 -07:00
Michel Dänzer
67c5e843b9 vl/dri3: Destroy Present event context when destroying drawable v2
Without this, the X server may accumulate stale Present event contexts
if a client performs several video decoding sessions using the same
window.

v2: Based on Chris Wilson's review:
* Use xcb_discard_reply() instead of free(xcb_request_check())

Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
2016-08-04 15:45:43 +09:00
Eric Anholt
bc1fc9c985 vc4: Avoid generating a custom shader per level in glGenerateMipmaps().
We were baking in the LOD of the source level to each shader.  Instead,
pass it in as a uniform -- this requires storing it to a temp register,
but that's better than compiling a ton of separate shaders:

total instructions in shared programs: 115032 -> 115036 (0.00%)
instructions in affected programs:     96 -> 100 (4.17%)
LOST:                                  572
2016-08-03 10:55:54 -07:00
Eric Anholt
e97e9e62a1 vc4: Tell valgrind about BO allocations from mmap time to destroy.
This helps in debugging memory pressure.  It would be nice if we could
tell valgrind about it all the way from allocation time to destroy, but we
need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
2016-08-03 10:28:20 -07:00
Eric Anholt
a0671d67de vc4: Fix a leak of the src[] array of VPM reads in optimization.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-03 10:25:09 -07:00
Eric Anholt
9f95690959 vc4: Fix leak of the bo_handles table.
2016-08-03 10:25:08 -07:00
Eric Anholt
02f8c444e8 vc4: Fix handling of UBO range offsets.
The ranges are in units of bytes, not dwords.  This wasn't caught by
piglit tests because ttn tends to make one big uniform file, so we only
had one UBO range with a src and dst offset of 0.
2016-08-03 10:25:08 -07:00
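The bug class described above: indexing a `uint32_t` array with a byte offset scales it by 4, and with offsets of 0 (the only case piglit exercised) the mistake is invisible, because 0 * 4 == 0. This sketch does the arithmetic on byte pointers instead. `struct ubo_range` and `upload_ubo_range` are hypothetical, not vc4's actual types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UBO range descriptor; all fields are in bytes. */
struct ubo_range {
    uint32_t src_offset; /* byte offset into the source UBO */
    uint32_t dst_offset; /* byte offset into the uniform storage */
    uint32_t size;       /* number of bytes to copy */
};

/* Copy one range. Writing &dst[r->dst_offset] would treat the byte offset
 * as a dword index; casting to char * keeps the offsets in bytes. */
static void upload_ubo_range(uint32_t *dst, const uint32_t *src,
                             const struct ubo_range *r)
{
    assert(dst != NULL && src != NULL && r != NULL);
    memcpy((char *)dst + r->dst_offset,
           (const char *)src + r->src_offset, r->size);
}
```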
Eric Anholt
36b9eb82c1 vc4: Dump NIR at shader state creation time as well.
I keep wanting to see this version of the NIR.
2016-08-03 10:25:08 -07:00
Marek Olšák
435d9595d3 r600g: use last_gfx_fence like radeonsi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
a6bfafa083 gallium/radeon: move last_gfx_fence from radeonsi to common code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c2a0e99169 radeonsi: print the command line to VM fault reports (v2)
v2: rebase on top of Brian's commit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
6573ad69ef ddebug: print the command line to all logs (v2)
This is for piglit with the pipelined hang detection mode.

v2: rebase on top of Brian's commit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
840353059a ddebug: don't use fmemopen on non-Linux OS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97140

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c88b309fd5 radeonsi: don't set the last parameter component of llvm.AMDGPU.cube
LLVM doesn't use it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
42c5f839ad radeonsi: use llvm.amdgcn.cube* if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
1fb6e55eaf radeonsi: use llvm.amdgcn.rsq.f64 if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
db2d31dab1 radeonsi: use v_mad_f32 for fma
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.

This tries to restore performance for those games.

The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.

Also, there is a code size reduction:
  Totals from affected shaders:
  VGPRS: 109796 -> 109808 (0.01 %)
  Spilled SGPRs: 29995 -> 30022 (0.09 %)
  Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
  Code Size: 6667596 -> 6476356 (-2.87 %) bytes
  Max Waves: 26931 -> 26899 (-0.12 %)

I've not actually tested real performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Tim Rowley
11072de368 swr: build swr with -fno-strict-aliasing
swr rasterizer contains numerous data transfers between vectors
and ordinary C types.  Fixing for strict aliasing will take time.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-02 14:30:33 -05:00
Marek Olšák
6db93cd167 gallium/util: fix align64
It cut off the upper 32 bits.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-01 23:28:14 +02:00
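A common way an align64 helper "cuts off the upper 32 bits" is building the mask from a 32-bit alignment: `~(alignment - 1)` is computed as a 32-bit `unsigned` and then zero-extended to 64 bits, so the AND clears bits 32..63. The sketch below shows that bug class and the widening fix, assuming a 32-bit `unsigned`. It is not necessarily the exact Mesa code before or after the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: ~(alignment - 1) is a 32-bit mask (e.g. 0xFFFFFF00). Converting it
 * to uint64_t zero-extends it, which clears the upper 32 bits of the result. */
static inline uint64_t align64_buggy(uint64_t value, unsigned alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Fixed: widen to 64 bits before inverting so the mask's upper bits stay set.
 * alignment must be a nonzero power of two. */
static inline uint64_t align64_fixed(uint64_t value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~((uint64_t)alignment - 1);
}
```

For example, aligning 0x100000001 to 256 should give 0x100000100, but the buggy version returns 0x100.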
Matt Turner
be35c6ba92 draw: Avoid aliasing violations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
8e68f35d32 r600g: Avoid aliasing violations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
d2838f77ec r300g: Avoid aliasing violation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
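An aliasing violation typically looks like `*(uint32_t *)&f`: reading a `float` object through a `uint32_t` lvalue, which C's strict-aliasing rule allows the compiler to assume never happens. That is also why swr, above, is built with `-fno-strict-aliasing` for now. One standard fix is `memcpy`, which compilers usually lower to a plain register move. The log does not say which technique these patches used, so this is a sketch of one option only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float's bits without an aliasing violation. */
static inline uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* And the reverse direction. */
static inline float bits_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```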