Commit graph

29173 commits

Author SHA1 Message Date
Marek Olšák
546bc07349 radeonsi: don't preload constants at the beginning of shaders
LLVM can CSE the loads, thus we can always re-load constants before each
use. The decrease in SGPR spilling is huge.

The best improvements are the dumbest ones.

26011 shaders in 14651 tests
Totals:
SGPRS: 1453346 -> 1251920 (-13.86 %)
VGPRS: 742576 -> 728421 (-1.91 %)
Spilled SGPRs: 52298 -> 16644 (-68.17 %)
Spilled VGPRs: 397 -> 369 (-7.05 %)
Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
Code Size: 36136488 -> 36001064 (-0.37 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 219315 -> 222221 (1.33 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-12 21:06:57 +02:00
Boyuan Zhang
e5009b7c26 st/va: enable vbr rate control for vaapi encode
This patch enables variable bit-rate for vaapi encoding. According to va.h,
target bit-rate equals to maximum bit-rate multiplies by target_percentage.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-12 10:34:53 -04:00
Leo Liu
6a7f79af9b vl/rbsp: match initial escaped bits with valid in the buffer
Otherwise the check for the three byte will not make sense.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-09-12 10:09:27 -04:00
Nicolai Hähnle
b8703e363c winsys/radeon: rename nrelocs, crelocs to max_relocs, num_relocs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:55 +02:00
Nicolai Hähnle
d66bbfbede winsys/radeon: don't pre-allocate the relocations array
It's really not necessary. Switch to an exponential resizing strategy.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:53 +02:00
Nicolai Hähnle
f47da2e34f winsys/radeon: remove unused radeon_cs_context::priority_usage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:51 +02:00
Nicolai Hähnle
17fff0c2de winsys/amdgpu: remove amdgpu_cs_lookup_buffer
The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:47 +02:00
Nicolai Hähnle
12657a7abf winsys/amdgpu: remove unused field domains from amdgpu_cs_buffer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:07 +02:00
Nicolai Hähnle
3cdeb2a177 winsys/amdgpu: remove initial buffer list allocation
It's really not necessary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:04 +02:00
Nicolai Hähnle
cc53dfda9f winsys/amdgpu: extract adding a new buffer list entry into its own function
While at it, try to be a little more robust in the face of memory allocation
failure.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:01 +02:00
Nicolai Hähnle
11cbf4d7ae winsys/amdgpu: use only one fence per BO
The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).

Therefore, keeping only the last fence around is sufficient.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:59 +02:00
Nicolai Hähnle
480ac143df winsys/amdgpu: add do_winsys_deinit function
The idea is to have matching init/deinit functions so that deinit can be
re-used for cleanup in the error path of amdgpu_winsys_create.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:56 +02:00
Nicolai Hähnle
9fb8d354ca winsys/amdgpu: clean up error paths in amdgpu_winsys_create
No need to call pb_cache_deinit, because the cache hasn't been initialized
at that point.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:53 +02:00
Nicolai Hähnle
a6c38d47d4 gallium/radeon: page alignment for buffers is unnecessary
In some places (e.g. shader program pointers) we require 256 bytes alignment.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:45 +02:00
Nicolai Hähnle
339867c077 gallium/radeon/winsyses: remove #includes of pb_bufmgr.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:36 +02:00
Ilia Mirkin
148fbf32a8 freedreno/a3xx: disable filtering for texture buffers and int textures
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-11 13:14:06 -04:00
Niels Ole Salscheider
cfa914a1b4 st/clover: Define __OPENCL_VERSION__ on the device side
This is required by the OpenCL standard.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-09-10 15:48:54 -07:00
Ilia Mirkin
a8c0c7301c gm107/ir: allow indirect inputs to be loaded by frag shader
Looks like the GM107 IPA op does not allow a separate offset when
using an indirect register. Instead we must use AL2P like we do for
indirect vertex operations on Kepler+.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-09-10 13:40:04 -04:00
Ilia Mirkin
a22aee5ad1 gm107/ir: AL2P writes to a predicate register
We have to force it to write to predicate 7 (aka PT) in order for it not
to mess up another predicate. Unclear what would be returned in the
predicate, perhaps an error code for out-of-bounds requests. Blob
doesn't seem to check it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-09-10 13:36:20 -04:00
Marek Olšák
08bcbfdc07 radeonsi: flush TC L2 before using a compute indirect buffer
There is no known test for this.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:07 +02:00
Marek Olšák
a5a2cc530c radeonsi: fix the VGT performance tweak for small instances
Based on the VGT spec.

The Vulkan driver doesn't do it optimally and they plan to fix it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
f9750932ea winsys/amdgpu: replace OUT_CS with radeon_emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
81da78bfc3 winsys/radeon: replace OUT_CS with radeon_emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Max Staudt
02675622b0 r300g: Set R300_VAP_CNTL on RSxxx to avoid triangle flickering
On the RSxxx chip series, HW TCL is missing and r300_emit_vs_state()
is never called.

However, if R300_VAP_CNTL is never set, the hardware (at least the
RS690 I tested this on) comes up with rendering artifacts, and
parts that are uploaded before this "fix" remain broken in VRAM.
This causes artifacts as in fdo#69076 ("triangle flickering").

It seems like this setup needs to happen at least once after power on
for 3D rendering to work properly. In the DDX with EXA, this happens in
RADEON_SWITCH_TO_3D() when processing an XRENDER Composite or an
Xv request. So playing back a video or starting a GTK+2 application
fixes 3D rendering for the rest of the session. However, this auto-fix
doesn't happen when EXA is not used, such as with GLAMOR or Wayland.

This patch ensures the register is configured even in absence of
the DDX's EXA module.

The register setting is taken from:
  xf86-video-ati  --  RADEONInit3DEngineInternal()
  mesa/src/mesa/drivers/dri/r300  --  r300EmitClearState()

Tested on RS690.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Max Staudt <mstaudt@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-09 13:30:47 +10:00
Marek Olšák
5981ab5445 gallium: remove PIPE_BIND_TRANSFER_READ/WRITE
not used in any useful way

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
0fbaf74977 radeonsi: unify si_set_optimal_micro_tile_mode call sites
There is nothing special happening in those code blocks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
758bc52959 radeonsi: fix texture reinterpretation after DCC fast clear
The problem is that TC-compatible DCC clear codes translate
into different clear values when you change the format.

I have a new piglit reproducing the issue.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
46c425e7c8 radeonsi: enable DCC fast clear for 128-bit formats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
831c0c80f1 radeonsi: clamp integer clear color values for DCC fast clear
It should be possible to get TC-compatible fast clear more often now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
93f3d8e10d Revert "radeonsi: enable SDMA on CIK"
This reverts commit 0241d8300f.

It doesn't work with mobile Bonaire. It looks like the programming of
tiling parameters is wrong on some chips.
2016-09-08 22:51:33 +02:00
Tim Rowley
7514e326f8 swr: fixes for format mapping and texture sizing
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-09-08 10:43:21 -05:00
Rob Clark
74b1969d71 gbm: wire up fence extension
v2: make fence extension optional to not break non-i965 classic
    drivers, and move __DRI2_FENCE into core extensions, based
    on comments from Emil

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-07 11:54:00 -04:00
Rob Clark
32c061b110 freedreno: reject imports with bogus pitch
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-07 11:41:38 -04:00
Marek Olšák
fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
bdf767dac4 radeonsi: add more unlikely() uses into si_draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
a8e7ea6abc radeonsi: skip draws with instance_count == 0
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
53d74e055e gallium/radeon/winsyses: fix counting mapped memory
Not all buffers are unmapped explicitly.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Leo Liu
2593354643 st/omx/dec: enable hevc omx decode support
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
1a534d31fe st/omx/dec/h265: get the reference list for uvd
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
7d63b80728 st/omx/dec/h265: add short term reference picture sets
Specified by subclause 7.3.7

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
fa7c4f151d st/omx/dec/h265: add slice header
Specified by subclause 7.3.6.1

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
a639a2868e st/omx/dec/h265: add picture parameter sets
Specified by subclause 7.3.2.3

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
b3c1583e17 st/omx/dec/h265: add sequence parameter sets
Specified by subclause 7.3.2.2

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
6d186a79f2 st/omx/dec: add initial omx hevc support
Mainly based on the h264 implementation.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
0c374a7770 st/omx/dec: set dst rect to match src size
When creating interlaced video buffer, hegith set to "template.height =
align(tmpl->height/ array_size, VL_MACROBLOCK_HEIGHT);", and we use
"template.height *= array_size;" for the buffer height, so it actually
aligned with 32. With progressive video buffer it still aligned with 16,
thus causing different height between interlaced buffer and progressive
buffer for 4K (height=2160), and 720p (height=720).

When transcode the video, this will cause the 16 lines corruption
at the bottom of the encode video.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:01:24 -04:00
Marek Olšák
e7a73b75a0 gallium: switch drivers to the slab allocator in src/util 2016-09-06 14:24:04 +02:00
Dave Airlie
69fca64259 amd/addrlib: move addrlib from amdgpu winsys to common code
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:33 +10:00
Dave Airlie
1add3562e3 gallium/util: move endian detect into a separate file
This just ports the simpler endian detection bits, addrlib
sharing wants this outside gallium.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:24 +10:00
Dave Airlie
a86be7b6ad radeon: move radeon_family/chip_class defintions to common
This just moves these to a common header file.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:04 +10:00