Commit graph

24315 commits

Author SHA1 Message Date
Eric Anholt
69ef08d303 vc4: Make the pack-to-unorm instructions be non-SSA.
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.

total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs:     2134 -> 2123 (-0.52%)
2015-08-20 23:42:17 -07:00
Eric Anholt
0bba4fa070 vc4: Allow QIR registers to be non-SSA.
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations.  This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers.  Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
2015-08-20 23:40:22 -07:00
Eric Anholt
ceb1a31842 vc4: We can now move TEX_RESULT accesses across other r4 ops.
No difference on shader-db.
2015-08-20 23:40:16 -07:00
Ilia Mirkin
8483577f6b nv50/ir: pre-compute BFE arg when both bits and offset are imm
Due to a quirk in how the nv50 opt passes run, the algebraic
optimization that looks for these BFE's happens before the constant
folding pass. Rearranging these passes isn't a great idea, but this is
easy enough to fix. Allows a following cvt to eliminate the bfe in
certain situations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 22:16:46 -04:00
Glenn Kennard
4237dfb978 r600g: Fix handling of TGSI_OPCODE_ARR with SB
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-21 09:46:13 +10:00
Edward O'Callaghan
7a32652231 r600: Turn 'r600_shader_key' struct into union
This struct was getting a bit crowded, following the lead of
radeonsi, mirror the idea of having sub-structures for each
shader type. Turning 'r600_shader_key' into an union saves
some trivial memory and CPU cycles for the shader keys.

[airlied: drop as_ls, and reorder so larger fields at start.]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-21 09:46:13 +10:00
Edward O'Callaghan
e2145de74d r600: Rewrite r600_shader_selector_key() to use a switch stmt
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-21 09:46:13 +10:00
Tobias Klausmann
3e6adbd761 nv50/ir: Handle OP_CVT when folding constant expressions
[imirkin: handle more type combinations, use macro]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Ilia Mirkin
f5b926183d nvc0/ir: undo more shifts still by allowing a pre-SHL to occur
This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Ilia Mirkin
9ebe7dc094 nvc0/ir: don't require AND when the high byte is being addressed
unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Ilia Mirkin
63cb85e567 nvc0/ir: detect i2f/i2i which operate on specific bytes/words
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Ilia Mirkin
51499bb5ff nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Chih-Wei Huang
2a4af36517 nv50/ir: support different unordered_set implementations
If build with C++11 standard, use std::unordered_set.

Otherwise if build on old Android version with stlport,
use std::tr1::unordered_set with a wrapper class.

Otherwise use std::tr1::unordered_set.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-20 17:58:30 -04:00
Marek Olšák
3b1e283d88 radeonsi: fix a typo as_es -> as_ls in a string
Trivial.
2015-08-19 12:04:51 +02:00
Marek Olšák
5fb0180592 winsys/amdgpu: fix the type of memory usage counters
If the 32-bit types overflowed, the driver could submit an IB that uses much
more memory than is available.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-08-19 12:03:01 +02:00
Marek Olšák
421b809db1 radeonsi: fix indirect indexing of MSAA textures
FMASK wasn't handled correctly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-08-19 12:03:01 +02:00
Jason Ekstrand
f01bdb0484 util/ra: Make allocating conflict lists optional
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965.  They are enabled
everywhere else so that they get the old behavior.  This reduces the time
spent in eglInitialize() on BDW by around 10-15%.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-18 17:48:53 -07:00
Rob Clark
4a0bea3863 freedreno: use fd_pipe_wait_timeout()
To properly support the case of waiting on a fence with a 0 timeout, we
still need to call down to the kernel.  Which requires the use of the
new fd_pipe_wait_timeout() API.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-18 15:36:30 -04:00
Rob Clark
fd7a14f8dd freedreno: fence fix
Don't take current timestamp/fence from current ring, as we might have
already rolled over to new rb.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-18 15:36:30 -04:00
Neil Roberts
885762e182 Add mesa.icd to the .gitignore
Since 4d7e0fa8c7 this file is generated by the configure script.
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-08-18 12:12:15 -07:00
Grazvydas Ignotas
97f5d00648 radeon/uvd: remove unused variables
Recent commits introduced new unused variable warnings, fix them.

Reviewed-by: Christian König <christian.koenig@amd.com>
2015-08-18 14:11:48 +02:00
Marcos Paulo de Souza
df97126731 nouveau: recognize tess stages in nouveau_compiler
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 23:05:00 -04:00
Marcos Paulo de Souza
723a5a2e68 tgsi: fix parsing of tessellation shader inputs/outputs
Tessellation control shaders write to outputs as OUT[ADDR[0].x][0], make
sure to parse the indirect dimension on outputs.

Also tess control inputs/outputs and tess eval input declarations need
to receive the same treatment as geometry shader inputs.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 23:05:00 -04:00
Marcos Paulo de Souza
a37fa7653b tgsi: set implicit array size for tess stages
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 22:50:16 -04:00
Ilia Mirkin
5af71fb5ac freedreno/a3xx: add s3tc texture format support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 11:38:38 -04:00
Ilia Mirkin
581cbfdec1 freedreno/a3xx: fix up logic for handling block formats
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 11:38:38 -04:00
Ilia Mirkin
12e1bf0b68 freedreno/a3xx: double the polygon offset value
A few other drivers do this, fixes the gl-1.4-polygon-offset piglit test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 11:38:38 -04:00
Ilia Mirkin
1af0641db3 nvc0: implement the color buffer 0 is integer rule for alpha-to-one/cov
The hardware checks for multisampling being enabled, but does not have
the rule about cbuf0 being an integer format. Only enable
alpha-to-one/alpha-to-coverage if cbuf0 is not an integer format.

Fixes piglits
  ext_framebuffer_multisample-int-draw-buffers-alpha-to-one
  ext_framebuffer_multisample-int-draw-buffers-alpha-to-coverage

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 04:21:18 -04:00
Ilia Mirkin
2f5ee9bf27 gk110/ir: fix sched calculator to consider all registers in the ISA
GK110/GK208 have 256 registers, not 64. Find out the number of registers
from the target to avoid unnecessary iteration for pre-GK110.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 02:46:16 -04:00
Ilia Mirkin
ae5cf4f3f7 nvc0: program smooth line width when multisampling is enabled
There are separate line widths for smooth and aliased lines. The smooth
one is selected when multisampling is enabled even if line smoothing
isn't explicitly turned on.

Fixes the ext_framebuffer_multisample-line-smooth piglits

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 01:01:02 -04:00
Ilia Mirkin
884b4df3b6 nvc0: bind a fake tess control program when there isn't one available
Apparently this is necessary in order for tess factors to work in a tess
eval program without a tess control program bound. Probably because it
uses the fake program's shader header to work out the number of patch
constants.

Fixes vs-tes-tessinner-tessouter-inputs

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 01:01:02 -04:00
Ilia Mirkin
f13073b775 gm107/ir: avoid letting the lowering pass get out of sync
There's a lot of functionality duplicated in the gm107 lowering pass
from the nvc0 pass. As that one gets updated, the gm107 one falls
behind. Avoid this by sharing the code.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-17 01:01:02 -04:00
Ilia Mirkin
2514c78fba nv50,nvc0: take level into account when doing eng2d multi-layer blits
This fixes arb_get_texture_sub_image-get, and any situation where the 2d
engine was being used for multi-layer blits to a non-0 level.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
2015-08-17 01:01:02 -04:00
Ilia Mirkin
ca628085b6 freedreno/a3xx: add per-texture seamless cubemap control
The default is to enable seamless cubemap filtering, but there's a bit
to turn it off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-16 03:01:53 -04:00
Ilia Mirkin
b4ace13eea freedreno/a4xx: add cube map array support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-15 14:05:37 -04:00
Rob Clark
868b66fce7 freedreno/a4xx: fix srgb render targets
Also fixes mipmap level generation for srgb textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-15 12:09:06 -04:00
Rob Clark
dd412c8fcb freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-15 12:08:34 -04:00
Ilia Mirkin
d19a98e2e6 freedreno: expose OES exts for float linear filtering
a4xx can do both float and half-float, while a3xx can only do half-float

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-14 20:22:49 -04:00
Ilia Mirkin
d3e23f1ff9 nvc0: disable tessellation on maxwell
The address calculations are all different (e.g. see GP), there appear
to be sync's in programs, and probably a bunch of other differences.
Just disable it for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-14 16:02:26 -04:00
Eric Anholt
bf3c50fba2 vc4: Move all of our fixed function fragment color handling to NIR.
This massively reduces our dependency on VC4-specific optimization passes.

shader-db:
total uniforms in shared programs: 32077 -> 32067 (-0.03%)
uniforms in affected programs:     149 -> 139 (-6.71%)
total instructions in shared programs: 98208 -> 98182 (-0.03%)
instructions in affected programs:     2154 -> 2128 (-1.21%)
2015-08-14 11:39:18 -07:00
Eric Anholt
38c6c0f5b4 vc4: Add a helper for making driver-specific NIR load_uniform for GL state
In order to move more of our lowering into NIR, we need the ability to
reference various pipeline state (like texture rectangle scaling factors
or blend colors), so we just set those up as a load_uniform with a big
offset to indicate that it's not within the shader's uniform storage and
is one of our state values.
2015-08-14 11:39:18 -07:00
Eric Anholt
9e6dc5b64d nir: Add a nir_opt_undef() to handle csels with undef.
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening.  vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.

total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs:     37 -> 23 (-37.84%)

v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
    Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
    over to src[0], too (by anholt).

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
2015-08-14 11:39:18 -07:00
Ilia Mirkin
b346a84e27 gm107/ir: indirect handle goes first on maxwell also
Fixes fs-simple-texture-size.shader_test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
2015-08-14 14:11:44 -04:00
Ilia Mirkin
7ff7d5d799 nv30: add depth bounds test support for hw that has it
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-14 13:05:29 -04:00
Ilia Mirkin
a6bf20d153 nv50: add depth bounds test support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-14 13:05:29 -04:00
Ilia Mirkin
d4087265f6 nvc0: add depth bounds test support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-14 13:05:29 -04:00
Marek Olšák
f47c59322e radeonsi: revert a wrong DB bug workaround for VI
The bug was misunderstood. Besides that, the bug affects a DB feature we
don't use yet.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-08-14 15:02:31 +02:00
Boyuan Zhang
839bf82606 radeon/uvd: implement HEVC support
add context buffer to fix H265 uvd decode issue.
fix H265 corruption issue caused by incorrect assigned ref_pic_list.

v2: disable interlace for HEVC
    add CZ sps flag workaround
    fix coding style

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-08-14 15:02:31 +02:00
Leo Liu
0654a9ca17 radeon/vce: disable VCE dual instance for harvest part
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-08-14 15:02:31 +02:00
Leo Liu
09def7e1e0 radeon/vce: implement VCE dual instance support
VCE dual instances are encoding in parallel, it needs two frames for
encoding with their own parameters in one IB. Master instance will check
the task info to find another frame, assign it to the slave instance

Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-14 15:02:31 +02:00