Commit graph

27982 commits

Author SHA1 Message Date
Samuel Pitoiset
cb828b7b18 gm107/ir: fix sign bit emission for FADD32I
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.

This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-07-05 01:44:08 +02:00
Eric Anholt
ac772b24a1 vc4: Regularize instruction emit macros
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did.  This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
2016-07-04 16:33:22 -07:00
Eric Anholt
8a52f03f5d vc4: Enable dead CF elimination.
Now that we're about to start generating control flow in our NIR, we want
this in place.  It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
2016-07-04 16:33:22 -07:00
Eric Anholt
8f2af4763a vc4: Optimize out redundant SF updates.
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.

total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs:     1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs:     8568 -> 8523 (-0.53%)
2016-07-04 16:33:22 -07:00
Eric Anholt
200b4e4bd5 vc4: Move SF removal to a separate peephole pass.
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling.  We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.

No change on shader-db.
2016-07-04 16:33:22 -07:00
Eric Anholt
aa76ba6f2f vc4: DCE instructions with a NULL destination.
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects.  Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
2016-07-04 16:33:22 -07:00
Eric Anholt
2a8973fb78 vc4: Mark texturing setup instructions as having side effects.
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm.  Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
2016-07-04 16:33:22 -07:00
Eric Anholt
44df374a9c vc4: Fix a pasteo in scheduling condition flag usage.
Noticed by code inspection.  This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes.  MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
2016-07-04 16:33:22 -07:00
Eric Anholt
eaa53f80d9 vc4: Drop the dead QIR_PACK() macro.
This isn't used since we switched to using the dst.pack field instead of
custom instructions.
2016-07-04 16:33:18 -07:00
Marek Olšák
5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
84824935cf radeonsi: don't lock shader cache mutex during compilation
to allow multiple shaders to be compiled simultaneously.

ALso, shader-db can again use all 4 cores.

v2: Remove the pipe_mutex_unlock call in the error path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-07-05 00:47:13 +02:00
Marek Olšák
850cd953b1 radeonsi: separate the compilation chunk of si_create_shader_selector
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
6781a2a994 radeonsi: move LLVMTargetMachineRef creation to a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
8a4ace4a47 gallium/radeon: add and use radeon_info::max_alloc_size (v2)
v2: - squashed the patches
    - use INT_MAX
    - clamp max_const_buffer_size
    - check the DRM version in radeon

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-05 00:47:13 +02:00
Marek Olšák
027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
28a03be06b radeonsi: enable string markers and record apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
642cf400aa ddebug: add an option to dump info about a specific apitrace call
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
1daec2b795 ddebug: implement pipe_context::generate_mipmap
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
50b2235478 ddebug: record and dump apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
861ecf1ca9 ddebug: implement emit_string_marker
and remove some obsolete comments

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
a446c40e0a gallium/radeon: remove unused code - radeon_llvm_util.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
eaccc4e8c8 radeonsi: keep using v_rcp_f32 for division in future LLVM (v2)
This will be needed after some LLVM changes that haven't landed yet.

v2: - use LLVMIsConstant to fix an LLVM assertion failure.
      LLVMSetMetadata doesn't work with constants.
    - don't set float metadata as string

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
1c00086746 radeonsi: remove an obsolete comment
It's not true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
4d1f32376d radeonsi: don't interpolate colors if flatshading is enabled
use v_interp_mov for those

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
4accb02d7a radeonsi: enable the barycentric optimization in all cases
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
476e9cee1d radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
a675c6a000 radeonsi: split ps.prolog.force_persample_interp into persp and linear bits
This reduces the number of v_mov's in the prolog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
61010cfac0 radeonsi: don't dump the shader key for non-monolithic shaders early
It's always zero.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Jan Vesely
015e2e0fce r600g: Add double precision FMA ops
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: James Harvey <lothmordor@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-05 00:47:12 +02:00
Francesco Ansanelli
9827fc3f03 r600: fix duplicate 'const' declaration
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-04 21:26:31 +02:00
Indrajit Das
51227b41c6 radeon/uvd: fix overflow error while calculating bit stream buffer size
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-04 11:38:05 +02:00
Rob Clark
7295428e41 freedreno: fix crash on smaller gpus and higher resolutions
Devices with smaller GMEM size need more tiles.  On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen.  Lets bump it up to
512.  (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-03 11:16:28 -04:00
Rob Clark
f78a6b1ce3 glsl: add driconf to zero-init unintialized vars
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.

So add driconf param to force uninitialized variables to default to zero.

This issue was observed with rust, from steam store.  But has surfaced
elsewhere in the past.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-02 09:00:19 -04:00
Rob Clark
202710d110 freedreno/ir3: support glsl linking for cmdline compiler
For .vert/.frag, now multiple can be specified on the cmdline for
purposes of linking, and the last one specified is the one that is
fed into the ir3 backend (and dumped along the way if --verbose is
specified)

Without this, varyings in frag shaders would appear as undefined.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-07-02 09:00:19 -04:00
Rob Clark
1759eb1d19 freedreno: update valid_buffer_range for SO buffers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
da39ac9c51 freedreno/ir3: support non-user_buffer consts
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
2081c1ecc0 freedreno/a2xx: move setup/restore cmds into binning pass
Rather than doing a separate submit at context create, move these cmds
to before first tile, as is done on a3xx/a4xx.  Otherwise state can
be overwritten by other contexts.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
2c3b54c278 freedreno: pass index buffer as a pipe_resource
This will be useful in a following patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
88cc11e971 freedreno: switch emit_const_bo() to take prsc's
We can push the unwrap of pipe_resource down.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Hans de Goede
d7dfd4cb51 nv30: Fix "array subscript is below array bounds" compiler warning
gcc6 does not like the trick where we point to one entry before the
array start and then start a while with a pre-increment.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-02 12:21:28 +02:00
Hans de Goede
110ef733dc nouveau: Fix a couple of "foo may be used uninitialized' compiler warnings
These are all new false positives with gcc6.

In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer
to a variable into a function initialises that variable.

In nv50_ir_from_tgsi.cpp op and mode are not set if there are 0
enabled dst channels, this never happens, but gcc cannot know this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-02 12:21:28 +02:00
Hans de Goede
1f3c8f3664 nouveau: Fix gcc6 / c++11 auto_ptr deprecation compiler warnings
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
2aa1197eee nouveau: Add support for SV_WORK_DIM
Add support for SV_WORK_DIM for nvc0 and nve4.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
3345f70f63 nvc0: Make NVC0_CB_AUX_GRID_INFO take an index argument
This brings it inline with the other macros like NVC0_CB_AUX_UBO_INFO
and NVC0_CB_AUX_TEX_INFO.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
ef8e50a841 clover: Pass work_dim parameter of clEnqueueNDRangeKernel() to driver
In order to implement get_work_dim() the driver may need to know the
clEnqueueNDRangeKernel() work_dim parameter, so pass it to the driver.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
d386cef246 tgsi: Add WORK_DIM System Value
Add a new WORK_DIM SV type, this is will return the grid dimensions
(1-4) for compute (opencl) kernels.

This is necessary to implement the opencl get_work_dim() function.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Ilia Mirkin
b2b5075e04 nvc0: fix up image support for allowing multiple samples
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 23:01:02 -04:00
Nayan Deshmukh
de772bc060 st/vdpau: use bicubic filter for scaling(v6.1)
use bicubic filtering as high quality scaling L1.

v2: fix a typo and add a newline to code
v3: -render the unscaled image on a temporary surface (Christian)
    -apply noise reduction and sharpness filter on
     unscaled surface
    -render the final scaled surface using bicubic
     interpolation
v4: support high quality scaling
v5: set dst_area and dst_clip in bicubic filter
v6: set buffer layer before setting dst_area
v6.1: add PIPE_BIND_LINEAR when creating resource

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-01 12:54:58 +02:00
Nayan Deshmukh
872dd9ad15 vl: add a bicubic interpolation filter(v5)
This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: intialize offsets while initializing shaders
    use a constant buffer to send dst_size to frag shader
    small changes to reduce calculation in shader
v5: send half pixel offset instead of sending dst_size

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-01 12:54:33 +02:00
Ilia Mirkin
51ca57df01 nv30: go back to not using viewport validate function for swtnl
The output of draw requires a null viewport transform, which the regular
code is ill-equiped to do. Reinstate the original settings in the render
path, and add setting of the viewport clip polygon based on fb
width/height (as that is all taken care of by draw).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 01:04:10 -04:00