Commit graph

71368 commits

Author SHA1 Message Date
Jason Ekstrand
587842a0ca anv/gem: Add a helper for getting bit6 swizzling information 2016-01-18 17:21:05 -08:00
Jason Ekstrand
c2a6f4302e nir/spirv: Patch through image qualifiers 2016-01-18 17:21:05 -08:00
Jason Ekstrand
56c8a5f2b8 nir/spirv: Implement ImageQuerySize for storage iamges
SPIR-V only has one ImageQuerySize opcode that has to work for both
textures and storage images.  Therefore, we have to special-case that one a
bit and look at the type of the incoming image handle.
2016-01-18 17:21:05 -08:00
Jason Ekstrand
bb8cadd169 nir/spirv: Insert movs around image intrinsics
Image intrinsics always take a vec4 coordinate and always return a vec4.
This simplifies the intrinsics a but but also means that they don't
actually match the incomming SPIR-V.  In order to compensate for this, we
add swizzling movs for both source and destination to get the right number
of components.
2016-01-18 17:21:05 -08:00
Ilia Mirkin
a31819cff8 nv50/ir: swap the least-ref'd source into src1 when both const/imm
The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.

This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
total local used in shared programs   : 30372 -> 30288 (-0.28%)
total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)

                local        gpr       inst      bytes
    helped          21         822        3332        3332
      hurt           0         278         565         565

And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.

On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
total local used in shared programs   : 3552 -> 3552 (0.00%)
total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)

                local        gpr       inst      bytes
    helped           0        1380        3252        3774
      hurt           0         287        1710        1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-18 17:52:07 -05:00
Ilia Mirkin
af686e7de3 st/mesa: restore the stObj's size if it was cleared out
An issue could still occur if the base level is set, but fixing that
would require a lot more logic.

This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-18 17:52:07 -05:00
Jason Ekstrand
6f956b0b22 anv/meta: Improve meta clear cleanup a bit 2016-01-18 14:07:46 -08:00
Jason Ekstrand
45d17fcf9b anv: Misc allocation scope fixes 2016-01-18 14:04:13 -08:00
Jason Ekstrand
378af64e30 anv/meta: Add a meta allocator that uses SCOPE_DEVICE
The Vulkan spec requires all allocations that happen for device creation to
happen with SCOPE_DEVICE.  Since meta calls into other things that allocate
memory, the easiest way to do this is with an allocator.
2016-01-18 14:03:24 -08:00
Rob Clark
805e080ba0 freedreno/a4xx: use smaller threadsize for more registers
Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize.  (The vertex shader already
runs at TWO_QUADS, which is the minimum.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Rob Clark
6062941e4d freedreno: per-generation OUT_IB packet
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet.  So allow for PFD vs PFE per
generation.  Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Jason Ekstrand
3dfa6a881c anv/meta: Initialize a handle to null 2016-01-18 13:05:02 -08:00
Jason Ekstrand
d49298c702 gen8: Fix border color
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address.  The way the packing functions are currently
set up, we need to provide it with (offset >> 6) because it just shoves the
bits in where the PRM says they go and isn't really aware that it's an
address.
2016-01-18 12:16:31 -08:00
Jason Ekstrand
bfcc744892 genX/pack: Add a __gen_fixed helper and use it for TextureLODBias
The __gen_fixed helper properly clamps the value and also handles negative
values correctly.  Eventually, we need to make the scripts generate this
and use it for more things.
2016-01-18 11:35:04 -08:00
Jason Ekstrand
5a67df2546 anv/pack: Make TextureLODBias a proper 4.8 float
XXX: We need to update the generators so this doesn't get stompped.
2016-01-18 10:36:53 -08:00
Jason Ekstrand
15e6af0708 nir/spirv: Handle if's where the merge is also a break or continue 2016-01-18 10:10:47 -08:00
Jason Ekstrand
14ebd0fdd7 nir/spirv: Hanle continues that use SSA values from the loop body
Instead of emitting the continue before the loop body we emit it
afterwards.  Then, once we've finished with the entire function, we run
nir_repair_ssa to add whatever phi nodes are needed.
2016-01-18 09:43:12 -08:00
Jason Ekstrand
61ba97522e nir/lower_returns: Repair SSA after doing return lowering 2016-01-18 09:43:12 -08:00
Jason Ekstrand
b11825590d nir: Add a pass to repair SSA form 2016-01-18 09:43:12 -08:00
Jason Ekstrand
a7a5e8a2de nir/vars_to_ssa: Use the new nir_phi_builder helper
The efficiency should be approximately the same.  We do a little more work
per phi node because we have to sort the predecessors.  However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
2016-01-18 09:18:42 -08:00
Jason Ekstrand
8aab4a7bd2 nir: Add a phi node placement helper
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis.  Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
2016-01-18 09:18:42 -08:00
Jason Ekstrand
b1f1200e80 util/bitset: Allow iterating over const bitsets 2016-01-18 09:18:42 -08:00
Emil Velikov
c03f3dd0a5 gallium: bundle the compat header u_pwr8.h in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov
7bc714509b mapi: include gl.xml in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov
a78e08e88f i965: adding missing headers to the dist tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Christian König
eaf7ec9cfc st/va: add motion adaptive deinterlacing v2
v2: minor cleanup

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-18 10:59:32 +01:00
Michel Dänzer
ad20be1f30 gallium/radeon: Rename do_invalidate_resource to invalidate_buffer
And only call it from r600_invalidate_resource for buffer resources.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
0491dd1deb st/dri: Don't call invalidate_resource for NULL depth/stencil buffers
Fixes crash in 4 EGL piglit tests with radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
a9ab7172a6 radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
4297259fc8 radeonsi: Print "LLVM emitted unknown config register" warning only once
Say "LLVM" instead of "Compiler" for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Oded Gabbay
679a654a77 llvmpipe: use vpkswss when dst is signed
This patch fixes a bug when building a pack instruction.

For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.

This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale

I've also corrected some coding style issues in the function.

v2: Returned else statements to vmware style

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-18 09:45:25 +02:00
Dave Airlie
119bef9543 glsl: fix subroutine lowering reusing actual parmaters
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).

this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-18 15:02:34 +10:00
Timothy Arceri
9258d9f23d glsl: remove special case for detecting stream duplicates
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:28 +11:00
Timothy Arceri
eac2cece31 glsl: add missing explicit_stream flag to has_layout()
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:16 +11:00
Timothy Arceri
86677f1016 mesa: fix segfault in glUniformSubroutinesuiv()
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:

   "The command

       void UniformSubroutinesuiv(enum shadertype, sizei count,
                                  const uint *indices);

   will load all active subroutine uniforms for shader stage
   shadertype with subroutine indices from indices, storing
   indices[i] into the uniform at location i. The indices for
   any locations between zero and the value of
   ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
   used will be ignored."

V2: simplify NULL check suggested by Jason.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731
2016-01-18 11:53:24 +11:00
Timothy Arceri
50376e0c0e glsl: fix segfault linking subroutine uniform with explicit location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
2016-01-18 11:30:45 +11:00
Ilia Mirkin
4ac1274caa gm107/ir: don't do indirect frag shader inputs on GM107
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
3281ae96c8 tgsi: initialize Atomic field in tgsi_default_declaration
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
5a81b48ad0 nvc0: bsp_bo can't be null
We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Oded Gabbay
529aa8249a llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-17 21:07:27 +02:00
Rob Clark
02ac91d717 freedreno/ir3: fix mad 3rd src delay calc
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-17 12:21:45 -05:00
Rob Clark
2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
Rob Clark
6a33c5c0df freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:20 -05:00
Rob Clark
ddede497b8 freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions.  This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:22:43 -05:00
Rob Clark
ebd3a1fc17 ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:52 -05:00
Rob Clark
fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark
cc7ed34df9 freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:46 -05:00
Rob Clark
680664dff9 freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:37 -05:00
Rob Clark
2809c87f90 freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:59 -05:00
Rob Clark
fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00