23210 commits

Author SHA1 Message Date
Marek Olšák
ca90cde81e radeonsi: implement gl_SampleMaskIn
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
f9fd0c4a55 radeonsi: add support for SQRT
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
d73c1c1304 radeonsi: add support for FMA
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
dfea35666e gallium/radeon: don't use LLVMReadOnlyAttribute for ALU
None of the instructions use a pointer argument.
(+ small cosmetic changes)

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák
9da9c8e3f4 tgsi: handle bitwise opcodes in tgsi_opcode_infer_type (v2)
v2: set the same types as the destination type in tgsi_exec

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-16 12:54:18 +01:00
Marek Olšák
216543ea54 gallium: add FMA and DFMA opcodes (v3)
Needed by ARB_gpu_shader5.

v2: select DMAD for FMA with double precision
v3: add and select DFMA

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-16 12:54:18 +01:00
Rob Clark
e92bc6b38e freedreno: update generated headers
Fix a3xx texture layer-size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-15 18:00:19 -04:00
Rob Clark
d3fb949c03 freedreno/ir3: remove old compiler
Now that piglit is no longer falling back to old compiler for any tests,
we can remove it.  Hurray \o/

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:27:03 -04:00
Rob Clark
feb858b788 freedreno/ir3: avoid scheduler deadlock
Deadlock can occur if we schedule an address register write, yet some
instructions which depend on that address register value also depend on
other unscheduled instructions that depend on a different address
register value.  To solve this, before scheduling an address register
write, ensure that all the other dependencies of the instructions which
consume this address register are already scheduled.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:26:56 -04:00
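The check described in feb858b788 can be sketched roughly as follows. This is a minimal illustration under assumed data structures, not the ir3 scheduler itself: `struct instr`, `MAX_SRCS`, and `can_schedule_address_write()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRCS 4   /* hypothetical limit, for the sketch only */

/* Hypothetical, heavily simplified instruction node. */
struct instr {
    bool scheduled;                /* already placed by the scheduler? */
    struct instr *address;         /* address register write this instr uses */
    struct instr *srcs[MAX_SRCS];  /* other instructions it depends on */
    unsigned nsrcs;
};

/* Returns true only if every instruction that consumes addr_write has all
 * of its other dependencies scheduled already, so that scheduling the
 * address register write cannot leave a consumer stuck. */
static bool
can_schedule_address_write(const struct instr *addr_write,
                           struct instr *const *instrs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const struct instr *user = instrs[i];

        if (user->address != addr_write)
            continue;  /* does not consume this address value */

        assert(user->nsrcs <= MAX_SRCS);
        for (unsigned s = 0; s < user->nsrcs; s++) {
            const struct instr *dep = user->srcs[s];

            if (dep && dep != addr_write && !dep->scheduled)
                return false;  /* an unscheduled dependency remains */
        }
    }
    return true;
}
```

With one consumer that also depends on an unscheduled instruction, the function returns false until that instruction is marked scheduled.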
Rob Clark
7208e96bb8 freedreno/ir3: bit of cleanup
Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:26:44 -04:00
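For readers unfamiliar with this kind of helper, an append macro for dynamically sized arrays might look like the sketch below. It is an assumption about the general shape, not the macro added in 7208e96bb8: the `_count`/`_sz` naming convention, the growth policy, and the `struct int_list` and `fill_list()` helpers are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: append val to the growable array arr, whose
 * bookkeeping fields are named arr_count and arr_sz (found via token
 * pasting). Capacity starts at 16 and doubles when full. */
#define array_insert(arr, val) do {                              \
        if (arr ## _count == arr ## _sz) {                       \
            arr ## _sz = arr ## _sz ? 2 * arr ## _sz : 16;       \
            arr = realloc(arr, arr ## _sz * sizeof((arr)[0]));   \
            assert(arr != NULL); /* sketch: no error recovery */ \
        }                                                        \
        (arr)[arr ## _count++] = (val);                          \
    } while (0)

/* Hypothetical container used only to exercise the macro. */
struct int_list {
    int *items;
    unsigned items_count;  /* elements in use */
    unsigned items_sz;     /* allocated capacity, in elements */
};

/* Appends 0..n-1 and returns the resulting element count. */
static unsigned
fill_list(struct int_list *list, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        array_insert(list->items, (int)i);
    return list->items_count;
}
```

Starting from a zero-initialized `struct int_list`, the first insertion allocates the array, and the caller frees `items` when done.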
Ilia Mirkin
620e29b748 freedreno: fix slice pitch calculations
For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).

This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-03-13 16:05:16 -04:00
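The arithmetic in 620e29b748 can be reproduced with a small sketch. The function names below are hypothetical and the 32-pixel alignment is taken from the commit message; how levels beyond the second behave on real hardware is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t
align_up(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Halve a dimension per mip level, never going below 1. */
static uint32_t
minify(uint32_t v, unsigned level)
{
    uint32_t m = v >> level;
    return m ? m : 1;
}

/* Old behaviour per the commit message: align the minified width. */
static uint32_t
pitch_from_width(uint32_t width, unsigned level)
{
    return align_up(minify(width, level), 32);
}

/* Behaviour the hardware appears to expect: halve the previous
 * (already aligned) pitch, then align up to 32 again. */
static uint32_t
pitch_from_prev_pitch(uint32_t width, unsigned level)
{
    uint32_t pitch = align_up(width, 32);

    for (unsigned i = 0; i < level; i++)
        pitch = align_up(pitch / 2, 32);
    return pitch;
}
```

For a width of 65, both variants give 96 for the first slice; the second slice is 32 with the old rule and 64 with the halve-then-align rule, matching the numbers in the message.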
Ilia Mirkin
89b26d5a36 freedreno/a3xx: use the same layer size for all slices
We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test

bin/texelFetch fs sampler2DArray

have the same breakage as its non-array version instead of being
completely off, and makes

bin/ext_texture_array-gen-mipmap

start passing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-03-13 16:05:16 -04:00
Brian Paul
558dcd8770 util: convert slab macros to inline functions
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-13 08:03:43 -06:00
Alexandre Demers
a38e6c4fbd gallivm: (trivial) Fix typo in comment introduced by 70dc8a
Fix typo in comment introduced by 70dc8a

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-13 13:52:52 +00:00
Jose Fonseca
70dc8a9930 gallivm: Prevent double delete on LLVM 3.6
std::unique_ptr takes ownership of MM, and a double delete could ensue
in case of an error, as pointed out by Chris Vine in
https://bugs.freedesktop.org/show_bug.cgi?id=89387

Reviewed-by: Chris Vine <chris@cvine.freeserve.co.uk>
2015-03-12 10:01:09 +00:00
Brian Paul
5376bc74cc st/glx: use strdup() instead of _mesa_strdup()
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-03-11 09:34:24 -06:00
Samuel Pitoiset
e5cd42ed9a nvc0: fix wrong max value for driver queries
The maximum value of a Gallium HUD's panel is automatically adjusted
when the current value is greater than the max. If we set the
pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is
never adjusted and this results in a flat line instead of a pretty curve
which is correctly scaled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-09 20:47:05 -04:00
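Why a UINT64_MAX maximum flattens the graph in e5cd42ed9a can be seen in a small sketch of the auto-scaling rule. The `struct graph_pane` type and `pane_add_value()` function are hypothetical stand-ins, not the Gallium HUD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a HUD panel: only the scale maximum. */
struct graph_pane {
    uint64_t max_value;
};

/* Record a new sample and return the maximum used for scaling.
 * The maximum only grows when a sample exceeds it. */
static uint64_t
pane_add_value(struct graph_pane *pane, uint64_t value)
{
    assert(pane != NULL);
    if (value > pane->max_value)
        pane->max_value = value;  /* rescale to fit the new sample */
    return pane->max_value;
}
```

A pane that starts at 0 rescales to the largest sample seen so far. A pane that starts at UINT64_MAX never rescales, so every realistic sample is drawn at practically zero height.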
Alexandre Demers
7a37d5c3a4 r600g: Use R600_MAX_VIEWPORTS instead of 16
Let's define R600_MAX_VIEWPORTS instead of using 16 here and there
in the code when looping through viewports and scissors. It is
easier to understand what this number represents.

v2: Missed a case where R600_MAX_VIEWPORTS should have been used.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 23:02:05 +01:00
Marek Olšák
c939231e72 r300g: fix sRGB->sRGB blits
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 21:22:22 +01:00
Marek Olšák
9953586af2 r300g: fix a crash when resolving into an sRGB texture
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 21:03:49 +01:00
Marek Olšák
113601086d r300g: use memset for clearing the shader key
2015-03-09 20:58:32 +01:00
Marek Olšák
4815c187b7 r300g: remove the broken SNORM->UNORM shader lowering pass
Not used anymore.
2015-03-09 20:58:32 +01:00
Marek Olšák
74a757f92f r300g: fix RGTC1 and LATC1 SNORM formats
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 20:58:32 +01:00
Stefan Dösinger
f710b99071 r300g: Fix the ATI1N swizzle (RGTC1 and LATC1)
This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01
test as well as the precision part of Wine's 3dc format test (fd.o bug
89156).

The Z component seems to contain a lower precision version of the
result, probably a temporary value from the decompression computation.
The Y and W component contain different data that depends on the input
values as well, but I could not make sense of them (Not that I tried
very hard).

GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in
piglit, and both formats are affected by a compiler bug if they're
sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx,
which returns random garbage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 20:58:32 +01:00
Tom Stellard
51b43c559f radeonsi: Add additional information to shader dumps
This adds SGPR count, VGPR count, shader size, LDS size, and scratch
usage to shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 13:53:33 +00:00
Tom Stellard
bbfa1c3239 radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 13:53:33 +00:00
Tom Stellard
a646b00cfc clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG v2
This means dropping CL_FP_DENORM from the current return value.

v2:
  - Add comments about minimum values for OpenCL 1.2.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2015-03-09 13:53:33 +00:00
Ilia Mirkin
cb3eb43ad6 freedreno/ir3: get the # of miplevels from getinfo
This fixes ARB_texture_query_levels to actually return the desired
value.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Ilia Mirkin
8ac957a51c freedreno/ir3: fix array count returned by TXQ
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Ilia Mirkin
f3dfe6513c freedreno: move fb state copy after checking for size change
Fixes: 1f3ca56b ("freedreno: use util_copy_framebuffer_state()")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Rob Clark
fd17db6fe5 freedreno: replace glsl130 debug flag with glsl120
Now that relative-dst works, we should never fall back to the old
compiler.  (Which is almost true, other than a couple edge case sched
fails in piglit).

So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with
a glsl120 flag to force GLSL 120 and !integers.

If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a
workaround and please let me know.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
0e8d58b80a gallium/docs: add some freedreno compiler docs
Enable the 'sphinx.ext.graphviz' extension, and add in a section for
driver specific docs, with freedreno compiler docs beneath.  The
goal is for more complete compiler docs, and hopefully some docs about
other parts of the driver (such as how tiling works, etc).

Note that there is also a Distribution -> Drivers section, although
that appears to be just a list of drivers.  Not sure if that
should move under the 'Drivers' section or be left alone.  I did add a
one-line section for freedreno in the existing Distribution -> Drivers
section.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
060d349920 freedreno/ir3: relative dst
To simplify RA, assign arrays that are written to first.  The graph
carries enough dependency information to preserve the order of reads and
writes of an array, so all SSA names for the array collapse into one and
the entire array can simply be assigned by array-id.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
b7703212d8 freedreno/ir3: split out array_fanin() helper
We'll need this too for relative dst..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
17754b70d7 freedreno/ir3: drop deref nodes
The meta-deref instruction doesn't really do what we need for relative
destination.  Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address.  This lets us express the dependency regardless of
whether it is used for dst and/or src.

The foreach_ssa_src{_n} iterator macros now also iterate the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction.  Which is pretty much what
we want, as far as scheduling/etc.

TODO:
For now, the foreach_src{_n} iterators are unchanged.  We could wrap
the address in an ir3_register and make the foreach_src{_n} iterators
behave the same way.  But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
f8f7548f46 freedreno/ir3: helpful iterator macros
I remembered that we are using c99.. which makes some sugary iterator
macros easier.  So introduce iterator macros to iterate all src
registers and all SSA src instructions.  The _n variants also return
the src #, since there are a handful of places that need this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
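The kind of C99 iterator macro described in f8f7548f46 could look roughly like the sketch below. It assumes a simplified instruction layout; `struct reg`, `struct instr`, `MAX_SRCS`, and the two helper functions are hypothetical, and the macros only illustrate the technique rather than reproduce the ir3 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SRCS 4   /* hypothetical limit, for the sketch only */

struct reg { int num; };

struct instr {
    struct reg *srcs[MAX_SRCS];  /* NULL entries are skipped */
    unsigned nsrcs;
};

/* C99 allows declaring the counter inside the for statement, so the
 * macro can introduce n_ itself. The trailing if skips NULL sources;
 * wrap the loop body in braces to avoid dangling-else surprises. */
#define foreach_src_n(src_, n_, instr_)                   \
    for (unsigned n_ = 0; n_ < (instr_)->nsrcs; n_++)     \
        if (((src_) = (instr_)->srcs[n_]))

/* Variant for callers that do not need the source index. */
#define foreach_src(src_, instr_) foreach_src_n(src_, idx__, instr_)

/* Sum the register numbers of all non-NULL sources. */
static int
sum_src_nums(struct instr *instr)
{
    struct reg *src;
    int sum = 0;

    assert(instr->nsrcs <= MAX_SRCS);
    foreach_src(src, instr) {
        sum += src->num;
    }
    return sum;
}

/* Index of the last non-NULL source, or -1 if there is none. */
static int
last_src_index(struct instr *instr)
{
    struct reg *src;
    int last = -1;

    assert(instr->nsrcs <= MAX_SRCS);
    foreach_src_n(src, n, instr) {
        (void)src;
        last = (int)n;
    }
    return last;
}
```

The `_n` form hands the loop body both the source and its index, which is the convenience the commit message mentions for the handful of callers that need the src number.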
Rob Clark
26b79ac3e4 freedreno/ir3: fix register usage calculations
For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage.  Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
3ecc834e75 freedreno/ir3: couple tweaks for cmdline compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
0f797f7b7d freedreno/ir3: split up ssa_dst
And a couple other trivial renames, to prepare for relative dst.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark
27648efa20 freedreno/ir3: fix failed assert in grouping
Turns out there are scenarios where we need to insert mov's in "front"
of an input.  Triggered by shaders like:

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], GENERIC[9]
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
    0: MOV TEMP[0].xy, IN[1].xyyy
    1: MOV TEMP[0].w, IN[1].wwww
    2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
    3: MOV OUT[1], TEMP[0]
    4: MOV OUT[0], IN[0]
    5: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Mark Janes
b28c037d64 r300g: Fix build, invalid extern "C" around header inclusion.
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurrences of this pattern in r300 files.

When the helper to detect this issue was pushed to master, it broke
the build for the r300 driver.  This patch fixes the r300 build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-06 22:08:44 -05:00
Mark Janes
c4b91a1f5c nouveau: Fix build, invalid extern "C" around header inclusion.
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurrences of this pattern in nouveau files.

When the helper to detect this issue was pushed to master, it broke
the build for the nouveau driver.  This patch fixes the nouveau build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-06 22:08:11 -05:00
Ilia Mirkin
20346808cf nv50,nvc0: remove bogus 64_FLOAT formats
There is no HW support for these and the VBO pusher doesn't know about
them. No need to, either, since the st will be lowering them to 2x32.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-06 22:06:05 -05:00
Chia-I Wu
bca6c8572f ilo: clarify valid and preferred tilings
We did it right until the switch to gen_surface_tiling, which has
GEN8_TILING_W.  Generally, GEN8_TILING_W may be valid but not preferred.
2015-03-07 04:32:39 +08:00
Chia-I Wu
bf061a3d2e ilo: clean up Gen6 WAs
Add a helper function for each WA and make PIPE_CONTROL flags match the WA
descriptions.  Call gen6_wa_pre_pipe_control() only before PIPE_CONTROLs.
Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.
2015-03-07 02:17:54 +08:00
Chia-I Wu
ba5670fc50 ilo: add generic ilo_render_3dprimitive()
It replaces gen[6-8]_3dprimitive().
2015-03-07 01:45:52 +08:00
Chia-I Wu
8b2eecfbf8 ilo: add generic ilo_render_pipe_control()
It replaces gen[6-8]_pipe_control() and a direct gen6_PIPE_CONTROL() call in
ilo_render_emit_flush().
2015-03-07 01:40:23 +08:00
Chia-I Wu
35b713ad75 ilo: fix padding of linear sampler views
Should use the temporary variable in the loop instead of layout->bo_height.
2015-03-07 01:38:35 +08:00
Chia-I Wu
dda4823844 ilo: do not check for interleaved_samples
interleaved_samples is only zero-initialized when layout_want_mcs() is called.
We should not check for it.  There is also no need to.
2015-03-07 01:38:35 +08:00
Chia-I Wu
ebad062e9a ilo: enable L3 cache in MOCS
This enables L3 cache in MOCS almost everywhere.
2015-03-06 04:50:19 +08:00