Commit graph

63861 commits

Author SHA1 Message Date
Brian Paul
971122a9c0 st/mesa: minor simplification of some state atom assignments 2014-07-09 06:43:25 -06:00
Brian Paul
301ffe7b26 st/mesa: minor fix-up in st_GetSamplePosition()
If the driver doesn't implement get_sample_position(), let's return
some non-garbage values.
2014-07-09 06:43:25 -06:00
Brian Paul
91affc8b32 mesa: use float to silence MSVC warning in _mesa_GetMultisamplefv() 2014-07-09 06:43:25 -06:00
Samuel Pitoiset
50bbe49c33 nvc0: allocate more space before a counter is configured
On nvc0, a counter can have up to 6 sources instead of only one
for nve4+. This fixes a crash when a counter uses more than
one source.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-08 19:41:00 -04:00
Tobias Klausmann
a9b21015f5 nv50/ir: use unordered_set instead of list to keep track of var uses
The set of variable uses does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.

This shortens runtime of piglit test fp-long-alu to ~22s from ~4h

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-08 19:41:00 -04:00
Kenneth Graunke
503391b46f i965/disasm: Fix disassembly of the any16h/all16h predicates.
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-08 12:31:01 -07:00
Kenneth Graunke
e13a6406c3 glsl: Fix the foreach_in_list_reverse macro.
We clearly don't want to start at the head and walk backwards; we want
to start at the last real element before the tail sentinel.  If the list
is empty, tail_pred will be the head sentinel, and we'll stop.

Nothing uses this function, so I guess nobody noticed it was broken.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-07-08 12:31:01 -07:00
Marek Olšák
be536efe20 radeonsi: mark MSAA config state as dirty at the beginning of CS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81020

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-08 20:46:23 +02:00
Marek Olšák
fe6be9926f gallium: fix u_default_transfer_inline_write for textures
This doesn't fix any known issue. In fact, radeon drivers ignore all
the discard flags for textures and implicitly do "discard range"
for any write transfer.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-07-08 20:46:23 +02:00
Matt Turner
cf430408c4 i965: Remove artificial dependency between math instructions.
... on Gen6+. I'm not actually sure which class Gen6 fits into.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-08 11:12:02 -07:00
Matt Turner
099cbc1477 i965/fs: Track dependencies in instruction scheduling per reg offset.
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.

Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.

   linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
   linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F

This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.

After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-08 11:12:02 -07:00
Jon TURNEY
7a641dd58d configure: Don't special case Cygwin to use gnu99, define _XOPEN_SOURCE instead
Revert "build: Build on Cygwin with gnu99 instead of c99." and define
_XOPEN_SOURCE appropriately.

This reverts commit 53e36d333c.

Since Cygwin 1.7.18 (April 2013), it's headers correctly prototype strtoll()
when using -std=c99, and correctly prototype strdup() when _XOPEN_SOURCE is
defined appropriately, so this workaround is no longer needed.

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: Vinson Lee <vlee@freedesktop.org>
2014-07-08 14:25:21 +01:00
Chia-I Wu
8ff16111ee ilo: fix fence reference counting
The old code was complicated, and was wrong when *ptr is NULL.
2014-07-08 15:00:36 +08:00
Kristian Høgsberg
bbefb15e01 i965: Extend compute-to-mrf pass to understand blocks of MOVs
The current compute-to-mrf pass doesn't handle blocks of MOVs.  Shaders
that end with a texture fetch follwed by an fb write are left like this:

0x00000000: pln(8)          g6<1>F          g4<0,1,0>F      g2<8,8,1>F      { align1 WE_normal 1Q compacted };
0x00000008: pln(8)          g7<1>F          g4.4<0,1,0>F    g2<8,8,1>F      { align1 WE_normal 1Q compacted };
0x00000010: send(8)         g2<1>UW         g6<8,8,1>F
                            sampler (1, 0, 0, 1) mlen 2 rlen 4              { align1 WE_normal 1Q };
0x00000020: mov(8)          g113<1>F        g2<8,8,1>F                      { align1 WE_normal 1Q compacted };
0x00000028: mov(8)          g114<1>F        g3<8,8,1>F                      { align1 WE_normal 1Q compacted };
0x00000030: mov(8)          g115<1>F        g4<8,8,1>F                      { align1 WE_normal 1Q compacted };
0x00000038: mov(8)          g116<1>F        g5<8,8,1>F                      { align1 WE_normal 1Q compacted };
0x00000040: sendc(8)        null            g113<8,8,1>F
                            render ( RT write, 0, 4, 12) mlen 4 rlen 0      { align1 WE_normal 1Q EOT };

This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that writes multiple registers.  With this,
the above shader becomes:

0x00000000: pln(8)          g6<1>F          g4<0,1,0>F      g2<8,8,1>F      { align1 WE_normal 1Q compacted };
0x00000008: pln(8)          g7<1>F          g4.4<0,1,0>F    g2<8,8,1>F      { align1 WE_normal 1Q compacted };
0x00000010: send(8)         g113<1>UW       g6<8,8,1>F
                            sampler (1, 0, 0, 1) mlen 2 rlen 4              { align1 WE_normal 1Q };
0x00000020: sendc(8)        null            g113<8,8,1>F
                            render ( RT write, 0, 20, 12) mlen 4 rlen 0     { align1 WE_normal 1Q EOT };

which is the bulk of the shader db results:

total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs:     844 -> 524 (-37.91%)
GAINED:                                0
LOST:                                  0

The optimization also applies to MRT shaders that write the same
color value to multiple RTs, in which case we can eliminate four MOVs in
a similar fashion.  See fbo-drawbuffers2-blend in piglit for an example.

No measurable performance impact.  No piglit regressions.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
2014-07-07 23:39:40 -07:00
Ilia Mirkin
8aa34dc9cb nvc0/ir: fill offset in properly for TXD
Apparently TXD wants its offset differently than TEX, accepting it in
the upper bits of the layer index. Unclear what happens when this is
combined with indirect sampler indexing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-07-08 00:14:33 -04:00
Ilia Mirkin
114d46829d nvc0/ir: use manual TXD when offsets are involved
Something about how we're implementing offsets for TXD is wrong, just
flip to the generic quadop-based implementation in that case.

This is the minimal fix appropriate for backporting.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-08 00:14:33 -04:00
Ilia Mirkin
afea9bae67 nvc0/ir: do quadops on the right texture coordinates for TXD
handleTEX moves the layer as the first argument. This makes sure that
the quadops deal with the texture coordinates.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-08 00:14:33 -04:00
Ilia Mirkin
1065aa92f4 nv50/ir: ignore bias for samplerCubeShadow on nv50
Unfortunately there's no good way to do this on the nv50 shader isa.
Dropping the bias seems preferable to doing the compare post-filtering.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-08 00:14:33 -04:00
Ilia Mirkin
30d91e0eec nv50/ir: retrieve shadow compare from first arg
This can only happen with texture(samplerCubeShadow, bias), where the
compare will be in the first argument.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2014-07-08 00:14:33 -04:00
Carl Worth
9007c4f9f4 docs: Import 10.2.3 release notes
And add a news item.
2014-07-07 16:28:37 -07:00
Matt Turner
f6db414f3c i965/fs: Disable unlit_centroid_workaround on Haswell.
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.

total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs:     27325 -> 26049 (-4.67%)
2014-07-06 18:19:17 -07:00
Matt Turner
6f7c4a8d05 i965/vec4: Perform CSE on CMP(N) instructions.
Port of commit b16b3c87 to the vec4 code.

No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
2014-07-06 18:19:15 -07:00
Matt Turner
7921bf0062 i965/vec4: Don't emit null MOVs in CSE.
Port of commit 219b43c6 to the vec4 code.
2014-07-06 18:18:52 -07:00
Matt Turner
949991cc99 i965/vec4: Improve CSE performance by expiring some available expressions.
Port of commit 5daf867f to the vec4 code.
2014-07-06 18:18:52 -07:00
Kenneth Graunke
3c8dc48ad1 i965/vec4: Add basic common subexpression elimination.
[mattst88]: Modified to perform CSE on instructions with
            the same writemask. Offered no improvement before.

total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs:     14410 -> 13962 (-3.11%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-06 18:18:51 -07:00
Matt Turner
848fc7f710 i965: Fix warnings introduced in commit e24ef5ab.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-06 18:15:36 -07:00
Christian König
042b061fef gallium/radeon: use PRIX64 instead of PRIu64
We want hex values here, not decimals.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-07-06 13:28:04 +02:00
Matt Turner
1580865a8c i965: Move assembly annotation functions to intel_asm_annotation.c.
It's C. Compile it as such.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
423932791d i965: Rename intel_asm_printer -> intel_asm_annotation.
The #ifndef include guards already said the right thing :)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
6d3e24a5c2 i965: Make backend_instruction usable from C.
With a hack to place an exec_node in the struct in C to be at the same
location as the inherited exec_node in C++.

Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
0db30fcf89 i965/cfg: Make cfg_t usable from C.
Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
857c06236c i965: Repack backend_instruction struct.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
ce706b4a9b i965: Make a brw_predicate enum.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
46e5b2a497 i965: Make a brw_conditional_mod enum.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
ab74a42eef i965: Move common fields into backend_instruction.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
3de11cacf0 i965: Use enum brw_reg_type for register types.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
34ef6a7651 i965: Move is_zero/one/null/accumulator into backend_reg.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
c019105f37 i965: Make a common backend_reg class.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:30 -07:00
Matt Turner
9377b189f7 i965: Drop imm union from visitor register classes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:29 -07:00
Matt Turner
53992a102f i965: Use immediate storage in brw_reg for visitor regs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-05 22:42:29 -07:00
Andreas Boll
45446efc30 docs: add news item for mesa-demos 8.2.0 release 2014-07-05 11:32:54 +02:00
Chris Forbes
4087d9ec0b glsl: Fix merging of layout(invocations) with other qualifiers
If another layout qualifier appeared to the left of `invocations` in the
GS input layout declaration, the invocation count would be dropped on
the floor.

Fixes the piglit tests:

spec/ARB_transform_feedback3/arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
spec/ARB_gpu_shader5/arb_gpu_shader5-invocation-id
spec/ARB_gpu_shader5/compiler/correct-multiple-layout-qualifier-invocations.geom
spec/ARB_gpu_shader5/execution/invocations-conflicting

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-07-05 09:42:17 +12:00
Ilia Mirkin
9a37eb8adb nvc0: add a memory barrier when there are persistent UBOs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-03 20:08:41 -04:00
Ilia Mirkin
5d4f5218bb nv50: do an explicit flush on draw when there are persistent buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-03 20:01:07 -04:00
Ilia Mirkin
b2b7c65122 nv50: disable dedicated ubo upload method
The hardware allows multiple simultaneous renders with the same
memory-backed constbufs but with each invocation having different
values. However in order for that to work, the data has to be streamed
in via the right constbuf slot. We weren't doing that for UBOs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
2014-07-03 20:01:06 -04:00
Ilia Mirkin
32b71246e7 gallium: rename PIPE_CAP_TGSI_VS_LAYER to also have _VIEWPORT
Now that this cap is used to determine the availability of both, adjust
its name to reflect the new reality.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-07-03 19:39:25 -04:00
Ilia Mirkin
0fb6f1bf1d mesa/st: enable AMD_vertex_shader_viewport_index
The assumption is that any driver capable of emitting layer from the
vertex shader and supporting viewports should be able to also handle
emitting viewport index from the vertex shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
2014-07-03 19:39:25 -04:00
Ilia Mirkin
313acb3ffa r600g: allow vs to write to gl_ViewportIndex
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
2014-07-03 19:39:25 -04:00
Thomas Hellstrom
556a415033 svga: Don't unnecessarily reemit BindGBShader commands v2
The Linux winsys can no longer relocate shader code, so avoid
reemitting BindGBShader commands. They are costly.

v2: Correctly handle errors from SVGA3D_BindGBShader()

Reported-by: Michael Banack <banackm@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
2014-07-03 22:26:00 +02:00
Aaron Watry
824197efd5 radeon/llvm: Allocate space for kernel metadata operands
Previously, we were assuming that kernel metadata nodes only had 1 operand.

Kernels which have attributes can have more than 1, e.g.:
!0 = metadata !{void (i32 addrspace(1)*)* @testKernel, metadata !1}
!1 = metadata !{metadata !"work_group_size_hint", i32 4, i32 1, i32 1}

Attempting to get the kernel without the correct number of attributes led
to memory corruption and luxrays crashing out.

Fixes the cl/program/execute/attributes.cl piglit test.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76223
CC: "10.2" <mesa-stable@lists.freedesktop.org>
2014-07-03 15:18:03 -05:00