Commit graph

66874 commits

Author SHA1 Message Date
Eric Anholt
84caf5a861 vc4: Set unused raddr fields to QPU_R_NOP.
The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything.  By setting unused raddrs to NOP, we avoid the problem
(since only the phsyical registers are tracked).
2014-10-08 17:42:59 +02:00
Eric Anholt
48af7426f2 vc4: Abstract out the field-merging logic for instructions.
I'm going to be doing the same logic for some more fields next.
2014-10-08 17:42:59 +02:00
Niels Ole Salscheider
acdcef6788 r600: Use DMA transfers in r600_copy_global_buffer
v2: Do not demote items that are already in the pool

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
2014-10-07 15:59:43 -04:00
Iago Toral Quiroga
fd31628c49 glsl: Optimize min/max expression trees
Original patch by Petri Latvala <petri.latvala@intel.com>:

Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minmax search, from the field of
AI.

This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.

This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:

total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs:     60 -> 52 (-13.33%)
GAINED:                                0
LOST:                                  0

All tests (max3, min3, mid3) improved.

A full shader-db run:

total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs:     1188 -> 1160 (-2.36%)
GAINED:                                0
LOST:                                  0

Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.00000)) operand, which was
compiled to a saturate modifier.

Version 2 by Iago Toral Quiroga <itoral@igalia.com>:

Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
  (Suggested by Mat Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
  (Suggested by Mat Turner)
- Change mixmax_range to call its limits "low" and "high" instead of
  "range[0]" and "range[1]". (Suggested by Connor Abbot).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
  Connor Abbot).
- Make the logic more clearer by rearrenging the code and commenting.
  (Suggested by Connor Abbot).
- Added comment to explain why we need to recurse twice. (Suggested by
  Connor Abbot).
- If we cannot prune an expression, do not return early. Instead, attempt
  to prune its children. (Suggested by Connor Abbot).

Other changes:
- Instead of having a global "valid" visitor member, let the various functions
  that can determine this status return a boolean and check for its value
  to decide what to do in each case. This is more flexible and allows to
  recurse into children of parents that could not be prunned due to invalid
  ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
  any use of get_range, combine_range or range_intersection can invalidate
  a range we should check for this situation every time we use any of these
  functions.

Version 3 by Iago Toral Quiroga <itoral@igalia.com>:

Changes from review feedback:
- Now we can make get_range, combine_range and range_intersection static too
  (suggested by Connor Abbot).
- Do not return NULL when looking for the larger or greater constant into
  mixed vector constants. Instead, produce a new constant by doing a
  component-wise minmax. With this we can also remove of the validations when
  we call into these functions (suggested by Connor Abbot).
- Add a comment explaining the meaning of the baserange argument in
  prune_expression (suggested by Connor Abbot).

Other changes:
- Eliminate minmax expressions operating on constant vectors with mixed values
  by resolving them.

No piglit regressions observed with Version 3.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2014-10-07 12:37:51 +02:00
Tapani Pälli
16b53005a7 glsl: do not emit error for non written varyings on OpenGL ES
Patch fixes following test case from 'shaders-with-varyings' WebGL
conformance suite: "vertex shader with unused varying and fragment
shader with used varying must succeed"

v2: emit still a warning if the condition happens (Ian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-10-07 08:28:51 +03:00
Michel Dänzer
be0a994fb8 radeonsi: Use dummy pixel shader if compilation of the real shader failed
Instead of crashing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79155#c5
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-10-07 12:07:13 +09:00
Chia-I Wu
f358462640 ilo: let shaders determine surface counts
When a shader needs N surfaces, we should upload N surfaces and not depend on
how many are bound.  This commit is larger than it should be because we did
not export how many surfaces a surface uses before.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-10-06 15:10:30 +08:00
Chia-I Wu
ca824e6940 ilo: let shaders determine sampler counts
When a shader needs N samplers, we should upload N samplers and not depend on
how many are bound.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-10-04 23:18:51 +08:00
Marek Olšák
0c4bc1e292 tgsi: change tgsi_shader_info::properties to a one-dimensional array
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

v2: fix svga too
2014-10-04 15:36:39 +02:00
Marek Olšák
1f6c0b55df radeonsi: set number of userdata SGPRs of GS copy shader to 4
It only needs the constant buffer with clip planes and read-write resources
for the GS->VS ring and streamout. That's 2 pointers.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:15 +02:00
Marek Olšák
68d36c0bb5 radeonsi: pass the GS shader directly to si_generate_gs_copy_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:15 +02:00
Marek Olšák
aeb05f011e radeonsi: set LLVMByValAttribute for all descriptor arrays
I hope this is correct.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:15 +02:00
Marek Olšák
91f1a79f78 radeonsi: make the vertex shader key smaller
We only support 16 vertex attribs, not 32.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
90611297fa radeonsi: don't flush shader caches when building PM4 shader states
This is a wrong place to flush caches to say the least.

I don't think we need to flush the instruction caches if we don't patch
shaders with DMA.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
10e386f4aa radeonsi: remove interp_at_sample from the key, use TGSI_INTERPOLATE_LOC_SAMPLE
st/mesa has the same flag in its shader key, we don't need to do it
in the driver anymore.

Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
0a2d6f0c4e radeonsi: move geometry shader properties from si_shader to si_shader_selector
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
54de709911 radeonsi: always compile shaders on demand
The first compiled shader is sometimes useless, because the key doesn't match
the key for the draw call where it's used.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
6c9f61c97e radeonsi: remove unused variable si_shader::gs_input_prim
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
7dc0164192 tgsi: remove some not so useful variables from tgsi_shader_info 2014-10-04 15:16:14 +02:00
Marek Olšák
8860584045 radeonsi: get fs_write_all from tgsi_shader_info directly
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
8908fae243 tgsi: simplify shader properties in tgsi_shader_info
Use an array of properties indexed by TGSI_PROPERTY_* definitions.
2014-10-04 15:16:14 +02:00
Marek Olšák
5233568861 radeonsi: get tgsi_shader_info only once before compilation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
af4f5a7c97 gallium/util: add util_bitcount64
I'll need this in radeonsi.

v2: use __builtin_popcountll if available

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-10-04 15:16:14 +02:00
Marek Olšák
837907b8b3 radeonsi: fix CS tracing and remove excessive CS dumping 2014-10-04 15:16:14 +02:00
Ilia Mirkin
c74be01e80 gk110/ir: add dnz flag emission for fmul/fmad
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
2014-10-03 20:37:59 -04:00
Ilia Mirkin
d58037ccf5 gm107/ir: add dnz emission for fmul
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
2014-10-03 20:37:59 -04:00
Brian Paul
90dc71b454 st/wgl: add WINAPI qualifiers on wgl function typedefs
Fixes a release build segfault when wglCreateContextAttribsARB()
calls the wglCreateContext() function.

Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
2014-10-03 13:45:52 -06:00
Rob Clark
7297bdbd50 freedreno: query fixes
Fixes a few issues, including a potential empty-IB (which triggers gpu
hangs in piglit occlusion_query_meta_no_fragments)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 14:19:52 -04:00
Rob Clark
a262c601d3 freedreno/a3xx: handle VS only outputting BCOLOR
Possibly we should map the front color to black (zeroes).  But not sure
there is a way to do that without generating a shader variant.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 14:19:52 -04:00
Rob Clark
af4d088395 freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:

  FRAG
  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
  DCL IN[0], GENERIC[0], PERSPECTIVE
  DCL OUT[0], COLOR
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
  IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
    0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
    1: MOV OUT[0], IMM[0].xyxx
    2: END

cause unhappyness.  They have an IN[], but once this is compiled the
useless TEX instruction goes away.  Leaving a varying that is never
fetched, which makes the hw unhappy.

In the process fix a signed vs unsigned compare.  If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 14:19:52 -04:00
Matt Turner
cabc93c5ad i965/compaction: Disable compaction on SNB temporarily.
Will investigate after XDC.
2014-10-03 10:41:57 -07:00
Matt Turner
0d5c9bf1e4 Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."
This reverts commit 54e30dbf4d.

Will investigate after XDC.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84557
2014-10-03 10:02:24 -07:00
Matt Turner
b59db8e0f0 i965/fs: Remove dead generate_rep_fb_write prototype.
Added in commit f9dc7aab.
2014-10-03 10:02:24 -07:00
Brian Paul
c7f0755caa mesa: fix spurious wglGetProcAddress / GL_INVALID_OPERATION error
On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run w/out any command line
arguments.  Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.

The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.

Plus, improve the comments for _mesa_alloc_dispatch_table().

Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
2014-10-03 10:04:48 -06:00
Ilia Mirkin
33c9ad97bf freedreno/ir3: add TXF support
Still failing a bunch of the fairly picky texelFetch tests, but the
1D(Array) ones are full passes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
e6acf3ac24 freedreno/ir3: add TXD support and expose ARB_shader_texture_lod
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
c49107c889 freedreno/ir3: add texture offset support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
5bba74c64b freedreno/ir3: shadow comes before array
Experimentally, this makes *ArrayShadow tex-miplevel-selection tests
pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
81b34e4461 freedreno/ir3: make TXQ return integers, not floats
We're still doing something wrong for array textures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
c4e2a196c3 freedreno/ir3: add UMAD support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
347bc197a6 freedreno/ir3: add ISSG support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
ad5db64e7e freedreno/ir3: add MOD support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
cab3cb1d71 freedreno/ir3: add UMOD support, based on UDIV
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Ilia Mirkin
8f7d01c2cb freedreno/ir3: add IDIV/UDIV support
Logic shamelessly copied from nv50 lowering pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-10-02 23:30:47 -04:00
Michel Dänzer
ed03747e6a radeonsi: Clear sampler view flags when binding a buffer
Fixes assertion failure while running the Unreal Engine 4 Elemental demo:

.../si_blit.c:322:si_decompress_color_textures: Assertion `tex->cmask.size || tex->fmask.size' failed.

Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-10-03 11:15:38 +09:00
Eric Anholt
ca00070259 vc4: Add support for framebuffer sRGB encoding. 2014-10-02 18:29:18 -07:00
Eric Anholt
24d9980562 vc4: Add support for sampling from sRGB.
This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want.  I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
2014-10-02 18:28:45 -07:00
Ilia Mirkin
3dd9a0d6fd freedreno/ir3: avoid fan-in sources referring to same instruction
Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-02 21:05:50 -04:00
Rob Clark
f5eeb8a6dc freedreno/a3xx: emit all immediates in one shot
Makes the command stream a bit tighter when there are lots of
immediates.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-02 21:05:50 -04:00
Ilia Mirkin
be00852bae freedreno: instanced drawing/compute not yet supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-02 21:05:50 -04:00