Commit graph

65652 commits

Author SHA1 Message Date
Rob Clark
ef858ac770 freedreno/ir3: add DDX/DDY
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-01 18:08:21 -04:00
Rob Clark
5e5604cc28 freedreno/ir3: don't keep IR around
Once we've assembled the shader, no need to keep the intermediate
around.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-01 18:08:21 -04:00
Jason Ekstrand
e8f83538dd i965/fs: Don't segfault when debug-logging a null program
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-01 12:33:13 -07:00
Jason Ekstrand
1c573c9adb i965/vec4: Don't segfault when debug-logging a null program
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-01 12:31:56 -07:00
Marek Olšák
a10c8db715 radeonsi: implement EXPCLEAR optimization for depth
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:52 +02:00
Marek Olšák
f05fe294e7 r600g,radeonsi: initialize HTILE to fully-expanded state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:52 +02:00
Marek Olšák
573313c94e radeonsi: implement fast depth clear
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák
63cb4077e6 radeonsi: move DB_RENDER_CONTROL into draw_vbo
So that I can add fast depth clear.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák
78aa717601 radeonsi: disable occlusion queries if they are not needed
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák
ab9ad91779 r600g,radeonsi: force fast stencil and HTILE stencil off, fixing a Hyper-Z hang
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.

This fixes a hang in Brutal Legend.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák
ba14d4910c r600g: set VGT_ENHANCE=4 on R7xx
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:49 +02:00
Marek Olšák
13b93596da r600g: expose AMD_vertex_shader_layer and *_viewport_index on R600-R700
already implemented

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:45 +02:00
Marek Olšák
d159c5e3e0 r600g: fix layered clear
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:42 +02:00
Marek Olšák
e6d191bb6f r600g: some DB bug workarounds for R6xx DB flushing
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:40 +02:00
Marek Olšák
0ccc653c70 r600g: enable fast depth clear for array textures and cubemaps
I have a piglit test that hits this.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:37 +02:00
Marek Olšák
6d751065cc r600g: use HTILE allocator from SI
It's almost the same.

This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).

2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:33 +02:00
Marek Olšák
ee1b30eaff r600g: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX for EG/CM, inline other fields
This fixes rendering to non-zero layer/face/slice with HTILE.

v2: added the assertion

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:17:40 +02:00
Marek Olšák
91050ff215 radeonsi: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX, inline other fields
This fixes rendering to a non-zero layer/face/slice with HTILE.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72685

v2: added the assertion

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:15:36 +02:00
Glenn Kennard
8d0f6ff810 r600g: Implement sm5 geometry shader instancing
Requires Evergreen or later hardware.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
2014-09-01 21:12:03 +02:00
Marek Olšák
482def592f glsl_to_tgsi: allocate and enlarge arrays for temporaries on demand
This fixes crashes if the number of temporaries is greater than 4096.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66184

v2: added fail paths for realloc failures

Cc: 10.2 10.3 mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-09-01 21:03:58 +02:00
Marek Olšák
b419c651fb gallium/pb_bufmgr_cache: limit the size of cache
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
2014-09-01 20:17:48 +02:00
Marek Olšák
bba7d29a86 pipe-loader: use the correct screen index 2014-09-01 20:09:19 +02:00
Marek Olšák
0b56e23e7f egl/dri2: use the correct screen index
Required for multi-GPU configuration where each GPU has its own X screen.
2014-09-01 20:09:19 +02:00
Jordan Justen
1a428a5256 docs: Mark ARB_compute_shader as work in progress
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2014-09-01 10:45:37 -07:00
Connor Abbott
d571f2b15d i965/fs: don't use ir->shadow_comparitor in emit_texture_*
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-01 00:55:14 -07:00
Connor Abbott
cbfcb1b069 i965/fs: don't pass ir_variable * to emit_samplepos_setup()
We were only using it to get at its type, which we already know because
it's a builtin variable.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-01 00:12:15 -07:00
Connor Abbott
ec3d06f591 i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()
We were only using it to get at its type, which we already know because
it's a builtin variable.

v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-01 00:11:16 -07:00
Kenneth Graunke
70691f0c28 i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang.  So, compile it even if INTEL_DEBUG=no16 is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-08-31 17:03:31 -07:00
Emil Velikov
88cbe3908f mesa: fix make tarballs
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.

Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.

Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-09-01 00:22:20 +01:00
Abdiel Janulgue
5598458e69 i965/vec4: Remove try_emit_saturate
Now that saturate is implemented natively as an instruction,
we can cut down on unneeded functionality.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:09 +03:00
Abdiel Janulgue
cbd225057a i965/fs: Refactor try_emit_saturate
v3: Since the fs backend can emit saturate as a separate instruction, there is
    no need to detect for min/max instructions and to rewrite the instruction tree
    accordingly. On the other hand, we don't need to emit a separate saturated
    mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:09 +03:00
Abdiel Janulgue
b2c0c35907 ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
Now that saturate is implemented natively as instruction,
we can cut down on unneeded functionality.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:09 +03:00
Abdiel Janulgue
7841a246b9 i965/vec4: Allow propagation of instructions with saturate flag to sel
When sel conditon is bounded within 0 and 1.0. This allows code as:
        mov.sat a b
        sel.ge  dst a 0.25F

To be propagated as:
        sel.ge.sat dst b 0.25F

v3: - Syntax clarifications in inst->saturate assignment
    - Remove extra parenthesis when assigning src_reg value
      from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:09 +03:00
Abdiel Janulgue
40aeb558ce i965/fs: Allow propagation of instructions with saturate flag to sel
When sel conditon is bounded within 0 and 1.0. This allows code as:
	mov.sat a b
	sel.ge  dst a 0.25F

To be propagated as:
	sel.ge.sat dst b 0.25F

v3: Syntax clarifications in inst->saturate assignment (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:09 +03:00
Abdiel Janulgue
0e2ba3ee82 glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
    - Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
      inner constant is 1.0.
    - Fix comments to show that the optimization is a commutative operation
      (Matt Turner)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
d92394c5d8 glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
    - Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
      inner constant is < 1
    - Fix comments to reflect we are doing a commutative operation (Matt Turner)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
8f890b119e glsl: Optimize clamp(x, 0, 1) as saturate(x)
v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
    - Add missing condition where the inner constant is 1.0 and outer constant is 0.0
    - Make indexing of operands easier to read (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
cbd0d643a3 glsl: Implement saturate as ir_unop_saturate
Now that we have the ir_unop_saturate implemented as a single
instruction, generate the correct simplified expression.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
cb621166dc yi965/vec4: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
4bfe8a1e61 i965/fs: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
909fa50f5b ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
cfa8c1cb39 ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
Needed when vertex programs doesn't allow saturate

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
8935c12937 glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
4c0ccfc5b3 glsl: Add constant evaluation of ir_unop_saturate
v2: Use CLAMP macro (Ian Romanick)

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
a5f02b6696 glsl: Add ir_unop_saturate
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-08-31 21:04:08 +03:00
Abdiel Janulgue
f340145107 i965/vec4/fs: Count loops in shader debug
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:04:03 +03:00
Abdiel Janulgue
ddc1d297bc i965/vec4: inline generate_vec4_instruction() within generate_code()
Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-08-31 21:03:49 +03:00
Kenneth Graunke
e34a363a78 i965: Add 2x MSAA support to Broadwell fast clear code.
According to the cited documentation section (but in the newer docs),
x_scaledown is the same for 2x and 4x MSAA.

+47 piglits.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83081
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
2014-08-31 01:48:10 -07:00
Matt Turner
8b5ac1df17 i965/vec4: Update register coalescing test.
In commit 04895f5c I added support for reswizzling writemasks. This test
was checking that we didn't support this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82881
2014-08-30 21:00:28 -07:00
Matt Turner
0492275038 i965: Use unreachable() to silence warning.
brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
    unsigned int x_scaledown, y_scaledown;

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-08-30 21:00:28 -07:00