Commit graph

66693 commits

Author SHA1 Message Date
Kenneth Graunke
169b6c1955 i965/vs: Handle vertex color clamping in emit_urb_slot().
Vertex color clamping only applies to a few specific built-ins: COL0/1
and BFC0/1 (aka gl_[Secondary]{Front,Back}Color).  It seems weird to
handle special cases in a function called emit_generic_urb_slot().

emit_urb_slot() is all about handling special cases, so it makes more
sense to handle this there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-02 17:00:26 -08:00
Kenneth Graunke
793ac67d3d i965: Use the enum type for gen6_gather_wa sampler key field.
Requested by Matt Turner.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02 17:00:26 -08:00
Kenneth Graunke
e5e466c954 i965: Drop use of GL types in program keys.
This is really far removed from the API; we should just use C types.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-02 17:00:26 -08:00
Kenneth Graunke
a64f3ba3d1 i965: Move program key structures to brw_program.h.
With fs_visitor/fs_generator being reused for SIMD8 VS/GS programs,
we're running into weird #include patterns, where scalar code #includes
brw_vec4.h and such.

Program keys aren't really related to SIMD4X2/SIMD8 execution - they
mostly capture NOS for a particular shader stage.  Consolidating them
all in one place that's vec4/scalar neutral should help avoid problems.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-02 17:00:26 -08:00
Kenneth Graunke
5f34a18f96 i965: Delete brw_state_flags::cache and related code.
It's been merged into brw_state_flags::brw for simplicity and
efficiency.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02 17:00:26 -08:00
Kenneth Graunke
4f24c168c8 i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.

I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.

Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.

v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
    state-based recompiles, and nix the "proper subset" claim,
    as it's false. (Caught by Kristian Høgsberg).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02 17:00:26 -08:00
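
The layout trick mentioned in 4f24c168c8 (program-data flags placed first so that `1 << brw_cache_id` still selects the matching flag) might look roughly like the sketch below. All enum and function names here are hypothetical stand-ins, not the real definitions from brw_context.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache IDs; the real enum lives in the i965 driver. */
enum example_cache_id {
   EX_CACHE_FS_PROG,
   EX_CACHE_VS_PROG,
   EX_CACHE_GS_PROG,
   EX_CACHE_MAX
};

/* Hypothetical dirty-bit indices.  The program-data bits come first, in
 * the same order as the cache IDs, so bit N corresponds to cache ID N. */
enum example_dirty_bit {
   EX_DIRTY_FS_PROG_DATA,
   EX_DIRTY_VS_PROG_DATA,
   EX_DIRTY_GS_PROG_DATA,
   EX_DIRTY_URB_FENCE,        /* unrelated flags follow */
   EX_DIRTY_FRAGMENT_PROGRAM
};

/* Flag to raise when the cached program for `id` changes. */
uint64_t
example_flag_for_cache_id(enum example_cache_id id)
{
   assert(id < EX_CACHE_MAX);
   return UINT64_C(1) << id;
}
```

Because the two enums share their first entries, the cache code needs no lookup table to translate an ID into a dirty bit.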
Kenneth Graunke
ce44b2061c i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache.  Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.

State upload is one of the hottest paths in the driver.  For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache).  Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom.  Broadwell, for example, has
57 state atoms, so this small savings can add up.

CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset).  Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.

Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones.  This one simply renames things, and was made via:

$ for file in *.[ch]; do
      sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
             -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
  done

Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw.  It will also fix the
alphabetization issues.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
2014-12-02 17:00:26 -08:00
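
The per-atom saving that ce44b2061c argues for can be sketched as follows. This is a simplified illustration: the struct layouts, field widths, and function names are assumptions, not the driver's actual check_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout with three separate bitfields. */
struct example_flags_old {
   uint32_t mesa;
   uint64_t brw;
   uint32_t cache;
};

/* Hypothetical layout after folding the cache bits into brw. */
struct example_flags_new {
   uint32_t mesa;
   uint64_t brw;
};

/* Three AND operations per state atom. */
bool
example_check_state_old(const struct example_flags_old *dirty,
                        const struct example_flags_old *atom)
{
   assert(dirty && atom);
   return ((dirty->mesa & atom->mesa) |
           (dirty->brw & atom->brw) |
           (dirty->cache & atom->cache)) != 0;
}

/* Two AND operations per state atom. */
bool
example_check_state_new(const struct example_flags_new *dirty,
                        const struct example_flags_new *atom)
{
   assert(dirty && atom);
   return ((dirty->mesa & atom->mesa) | (dirty->brw & atom->brw)) != 0;
}
```

The difference per call is tiny, but the check runs once for every atom in the list on every state upload.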
Kenneth Graunke
2a4f5728ad i965: Remove "disable_derivative_optimization" driconf option.
This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives.  A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.

Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism.  The driconf option seems superfluous.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02 17:00:26 -08:00
Ian Romanick
0391d1bbea i965: Just return void from brw_try_draw_prims
Note from Ken:

    "We used to use the return value to indicate whether software
    fallbacks were necessary, but we haven't in years."

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
Ian Romanick
9fd398215d mesa: Use current Mesa coding style in check_valid_to_render
This makes some other patches (still in my local tree) a bit cleaner.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
Ian Romanick
331b0120d1 mesa: Use unreachable instead of assert in check_valid_to_render
This is generally the preferred style these days.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
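
As a rough illustration of the style 331b0120d1 refers to, the sketch below defines a hypothetical EXAMPLE_UNREACHABLE macro and uses it after a switch that handles every enum value. The macro name, its fallback, and the enum are assumptions made for this example; they are not copied from Mesa.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro: assert in debug builds, and tell the compiler the
 * path cannot be taken (or abort where no such builtin exists). */
#if defined(__GNUC__)
#define EXAMPLE_UNREACHABLE(msg) \
   do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define EXAMPLE_UNREACHABLE(msg) \
   do { assert(!msg); abort(); } while (0)
#endif

enum example_mode { EX_POINTS, EX_LINES, EX_TRIANGLES };

int
example_verts_per_prim(enum example_mode mode)
{
   switch (mode) {
   case EX_POINTS:
      return 1;
   case EX_LINES:
      return 2;
   case EX_TRIANGLES:
      return 3;
   }

   /* A bare assert(0) would compile away under NDEBUG, leaving a
    * function that falls off its end without returning a value. */
   EXAMPLE_UNREACHABLE("invalid example_mode");
}
```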
Ian Romanick
304c466bd8 mesa: Silence unused parameter warnings in _mesa_validate_Draw functions
../../src/mesa/main/api_validate.c: In function '_mesa_validate_DrawElements':
../../src/mesa/main/api_validate.c:376:37: warning: unused parameter 'basevertex' [-Wunused-parameter]
../../src/mesa/main/api_validate.c: In function '_mesa_validate_MultiDrawElements':
../../src/mesa/main/api_validate.c:394:65: warning: unused parameter 'basevertex' [-Wunused-parameter]
../../src/mesa/main/api_validate.c: In function '_mesa_validate_DrawRangeElements':
../../src/mesa/main/api_validate.c:452:35: warning: unused parameter 'basevertex' [-Wunused-parameter]
../../src/mesa/main/api_validate.c: In function '_mesa_validate_DrawArrays':
../../src/mesa/main/api_validate.c:473:25: warning: unused parameter 'start' [-Wunused-parameter]
../../src/mesa/main/api_validate.c: In function '_mesa_validate_DrawElementsInstanced':
../../src/mesa/main/api_validate.c:590:44: warning: unused parameter 'basevertex' [-Wunused-parameter]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
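
The log for 304c466bd8 does not say how the warnings were silenced. Two common options are sketched below with hypothetical function names: keep the parameter and mark it as intentionally unused, or drop it from the signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GL context. */
struct example_ctx {
   int max_count;
};

/* Option 1: keep the signature and mark `start` as deliberately unused. */
bool
example_validate_draw_arrays(const struct example_ctx *ctx, int start, int count)
{
   assert(ctx);
   (void) start;   /* validation does not depend on the first vertex */
   return count >= 0 && count <= ctx->max_count;
}

/* Option 2: remove the parameter, which also requires updating callers. */
bool
example_validate_draw_arrays_no_start(const struct example_ctx *ctx, int count)
{
   assert(ctx);
   return count >= 0 && count <= ctx->max_count;
}
```

Both forms compile without a -Wunused-parameter warning. The second is cleaner when every caller can be changed.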
Ian Romanick
5e72886db0 mesa: Refactor common validation code to validate_DrawElements_common
Most of the code in _mesa_validate_DrawElements,
_mesa_validate_DrawRangeElements, and
_mesa_validate_DrawElementsInstanced was the same.  Refactor this out to
common code.

As a side-effect, a bug in _mesa_validate_DrawElementsInstanced was
fixed.  Previously this function would not generate an error when
check_valid_to_render failed if numInstances was 0.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
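
The kind of ordering bug that 5e72886db0 describes can be reconstructed as below. This is a hypothetical illustration, not the code from api_validate.c: the context struct, the error counter, and the function names are all invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context that counts the errors it has recorded. */
struct example_ctx {
   bool valid_to_render;
   int error_count;
};

/* Stand-in for check_valid_to_render(): records an error on failure. */
bool
example_check_valid(struct example_ctx *ctx)
{
   assert(ctx);
   if (!ctx->valid_to_render) {
      ctx->error_count++;
      return false;
   }
   return true;
}

/* Buggy shape: the early-out hides the error when num_instances is 0. */
bool
example_validate_instanced_buggy(struct example_ctx *ctx, int num_instances)
{
   if (num_instances == 0)
      return false;
   return example_check_valid(ctx);
}

/* Fixed shape: run the shared check first, then skip the empty draw. */
bool
example_validate_instanced_fixed(struct example_ctx *ctx, int num_instances)
{
   if (!example_check_valid(ctx))
      return false;
   return num_instances > 0;
}
```

Routing all three entry points through one common helper makes this kind of divergence harder to introduce.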
Ian Romanick
b93dcb0e71 mesa: Generate GL_INVALID_OPERATION when drawing w/o a VAO in core profile
GL 3-ish versions of the spec are less clear that an error should be
generated here, so Ken (and I during review) just missed it in 1afe335.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-02 12:16:28 -08:00
Brian Paul
4e6244e80f mesa: fix height error check for 1D array textures
height=0 is legal for 1D array textures (as depth=0 is legal for
2D arrays).  Fixes new piglit ext_texture_array-errors test.

Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2014-12-02 10:00:03 -07:00
Jan Vesely
ca0616f17e r600, llvm: Fix mem leak
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-12-02 11:30:13 -05:00
EdB
745b1f5503 clover: clCompileProgram CL_INVALID_COMPILER_OPTIONS
clCompileProgram should return CL_INVALID_COMPILER_OPTIONS
instead of CL_INVALID_BUILD_OPTIONS

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-12-02 11:05:03 -05:00
Eric Anholt
29c7cf2b2b vc4: Pair up QPU instructions when scheduling.
We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together.  This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based MOVs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.

total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs:     47361 -> 45429 (-4.08%)
2014-12-01 22:29:42 -08:00
Dave Airlie
7b0067d23a r600g/sb: fix issues caused by GLSL switching to loops for switch
Since 73dd50acf6
glsl: implement switch flow control using a loop

The SB backend was falling over in an assert or crashing.

I tracked this down to the loops having no repeats but still requiring
a working break.  The initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect if all the departure
nodes are empty and avoid generating an empty loop for that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-12-02 13:57:27 +10:00
Rob Clark
036f434ac2 freedreno/a4xx: alpha blend fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Rob Clark
a7d91c33c2 freedreno/a4xx: fix DRAW initiator encoding of index size
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Rob Clark
81194ac767 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-01 20:31:23 -05:00
Matt Turner
5df88c2096 i965/vec4: Rewrite dead code elimination to use live in/out.
Improves 359 shaders by >=10%
         114 shaders by >=20%
          91 shaders by >=30%
          82 shaders by >=40%
          22 shaders by >=50%
           4 shaders by >=60%
           2 shaders by >=80%

total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs:     364979 -> 342055 (-6.28%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
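
For readers unfamiliar with the technique named in 5df88c2096, here is a minimal sketch of dead code elimination driven by a block's live-out set. It assumes a toy instruction format with at most 32 virtual registers and a single basic block. The real vec4 pass works on the driver's own IR and liveness data, which this does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NO_REG (-1)

/* Hypothetical instruction: one destination, up to two sources. */
struct example_inst {
   int dst;                /* register written, or EX_NO_REG */
   int src[2];             /* registers read, or EX_NO_REG */
   bool has_side_effects;  /* never removed, e.g. a memory write */
   bool dead;              /* set by the pass */
};

/* Walk the block backwards from its live-out set and mark every
 * side-effect-free instruction whose result is not live.
 * Returns the number of instructions marked dead. */
int
example_dce_block(struct example_inst *insts, int count, uint32_t live_out)
{
   uint32_t live = live_out;
   int removed = 0;

   assert(count >= 0);
   for (int i = count - 1; i >= 0; i--) {
      struct example_inst *in = &insts[i];
      bool has_dst = in->dst != EX_NO_REG;
      bool dst_live = has_dst && (live & (UINT32_C(1) << in->dst)) != 0;

      if (has_dst && !dst_live && !in->has_side_effects) {
         in->dead = true;
         removed++;
         continue;   /* sources of a dead instruction are not uses */
      }

      if (has_dst)
         live &= ~(UINT32_C(1) << in->dst);   /* the write ends liveness */
      for (int s = 0; s < 2; s++) {
         if (in->src[s] != EX_NO_REG)
            live |= UINT32_C(1) << in->src[s];
      }
   }
   return removed;
}
```

Since the sources of a removed instruction are never added to the live set, one backward walk also removes chains of instructions that only fed dead results.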
Matt Turner
7a5cc789de i965/vec4: Track liveness of the flag register.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
b449366587 i965/fs: Remove opt_drop_redundant_mov_to_flags().
Dead code elimination now handles this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
b37273b924 i965/fs: Use const fs_reg & rather than a copy or pointer.
Also while we're touching var_from_reg, just make it an inline function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
60d507c3c5 i965/fs: Dead code eliminate instructions writing the flag.
Most prominently helps Natural Selection 2, which has a surprising
number of shaders that do very complicated things before drawing black.

instructions in affected programs:     21052 -> 16978 (-19.35%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
bf8deb5514 i965/fs: Track liveness of the flag register.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
13f6601585 i965: Use local pointer to block_data in live intervals.
This simplifies the next patch and makes the code a lot easier to
read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
a50915984f i965/vec4: Make live_intervals part of the vec4_visitor class.
Like in fs_visitor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
e4d0299089 i965/fs: Treat the FB_WRITE as predicated if we're discarding.
Pre-Haswell hardware couldn't actually predicate it, but it's easier to
pretend as if it's predicated in the visitor since it will generate a
MOV from f0.1.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner
f1e5418f40 i965: Don't treat IF or WHILE with cmod as writing the flag.
Sandybridge's IF and WHILE instructions can do an embedded comparison
with conditional mod.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:12 -08:00
Matt Turner
937ddb419d i965/disasm: Disassemble tdr and tm registers properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:12 -08:00
Jordan Justen
cd1b0f04be main, glsl: Bump max known desktop glsl version to 4.50
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-01 16:20:21 -08:00
Jordan Justen
307d22abb0 glsl/cs: Change gl_WorkGroupSize from ivec3 to uvec3
As documented in:

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

  const uvec3 gl_WorkGroupSize;

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-01 16:20:21 -08:00
Jonathan Gray
31a46fb7a5 i965: avoid anonymous struct in float <-> VF conversions
Anonymous structures are only supported with newer versions of
GCC.  They will not work with GCC 4.2.1 used by OpenBSD or
GCC 4.4.7 shipped with RHEL6, going by a commit to fix a similar
problem in radeonsi earlier in the year
(74388dd24b).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
2014-12-01 16:13:08 -08:00
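
The portability fix in 31a46fb7a5 amounts to naming the inner struct so that older compilers accept it. The sketch below shows the idea on a hypothetical union, assuming a 32-bit IEEE 754 float. It is not the driver's actual float <-> VF conversion code.

```c
#include <assert.h>
#include <stdint.h>

/* An anonymous inner struct would let callers write `fb.bits` directly,
 * but it needs C11 or a compiler extension.  Naming the member (`u`
 * here) works with older compilers at the cost of a longer access. */
union example_float_bits {
   float f;
   struct {
      uint32_t bits;
   } u;
};

/* Extract the biased 8-bit exponent of a single-precision float. */
unsigned
example_float_exponent(float value)
{
   union example_float_bits fb;

   assert(sizeof(float) == sizeof(uint32_t));
   fb.f = value;
   return (fb.u.bits >> 23) & 0xff;
}
```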
Brian Paul
991d5cf8ce mesa: fix arithmetic error in _mesa_compute_compressed_pixelstore()
We need parentheses around the expression which computes the number of
blocks per row.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
2014-12-01 16:30:55 -07:00
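
The log for 991d5cf8ce does not quote the faulty expression, so the sketch below only illustrates the general class of bug: a rounded-up blocks-per-row division that is evaluated in the wrong order when its parentheses are missing. The function names and parameters are hypothetical.

```c
#include <assert.h>

/* Without parentheses around the division, the multiply happens first
 * and the round-up to whole blocks is lost. */
int
example_row_bytes_wrong(int width, int block_width, int block_bytes)
{
   assert(block_width > 0);
   return block_bytes * (width + block_width - 1) / block_width;
}

/* Compute whole blocks per row first, then scale by the block size. */
int
example_row_bytes_right(int width, int block_width, int block_bytes)
{
   assert(block_width > 0);
   return block_bytes * ((width + block_width - 1) / block_width);
}
```

For a row 6 pixels wide with 4-pixel-wide, 8-byte blocks, the first form yields 18 bytes, which is not a whole number of blocks, while the second yields the expected 16.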
Brian Paul
691170b9c7 vbo: also print buffer object pointer in vbo_print_vertex_list()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:30:39 -07:00
Brian Paul
1e14aaa8f9 mesa: some improvements for print_list()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:30:17 -07:00
Brian Paul
c407c6d588 mesa: inline/remove _mesa_polygon_stipple()
Was not called from any other place.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:30:12 -07:00
Brian Paul
f54162857c svga: fix comment typo
2014-12-01 16:30:12 -07:00
Brian Paul
953847e5a8 mesa: remove unused functions in prog_execute.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:29:55 -07:00
Brian Paul
cd8a7258b8 mesa: update glext.h to version 20141118
2014-12-01 15:22:20 -07:00
Brian Paul
ded14afa42 gallium: add include path to fix building of pipe-loader code
The pipe-loader code wasn't finding util/u_atomic.h

Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-01 15:22:08 -07:00
José Fonseca
0806bf8815 graw: Avoid 'near'/'far' variables.
They are defined by windows.h, which got included slightly more
frequently than before with u_atomic.h
2014-12-01 20:24:51 +00:00
Matt Turner
120426b13d i965/fs: Clean up some whitespace in reg_allocate.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:56 -08:00
Matt Turner
2e007fd621 ra: Don't use regs as the ralloc context.
The i965 backends pass something out of 'screen', which is allocated
per-process, so using it as a ralloc context is not thread-safe.

All callers of ra_alloc_interference_graph() already ralloc_free() its
return value.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:54 -08:00
Matt Turner
933c678776 i965: Initialize INTEL_DEBUG once per process.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:52 -08:00
Matt Turner
82811ff176 i965: Initialize compaction tables once per process.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:51 -08:00
Matt Turner
9db278d0e2 glsl: Initialize static temporaries_allocate_names once per process.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:48 -08:00