Commit graph

68698 commits

Author SHA1 Message Date
Brian Paul
72d6bbca5b st/mesa: fix comment indentation in st_flush_bitmap_cache() 2016-01-06 15:53:46 -07:00
Timothy Arceri
e58be8ac0e glsl: fix varying slot allocation for blocks and structs with explicit locations
Previously each member was being counted as using a single slot,
count_attribute_slots() fixes the count for array and struct members.

Also don't assign a negitive to the unsigned expl_location variable.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 09:44:32 +11:00
Timothy Arceri
47dde2bd45 glsl: don't try adding built-ins to explicit locations bitmask
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:26 +11:00
Timothy Arceri
ac6e2c2056 glsl: fix overlapping of varying locations for arrays and structs
Previously we were only reserving a single location for arrays and
structs.

We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.

This patch fixes both issues.

V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.

Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:20 +11:00
Timothy Arceri
5907a02ab6 glsl: create helper to remove outer vertex index array used by some stages
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:16 +11:00
Timothy Arceri
30991d7389 glsl: remove unused varyings before packing them
Previously we would pack varyings before trying to remove them, this
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.

V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:12 +11:00
Krzysztof Sobiecki
0d7477a289 gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP
ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this problems.

Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-06 16:09:12 -05:00
Eric Anholt
bbd29f1375 vc4: Fix driver build from last minute rebase fix.
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
2016-01-06 12:49:45 -08:00
Eric Anholt
25aa436e86 vc4: Optimize out a comparison for bcsel based on an ALU comparison
We routinely have code like:

	vec1 ssa_220 = fge ssa_104, ssa_61
	vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105

and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.

total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs:     9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs:     17232 -> 16717 (-2.99%)
2016-01-06 12:43:09 -08:00
Eric Anholt
7a9eb76786 vc4: Add missing sRGB decode to texel fetches.
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
2016-01-06 12:43:09 -08:00
Eric Anholt
f01ca9eeda vc4: Add support for GL_ARB_texture_swizzle.
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
2016-01-06 12:43:09 -08:00
Eric Anholt
12519a972f vc4: Use NIR texture lowering for texture swizzling.
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.

This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.

instructions in affected programs:     165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs:     461 -> 451 (-2.17%)
2016-01-06 12:43:08 -08:00
Eric Anholt
71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
Eric Anholt
0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Edward O'Callaghan
1953cee6d7 gallium/drivers/svga: Use unsigned for loop index
Fix a 's/unsigned int/unsigned/' consistency case while here.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
8e2a8ec731 gallium/drivers/r600: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
76a7d6f412 gallium/drivers/ilo: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
5071c192cc gallium: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
bfabd5e74a gallium/drivers: Remove unnecessary semicolons
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
67d4b4b28c gallium: Remove unnecessary semicolons
Fix silly issue with MSVC case fall-though support to need
a extra 'break;'

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Oded Gabbay
9d59b9d00c llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
 Name            Before     After    Delta
------------------------------------------------
openarena        16.35      16.7     2.14%
xonotic          4.707      4.97     5.57%

glmark2 didn't show a significant (more than 1%) difference.

v2: Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
925c46cfc4 llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   139.8      142.7    2.07%

openarena and xonotic didn't show a significant (more than 1%)
difference.

v2: Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
3bbe16ea79 llvmpipe: Optimize do_triangle_ccw for POWER8
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   136.6      139.8    2.34%
openarena         16.14      16.35    1.30%
xonotic           4.655      4.707    1.11%

v2:

- Convert loads to use aligned loads
- Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
e99555ef0b llvmpipe: add POWER8 portability file - u_pwr8.h
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.

All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from SSE-based function, one needs to simply
replace the "_mm_" prefix of the SSE function being called to "vec_".

Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, when doing such
conversion hurt the performance, I preferred to implement a more ad-hoc
solution. For example, converting the _mm_shuffle_epi32 needed to be done
using ad-hoc masks instead of a generic function.

All the functions in this file support both little-endian and big-endian
but currently the file is build only on POWER8 LE machine.

All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).

v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Kenneth Graunke
7295f4fcc2 nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.
The nir_opt_algebraic rule

(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),

can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv.  (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point.  So, make NIR do it for us.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-05 19:22:11 -08:00
Kenneth Graunke
bd21b54607 i965: Only turn on ARB_compute_shader if we can write registers.
Compute shaders require reconfiguring the L3 for shared local memory
support.  We have to be able to write the L3 registers to do that.

This effectively turns off compute shaders prior to Kernel 4.2.

(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly.  I've moved it out in this patch.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-01-05 18:07:27 -08:00
Kenneth Graunke
25b7e4a01f i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.
That's what it's for.  Plus, we actually implement rcp.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-05 18:07:27 -08:00
Timothy Arceri
3d402d4450 mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders
This fixes some piglit subtests for ARB_program_interface_query.

V3: remove some of the unnecessary parentheses
V2: fix alignment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-06 12:01:09 +11:00
Timothy Arceri
e1e1b67878 glsl: don't change the varying type in validation code
There is a function dedicated to demoting unused varyings lets
trust it to do its job.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:58 +11:00
Timothy Arceri
21590a307c glsl: move lowering after matching validation
After lowering the matching flag is_unmatched_generic_inout is lost so
we need to move this validation before lowering.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:54 +11:00
Timothy Arceri
0508d9504a glsl: only add outward facing varyings to resourse list for SSO
An SSO program can have multiple stages and we only want to add the externally
facing varyings. The current code was adding both the packed inputs and outputs
for the first and last stage of each program.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:48 +11:00
Anuj Phogat
4d2a7f5111 i965/gen9: Modify the conditions to use blitter on skl+
Conditions modified allow skl+ to use blitter:
 - for all tiling formats
 - to write data to YF/YS tiled surfaces

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
0bf037c0fe i965/gen9: Return false in place of assert in intelEmitCopyBlit()
This allows the fallback paths to handle it correctly.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
5cbe01c83f i965/gen9: Remove regions overlap check in fast copy blit
Overlapping blits are anyway undefined in OpenGL. So no need
of overlap check here.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
3c8b97a45b i965/gen9: Don't use fast copy blit in case of non power of 2 cpp
Fast copy blit is currently enabled for use only with Yf/Ys tiling.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Ian Romanick
ee4676aa57 i915/i965: Fix typo in perf_debug message
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-05 13:18:45 -08:00
Brian Paul
a13e9adbee st/mesa: minor indentation fixes 2016-01-05 13:04:46 -07:00
Brian Paul
f4caa7d2fc draw: minor indentation fix 2016-01-05 13:03:05 -07:00
Brian Paul
dce1e1a8eb mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:05 -07:00
Brian Paul
95d412181d util: add debug_dump_ubyte_rgba_bmp()
Like debug_dump_float_rgba_bmp() but takes ubyte values.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
f04d7439a0 mesa: check for z=0 in _mesa_Vertex3dv()
It's very rare that a GL app calls glVertex3dv(), but one in particular
calls it lot, always with Z = 0.  Check for that condition and convert
the call into glVertex2f.  This reduces VBO memory used and reduces
the number of times we have to switch between float[2] and float[3]
vertex formats in the svga driver.  This results in a small but
measurable performance improvement.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
eec8d7e7e0 svga: fix test for SVGA_NEW_STIPPLE
We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon
stipple state changes.  Before, we only set the flag when we were
enabling stipple, but not disabling.

We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state
set since it's a subset of SVGA_NEW_RAST, but let's be explicit.

This doesn't fix any known bugs.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
993b04ee2c svga: add some comments in svga_state_vs.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
fc07658895 svga: change svga_hw_view_state::dirty to boolean
Since it's a true/false value.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
077aa3be93 svga: avoid emitting redundant SetVertexBuffers() commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
b11bd20889 svga: check for no-ops in svga_bind_sampler_states()
and svga_set_sampler_views().  If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Ilia Mirkin
6531ccb705 i965: quieten compiler warning about out-of-bounds access
gcc 4.9.3 shows the following error:

brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
    return brw_names[slot - VARYING_SLOT_MAX];

This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-05 12:07:53 -05:00
Julien Isorce
777d1453f1 build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch define the macro the same it is done for dri and
vdpau targets.

Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11

LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple                  :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG4Simple            :	VAEntrypointVLD
      VAProfileMPEG4AdvancedSimple    :	VAEntrypointVLD
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileH264Baseline           :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce
abb30b9c8b nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce
7ba27f60f7 nouveau: split nouveau_vp3_bsp in begin/next/end
It allows to call nouveau_vp3_bsp_next multiple times
between one begin/end.

It is required to support st/va.

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00