Commit graph

82384 commits

Author SHA1 Message Date
Neil Roberts
bf6bd7eaf0 i965: Support allocating the MCS buffer for 16x MSAA
When 16 samples are used the MCS buffer needs 64 bits per pixel.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
b4c2e6054f i965: Support calculating the bits needed to set up 16x MSAA
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
1a97cac767 i965/fs: Add a sampler program key for whether the texture is 16x MSAA
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.

Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
4ef27745c8 i965/vec4/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
e386fb0dee i965/fs/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
20250e854e i965: Program 16x MSAA sample positions.
This is the standard pattern used by the other 3D graphics API.

BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.

The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.

(Based on a patch by Kenneth Graunke)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
5048da974e i965: Handle 16x MSAA in IMS dimension munging code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
b9f8e729c8 nir: Rename nir_live_variables.c to nir_liveness.c.
It doesn't actually operate on variables.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Kenneth Graunke
5c6f21579d nir: Rename live_variables to live_ssa_defs.
This computes liveness of SSA values, not nir_variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Alejandro Piñeiro
56774e6302 i965/vec4: select predicate based on writemask for sel emissions
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.

This patch helps on cases like this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't directly
update the writemask of instruction #2 because our code thinks that the
sel at instruction #3 reads all four channels of the flag, when it
actually only reads .x.

So, with this patch, the previous case becomes this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code
elimination to update the writemask at the second instruction:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs:     219850 -> 212023 (-3.56%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                1192
HURT:                                  0
2015-11-05 08:57:23 +01:00
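The predicate choice described in commit 56774e6302 can be sketched roughly as follows. This is a hypothetical Python model, not the i965 vec4 backend: the `WRITEMASK_*` constants, the predicate strings and `sel_predicate` are names invented for illustration, and the handling of the y, z and w channels is an assumption extrapolated from the .x case shown in the commit message.

```python
# Hypothetical model of "select the predicate based on the writemask".
# None of these names come from Mesa; they only mirror the commit message.
WRITEMASK_X, WRITEMASK_Y, WRITEMASK_Z, WRITEMASK_W = 0x1, 0x2, 0x4, 0x8

# A sel that writes exactly one channel only needs that channel of the flag.
# Only the .x case appears in the commit message; y/z/w are assumed by analogy.
_SINGLE_CHANNEL_PREDICATE = {
    WRITEMASK_X: "+f0.0.x",
    WRITEMASK_Y: "+f0.0.y",
    WRITEMASK_Z: "+f0.0.z",
    WRITEMASK_W: "+f0.0.w",
}


def sel_predicate(writemask):
    """Return the predicate a sel with this destination writemask would use.

    Multi-channel writemasks fall back to the normal predicate, which reads
    all four channels of the flag.
    """
    return _SINGLE_CHANNEL_PREDICATE.get(writemask, "+f0.0")


if __name__ == "__main__":
    print(sel_predicate(WRITEMASK_X))                # single channel
    print(sel_predicate(WRITEMASK_X | WRITEMASK_Y))  # falls back to all channels
```

Because the sel then reads only one flag channel, later passes are free to narrow the writemask of the instruction that produces the flag, which is what the sequence in the commit message relies on.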
Jason Ekstrand
a40f682c71 anv/cmd_buffer: Fix SURFACE_STATE for non-view buffer bindings
We were treating it as if it's a BufferView and weren't taking the offset
into account properly.
2015-11-04 19:56:18 -08:00
Jason Ekstrand
1b68120760 anv/cmd_buffer: Don't use an anv_state pointer in emit_binding_table
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer.  Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
2015-11-04 19:56:16 -08:00
Ilia Mirkin
bb73fc4cb8 nouveau: relax fence emit space assert
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2015-11-04 22:43:56 -05:00
Chad Versace
d259af3fbb anv: Remove unused anv_render_pass members
Remove members
  num_color_clear_attachments
  has_depth_clear_attachment
  has_stencil_clear_attachment

The new clear code in anv_meta_clear.c does not use them.
2015-11-04 15:54:38 -08:00
Chad Versace
a9a3071fc4 anv/meta: Rewrite clear code
Fixes Crucible test "func.clear.load-clear.attachments-8".

The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.

    - It assumed that a VkRenderPass contained at most one depthstencil
       attachment.

    - It tried to clear all attachments (color and the sole
      depthstencil) with a single instanced draw call, using the VUE
      header's RenderTargetArrayIndex to specify the instance's target
      color attachment. But the RenderTargetArrayIndex does not select
      entries in the binding table; it only selects an array index of
      a single layered surface.

    - If at least one attachment of VkRenderPass had
      VK_ATTACHMENT_LOAD_OP_CLEAR,
      then the old code cleared *all* attachments. This was
      a consequence of using a single draw call and single pipeline for
      the clear.

The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.

The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
2015-11-04 15:20:52 -08:00
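The per-attachment approach described in commit a9a3071fc4 can be sketched as below. This is a simplified Python model under stated assumptions, not the anv implementation: `plan_clear_draws`, the `(kind, load_op)` tuples and the pipeline names are all hypothetical, and stencil is left out just as the commit message leaves it as a FINISHME.

```python
# Hypothetical model: one draw call per attachment that asks to be cleared,
# with the pipeline chosen by attachment kind. Names are illustrative only.

def plan_clear_draws(attachments):
    """Return a list of (attachment_index, pipeline_name) draw calls.

    `attachments` is a list of (kind, load_op) tuples where kind is
    "color" or "depth" and load_op is "clear", "load" or "dont_care".
    """
    draws = []
    for index, (kind, load_op) in enumerate(attachments):
        if load_op != "clear":
            # Only attachments that request a clear are touched at all.
            continue
        pipeline = "color_clear" if kind == "color" else "depth_clear"
        draws.append((index, pipeline))
    return draws


if __name__ == "__main__":
    render_pass = [
        ("color", "clear"),
        ("color", "load"),
        ("depth", "clear"),
        ("depth", "clear"),  # more than one depth attachment is allowed
    ]
    for index, pipeline in plan_clear_draws(render_pass):
        print(f"draw: attachment {index} with {pipeline}")
```

Issuing one draw per attachment avoids all three problems listed in the commit message: nothing depends on RenderTargetArrayIndex, several depth attachments are handled naturally, and attachments that do not request a clear are skipped.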
Chad Versace
49c96a14c5 anv/meta: Clear color attribute is always flat
No behavioral change. This patch just removes an unneeded function
parameter.
2015-11-04 15:15:19 -08:00
Chad Versace
7f82cc718f anv/meta: Use consistent naming for dynamic state mask
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.

  anv_meta_saved_state::dynamic_flags -> dynamic_mask
  anv_meta_save(dynamic_state)        -> dynamic_mask
2015-11-04 15:15:19 -08:00
Chad Versace
2bdb9e2ed9 anv/meta: Rename anv_cmd_buffer_save/restore
As the functions are now exposed in anv_meta.h, let's rename them
to clarify that they are meta functions.

    anv_cmd_buffer_save -> anv_meta_save
    anv_cmd_buffer_restore -> anv_meta_restore
2015-11-04 15:15:19 -08:00
Chad Versace
16b2a489db anv: Move meta clear code to new file anv_meta_clear.c
anv_meta.c currently handles blits, copies, clears, and resolves.  The
clear code is about to grow, and anv_meta.c is already bursting at the
seams.
2015-11-04 15:15:19 -08:00
Chad Versace
c56727037a anv: Move struct anv_vue_header to anv_private.h
Move it from anv_meta.c to the common header anv_private.h. This allows
us to split the meta blit and meta clear code into separate files.
2015-11-04 15:15:19 -08:00
Eric Anholt
6d3a24bce8 vc4: When the create ioctl fails, free our cache and try again.
This greatly increases the pressure you can put on the driver before
create fails.  Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-04 14:04:14 -08:00
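The retry behaviour described in commit 6d3a24bce8 can be sketched as follows. This is a toy Python model and not the vc4 driver: `FakeKernel` stands in for the create ioctl, `BoCache` for the userspace BO cache, and both names (and the byte-budget failure model) are assumptions made for the example. A real cache would also try to reuse a cached BO of a suitable size before asking the kernel, which is omitted here.

```python
# Toy model of "when create fails, free our cache and try again".
# FakeKernel and BoCache are hypothetical stand-ins, not vc4 code.

class FakeKernel:
    """Pretend allocator that fails once `limit` bytes are in use."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def create(self, size):
        if self.used + size > self.limit:
            raise MemoryError("create failed")
        self.used += size
        return size  # the "handle" is simply the size in this model

    def free(self, handle):
        self.used -= handle


class BoCache:
    """Keeps released buffers around instead of returning them to the kernel."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.cached = []

    def release(self, handle):
        # The buffer stays allocated from the kernel's point of view.
        self.cached.append(handle)

    def flush(self):
        for handle in self.cached:
            self.kernel.free(handle)
        self.cached.clear()

    def alloc(self, size):
        try:
            return self.kernel.create(size)
        except MemoryError:
            # First attempt failed: give our cached buffers back and retry once.
            self.flush()
            return self.kernel.create(size)


if __name__ == "__main__":
    kernel = FakeKernel(limit=8192)
    cache = BoCache(kernel)
    a, b = cache.alloc(4096), cache.alloc(4096)
    cache.release(a)
    cache.release(b)
    print(cache.alloc(4096), kernel.used)
```

In the usage above the third allocation only succeeds because the failed first attempt triggers a flush of the cache. If the retry fails as well, the error propagates to the caller.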
Eric Anholt
3f7c96c36c vc4: Print the rounded shader size in debug output.
It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be surprising.
2015-11-04 13:32:07 -08:00
Eric Anholt
4a951f1c08 vc4: Fix dumping the size of BOs allocated/cached.
60MB of cached BOs are a lot less scary than 600MB.
2015-11-04 13:32:07 -08:00
Ilia Mirkin
5bbd522452 mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.

Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 14:28:57 -05:00
Brian Paul
bdf6cef033 vbo: fix another GL_LINE_LOOP bug
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.

The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this.  Even 'lineloop -count 100000' doesn't
trigger the bug.

For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.

Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 11:51:59 -07:00
Brian Paul
d31481e70a svga: implement 'white_fragments' option for VGPU10 fragment shaders
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white.  We had this implemented
for VGPU9, but not VGPU10.

VMware bug 1545492.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:41 -07:00
Brian Paul
149ac1fe43 u_vbuf: minor code reformatting / line wrapping
Trivial.
2015-11-04 11:51:41 -07:00
Brian Paul
e450d4371a u_vbuf: add some const qualifiers
Trivial.
2015-11-04 11:51:40 -07:00
Brian Paul
3f98c812b3 svga: use new enum indices_mode type
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Brian Paul
fa6efbd27d util/indices: replace #define tokens with enum type
To ease debugging in gdb.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Alejandro Piñeiro
c3d7caa1e0 i965: check inst->predicate when clearing flag_live at dead code eliminate
Detected by Matt Turner while reviewing commit
a59359ecd2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-04 19:33:56 +01:00
Roland Scheidegger
c19443bc8b gallivm: fix sampling for s3tc srgb formats when using texture cache
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
2015-11-04 14:21:43 +01:00
Ben Widawsky
d56a1478a8 i965/meta: Assert fast clears and rep clears never overlap
There is nothing wrong with the code today, but it is not too difficult to
break when modifying it, and this easy assertion should catch such driver
implementation failures quickly.

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 21:54:11 -08:00
Ryan Houdek
13b19aa815 mesa: expose support for GL_EXT_buffer_storage
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 00:01:03 -05:00
Timothy Arceri
8e4cf900f0 glsl: make sure to only add subroutines to resource list
Overlooked in 763cd8c080.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-04 15:43:12 +11:00
Timothy Arceri
f6b3c163f9 glsl: remove old TODO
SSBO support now exists as of commits f24e5e and f408a13dd3.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-04 15:40:38 +11:00
Timothy Arceri
6e3b380387 docs: Mark AoA as done for i965
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-04 13:41:16 +11:00
Timothy Arceri
5b75dbd7be i965: enable ARB_arrays_of_arrays
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-04 13:39:08 +11:00
Timothy Arceri
fb77da89f5 i965: add support for image AoA
V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.

V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-04 13:38:32 +11:00
Roland Scheidegger
9285ed98f7 llvmpipe: add cache for compressed textures
Compressed textures are very slow because decoding is rather complex
(and because, for non-technical reasons, there's no jit code to decode
them either).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-04 02:51:02 +01:00
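The direct-mapped behaviour mentioned in commit 9285ed98f7 can be illustrated with a small model. This is a generic Python sketch of a direct-mapped cache, not llvmpipe's implementation: `DirectMappedBlockCache`, the modulo slot selection and the `decode` callback are assumptions chosen for clarity.

```python
# Generic direct-mapped cache of decoded blocks. Each block address maps to
# exactly one slot, so two blocks that share a slot evict each other.
# All names here are hypothetical and not taken from llvmpipe.

class DirectMappedBlockCache:
    def __init__(self, num_entries, decode):
        self.tags = [None] * num_entries   # which block each slot holds
        self.data = [None] * num_entries   # the decoded block for that slot
        self.decode = decode               # callback: block address -> texels
        self.hits = 0
        self.misses = 0

    def lookup(self, block_addr):
        slot = block_addr % len(self.tags)
        if self.tags[slot] == block_addr:
            self.hits += 1
        else:
            # Miss: decode the whole block and overwrite whatever was there.
            self.misses += 1
            self.tags[slot] = block_addr
            self.data[slot] = self.decode(block_addr)
        return self.data[slot]


if __name__ == "__main__":
    cache = DirectMappedBlockCache(4, decode=lambda addr: f"block-{addr}")
    for addr in (1, 1, 5, 1):  # 1 and 5 collide in a 4-entry cache
        cache.lookup(addr)
    print(cache.hits, cache.misses)
```

The access pattern in the usage example shows why the commit message warns about poor hit rates: alternating between two blocks that map to the same slot causes a decode on nearly every access.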
Oded Gabbay
39b4dfe6ab llvmpipe: use simple coeffs calc for 128bit vectors
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.

The decision which method to use is determined by the size of the vector
that is used by the platform.

For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests pass, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

   - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
   - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

   - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
   - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with my change: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 02:38:53 +01:00
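The glxspheres percentages quoted in commit 39b4dfe6ab can be checked with a few lines of arithmetic. The helper name `percent_change` is made up for this example; the frame rates are copied from the commit message.

```python
# Recompute the relative change in frames/sec reported for glxspheres.
# percent_change is a hypothetical helper, not part of any benchmark tool.

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# frames/sec figures taken from the commit message
ppc64le = percent_change(4.892745317, 4.932083873)
x86_64_no_avx = percent_change(20.16418809, 20.31328989)

if __name__ == "__main__":
    print(f"ppc64le:            {ppc64le:+.2f}%")
    print(f"x86-64 without AVX: {x86_64_no_avx:+.2f}%")
```

Both results agree with the "0.8%" and "0.74%" figures stated in the message once rounded to the same precision.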
Kenneth Graunke
59bbe2681b nir: Properly invalidate metadata in nir_opt_remove_phis().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
bc3942e297 nir: Properly invalidate metadata in nir_lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
0f037bd71f nir: Properly invalidate metadata in nir_opt_copy_prop().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
4cb7546066 nir: Properly invalidate metadata in nir_remove_dead_variables().
v2: Preserve live_variables too (Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Kenneth Graunke
8bb44510fc nir: Properly invalidate metadata in nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
aea40091f0 nir: Properly invalidate metadata in nir_lower_global_vars_to_local().
v2: Preserve nir_metadata_live_variables as well (caught by Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Jason Ekstrand
531be601d5 nir: Unexpose _impl versions of copy_prop and dce
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 17:06:48 -08:00
Jordan Justen
4bc16ad217 mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Iago Toral <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-03 16:44:22 -08:00
Matt Turner
cf3121ed18 i965/vec4: Send from GRF in atomic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 16:38:36 -08:00