Commit graph

86848 commits

Author SHA1 Message Date
Ilia Mirkin
f6f644ea12 swr: color interpolation is also supposed to get perspective division
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
7cbfe59cf3 swr: add sprite coord enable mask to fs key
This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
6d6ef3fb55 swr: rework vert <-> frag shader linkage logic
Fixes a few things:
 - sprite coords only apply to generic varyings, and are a bitmask
 - back color only applies in 2-sided lighting mode
 - handle some odd situations between only some front/back colors being
   there. This is only semi-legal in GL, but we shouldn't start
   crashing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2595aebd91 swr: flatshading makes color outputs flat, it doesn't affect others
We were previously not marking the "regular" flat outputs as flat when
flatshading was enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
37be598dda swr: only broadcast color0 value, not all color values
The way that dual-source blending is described for GLES2 is very odd,
and we end up with a shader that both has this property set *and* has a
color1 value to be used as the second source. While changing the state
tracker is an option, it seems more reliable to verify that the
broadcast is only done on color0.

Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2234a4330e swr: report a reasonable max lod bias
This is the same value that llvmpipe uses. Since swr uses the same
sampler logic, makes sense for this value to also be the same. Most
applications don't care.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2b7bdff83f swr: avoid using exceptions for expected condition handling
I was getting a weird segfault from GCC 4.9.3:

 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 #1  0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272
 #2  0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628
 #3  0x00007ffff4f135ba in init_object (ob=0xd7bb90)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749
 #4  search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961
 #5  0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025
 #6  _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450
 #7  0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1245
 #8  0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1566
 #9  0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250)
     at /gcc-4.9.3/libgcc/unwind.inc:88
 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) ()
    from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at (
     this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>,
     __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM)
     at /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549
 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597

We can just void this whole issue by not using exceptions in the
first place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
946a7abd1c swr: remove formats from mapping table that don't have StoreTile impls
This table exists for the purpose of determining renderable formats.
Without a StoreTile implementation, that can't happen.

This basically removes rendering support to all L/LA/I formats. They can
be re-added when/if StoreTile implementations are added.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2e12d2ba72 swr: remove unnecessary -1 entries in format mapping table
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
7cfb364b1a swr: rework resource layout and surface setup
This is a bit of a mega-commit, but unfortunately there's no great way
to break this up since a lot of different pieces have to match up. Here
we do the following:
 - change surface layout to match swr's Load/StoreTile expectations
 - fix sampler settings to respect all sampler view parameters
 - fix stencil sampling to read from secondary resource
 - respect pipe surface format, level, and layer settings
 - fix resource map/unmap based on the new layout logic
 - fix resource map/unmap to copy proper parts of stencil values in and
   out of the matching depth texture

These fix a massive quantity of piglits, including all the
tex-miplevel-selection ones.

Note that the swr native miptree layout isn't extremely space-efficient,
and we end up using it for all textures, not just the renderable ones. A
back-of-the-envelope calculation suggests about 10%-25% increased memory
usage for miptrees, depending on the number of LODs. Single-LOD textures
should be unaffected.

There are a handful of regressions as a result of this change:
 - Some textureGrad tests, these failures match llvmpipe. (There are
   debug settings allowing improved gallivm sampling accurancy.)
 - Some layered clearing tests as swr doesn't currently support that. It
   was getting lucky before because enough other things were broken.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Charmaine Lee
5d2b5996e1 util: fix missing swizzle components in the SINT <-> UINT conversion string
Fixes tgsi error introduced in commit 3817a7a. The error complains missing
swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x".

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-23 01:54:57 +01:00
Eric Anholt
414dbb2d5c vc4: Don't conditionalize the src1 mov of qir_SEL().
My thought in having both arguments conditionally moved was that it should
theoretically save some power by not doing work in those channels.
However, it ends up costing us instructions because we can't
register-coalesce the first of the MOVs, and it also introduces extra
scheduling dependencies.  The instruction cost would swamp whatever power
benefit I was hoping for.

shader-db results:
total instructions in shared programs: 100548 -> 99741 (-0.80%)
instructions in affected programs:     42450 -> 41643 (-1.90%)

With obvious outliers removed (I had an X11 emacs running over the network
in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps
improvement (n=18, 30).
2016-11-22 16:46:03 -08:00
Eric Anholt
1f0ba902f0 vc4: Re-add R4 to the "any" register class.
I screwed this up in fdad4d2402 which was
supposed to be making this code more maintainable.  What's amazing is
multithreaded FS showed the wins it did despite this bug.

shader-db results:
total instructions in shared programs: 103535 -> 100548 (-2.89%)
instructions in affected programs:     83794 -> 80807 (-3.56%)
2016-11-22 16:46:03 -08:00
Eric Anholt
9728887e7f vc4: Disable MSAA rasterization when the job binning is single-sampled.
Gallium core just changed to start setting MSAA enabled in the rasterizer
state even with samples==1 buffers.  This caused disagreements in our
driver between binning and rasterization state, which the simulator threw
assertion failures about.  Keep the single-sampled samples==1 behavior for
now.
2016-11-22 16:46:03 -08:00
Eric Anholt
ff018e0979 vc4: Make sure we don't overflow texture input/output FIFOs when threaded.
I dropped the first hunk of this change last minute when I decided it
wasn't actually needed, and apparently failed to piglit it in simulation.
The simulator threw an an assertion in gl-1.0-drawpixels-color-index,
which queued up 5 coordinates (3 before a switch, two after) before
loading the result.
2016-11-22 16:46:03 -08:00
Dave Airlie
ea417f5335 radv: move pipeline barrier image transitions after src flushing
This seems like it would conform better with the spec.

noticed while digging into fast clears.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-23 10:16:34 +10:00
Jason Ekstrand
3fd79558be anv: Enable fast clears on gen7-8
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
5e8069a572 anv: Add support for fast clears on gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
dae8e52030 anv/blorp: Rework flushing around resolves
It turns out that the flushing required around resolves is a bit more
extensive than I first thought.  You actually need render cache flush
and a CS stall both before *and* after the resolve.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
8d1ccd6729 anv/cmd_buffer: Apply remaining flushes in EndCommandBuffer
Otherwise, some pipe flushes may just never happen.  This is unlikely to
cause problems depending on how the kernel schedules batches, but we
shouldn't count on it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
878499d323 anv/blorp: Use regular blorp clears for subpass clears
At vkCmdNextSubpass time, we have the actual framebuffer so we can use
regular blorp_clear for subpass clears.  For fast clears, there is no
attachment version, so this will make fast clears a bit easier.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
772d223c9c anv: Add a vk_to_isl_color helper
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
d1d6b78898 anv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfo
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:13:53 -08:00
Jason Ekstrand
1d5ac0a462 anv: Set up binding tables and surface states for input attachments
This commit adds the last remaining bits to support input attachments in
the Intel Vulkan driver.  For color and depth attachments, we allocate an
input attachment surface state during vkCmdBeginRenderPass like we do for
the render target surface states.  This is so that we can incorporate the
clear color and aux information as used in rendering.  For stencil, we just
treat it like a regular texture because we don't there is no aux.  Also,
only having to worry about at most one input attachment surface for each
attachment makes some of the vkCmdBeginRenderPass code simpler.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
140d041fac anv/pipeline: Handle depth/stencil self-dependencies
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
0b01262844 anv: Use pass attachment information to insert flushes
Input and resolve attachments can cause an implicit dependency in the
pipeline.  It's our job to insert the needed flushes.  Fortunately, we can
easily reuse the usage tracking that we use for CCS resolves.

This fixes 159 Vulkan CTS tests on Haswell because we're now flushing in
between drawing and MSAA resolves.  I have no idea how they were passing
before on newer hardware.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
57174d6042 anv/cmd_buffer: Fix pipeline barriers for input attachments
We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input
attachment read.  We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT
instead.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
0acb28e0cf anv/pipeline: Add a input_attachment_index to the bindings
This allows us to go from the binding to either the descriptor or the input
attachment at will.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
3f1eda0b42 anv/pass: Calculate the combined image usage of attachments
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
347f43c8ec anv: Add an input attachment lowering pass
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
2e311e4211 i965/fs: Implement load_layer_id for fragment shaders
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:31 -08:00
Jason Ekstrand
08441dae59 nir: Add a layer_id system value intrinsic
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:29 -08:00
Jason Ekstrand
2e44799f50 spirv: Stop warning about input attachments
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:23 -08:00
Jason Ekstrand
c54097cc48 spirv: Handle the InputAttachmentIndex decoration
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:02:35 -08:00
Jason Ekstrand
111d57e7d2 compiler: Add the rest of the subpassInput types
There are actually 6 of them according to the GL_KHR_vulkan_glsl spec.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:02:29 -08:00
Jason Ekstrand
7a2cfd4adb anv/cmd_buffer: Emit CS push constants after binding tables
Emitting binding tables can cause push constants to be dirtied if the
shader uses images so we need to handle push constants later.
2016-11-22 10:10:38 -08:00
Marek Olšák
a3f6bea69a gallium: fix more occurences of u_hash.h
this fixes compile failures since 86514d84e0
2016-11-22 18:28:18 +01:00
Marek Olšák
d219720d19 mesa: use special checksums for unset checksums and fixed-func shaders
for debugging

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:07:16 +01:00
Marek Olšák
b818df1e71 glsl: add gl_linked_shader::SourceChecksum
for debugging

v2: wrap all checksums in #ifdef DEBUG

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Marek Olšák
6dfdf52b6a mesa: use util_hash_crc32 instead of _mesa_str_checksum
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Marek Olšák
86514d84e0 util: import CRC32 implementation from gallium
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Jason Ekstrand
3ef8dff865 anv/cmd_buffer: Add an assert on emit_binding_table failure
The != VK_SUCCESS case is really only capable of handling the one error.
This assert makes things a bit safer if something else goes wrong.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-22 08:50:27 -08:00
Lucas Stach
d9a3ad94ca gbm: request correct version of the DRI2_FENCE extension
There is no version 2 of the DRI2_FENCE extension. So only a request
for version 1 has a chance to succeed.

Fixes: 74b1969d71 (gbm: wire up fence extension)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-22 15:56:44 +00:00
Jason Ekstrand
f680a01ad4 anv/cmd_buffer: Emit a CS stall before setting a CS pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
054e48ee0e anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty
This can happen even if the binding table isn't changed.  For instance, you
could have dynamic offsets with your descriptor set.  This fixes the new
stress.lots-of-surface-state.cs.dynamic cricible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
722ab3de9f anv/cmd_buffer: Handle running out of binding tables in compute shaders
If we try to allocate a binding table and fail, we have to get a new
binding table block, re-emit STATE_BASE_ADDRESS, and then try again.  We
already handle this correctly for 3D and blorp but it never got handled for
CS.  This fixes the new stress.lots-of-surface-state.cs.static crucible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
a8ef92b031 i965/compiler: Disable trig workarounds on KBL+
The precision of our trig instructions appears to have been fixed on Kaby
Lake.  Neither Ben nor I can find any documentation for this.  However, the
dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on
Sky Lake.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
767b163e47 intel/common: Add an is_kabylake field to gen_device_info
Most of the 3-D engine Kaby Lake is identical to Sky Lake.  However, there
are a few small differences that we need to be able to detect.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-22 08:06:33 -08:00
Gwan-gyeong Mun
e074a08a6d anv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL
Since both pCreateInfo->strideInBytes and pCreateInfo->extent.height
are of uint32_t type 32-bit arithmetic will be used.

Fix unintentional integer overflow by casting to uint64_t before
multifying.

CID 1394321

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
[Emil Velikov: cast only of the arguments]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-22 15:15:45 +00:00
Gwan-gyeong Mun
69cc7d90f9 util/disk_cache: close a previously opened handle in disk_cache_put (v2)
We're missing the close() to the matching open().

CID 1373407

v2: Fixes from Emil Velikov's review
    Update the teardown in reverse order of the setup/init.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
2016-11-22 15:13:42 +00:00