Commit graph

31705 commits

Author SHA1 Message Date
Eric Anholt
743dcdd936 vc4: Allow VBOs to be mapped during execution.
There's no reason we can't -- the mappings we expose are basically
equivalent to persistent/coherent, already.

Improves mesa-demos drawoverhead (no state change) performance by
5.21362% +/- 1.25078% (n=11).
2017-06-20 09:05:44 -07:00
Brian Paul
d8148ed10a gallium/vbuf: avoid segfault when we get invalid glDrawRangeElements()
A common user error is to call glDrawRangeElements() with the 'end'
argument being one too large.  If we use the vbuf module to translate
some vertex attributes this error can cause us to read past the end of
the mapped hardware buffer, resulting in a crash.

This patch adjusts the vertex count to avoid that issue.  Typically,
the vertex_count gets decremented by one.

This fixes crashes with the Unigine Tropics and Sanctuary demos with older
VMware hardware versions.  The issue isn't hit with VGPU10 because we
don't hit this fallback.

No piglit changes.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 08:03:18 -06:00
Brian Paul
2a9d8a45a6 gallium/vbuf: add some const qualifiers
Helps understandability a bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 08:03:12 -06:00
Brian Paul
ed83e73c4e translate: whitespace fixes in translate_generic.c 2017-06-20 07:56:34 -06:00
Brian Paul
ceb9ca7fa5 softpipe: remove unused softpipe_context::line_stipple_counter
Trivial.
2017-06-20 07:56:34 -06:00
Samuel Pitoiset
ea2492b62f radeonsi: set correct usage flag according to image access type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 13:01:18 +02:00
Marek Olšák
58af1f6bb0 winsys/amdgpu: fix a deadlock when waiting for submission_in_progress
First this happens:

1) amdgpu_cs_flush (lock bo_fence_lock)
   -> amdgpu_add_fence_dependency
   -> os_wait_until_zero (wait for submission_in_progress) - WAITING

2) amdgpu_bo_create
   -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
   -> pb_cache_is_buffer_compat
   -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING

So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:

3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
   -> amdgpu_cs_context_cleanup
   -> pb_reference
   -> pb_destroy
   -> amdgpu_bo_destroy_or_cache
   -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK

The simple solution is not to wait for submission_in_progress, which we
need in order to create the list of dependencies for the CS ioctl. Instead
of building the list of dependencies as a direct input to the CS ioctl,
build the list of dependencies as a list of fences, and make the final list
of dependencies in the CS thread itself.

Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
can continue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-20 12:53:46 +02:00
Samuel Pitoiset
afeaa2e98a radeonsi: update all resident texture descriptors when needed
To avoid useless DCC fetches when DCC is disabled, descriptors
have to be updated in order to reflect this change. This is
quite similar to how we update descriptors of bound textures.

As a side effect, this should also prevent VM faults when
bindless textures are invalidated, because the VA in the
descriptor has to be updated accordingly as well.

I don't see any performance improvements with DOW3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:55 +02:00
Samuel Pitoiset
f00e80e3f7 radeonsi: keep track of the sampler state for texture handles
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:52 +02:00
Marek Olšák
3fc99f1299 radeonsi: fix dumping shader descriptors into ddebug logs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:16:20 +02:00
Marek Olšák
f9dc29a9a5 radeonsi: add a workaround for inexact SNORM8 blitting again
GFX9 is affected.

We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
0f827b51c0 radeonsi/gfx9: fix TC-compatible stencil compression
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
8a264dd829 radeonsi/gfx9: fix TXF_LZ with 1D textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák
353b60cab5 radeonsi/gfx9: disable sparse buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Nicolai Hähnle
25e5534734 gallium/radeon/gfx9: fix PBO texture uploads to compressed textures
st/mesa creates a surface that reinterprets the compressed blocks as
RGBA16UI or RGBA32UI. We have to adjust width0 & height0 accordingly to
avoid out-of-bounds memory accesses by CB.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:05:15 +02:00
Nicolai Hähnle
4d5bb1b987 r600: fix off-by-one in egd_tables.py
Port of the corresponding fix in sid_tables.py.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-19 12:05:12 +02:00
Samuel Pitoiset
6ff6863c32 radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.

si_decompress_textures() is now 5x faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:38 +02:00
Samuel Pitoiset
06ed251c32 radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:36 +02:00
Samuel Pitoiset
705a6a560e radeonsi: use util_dynarray_foreach for bindless resources
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:34 +02:00
Samuel Pitoiset
8d9e76ce1f gallium/radeon: add a new HUD query for the number of resident handles
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:08:08 +02:00
Emil Velikov
68aa39d5c2 r600: include libelf headers only as needed
Headers are required only when building with OpenCL. As we're building
w/o it libelf may be missing, hence we'll error out as below:

src/gallium/drivers/r600/evergreen_compute.c:27:10:
fatal error: 'gelf.h' file not found
         ^
1 error generated.

Fixes: d96a210842 ("r600g,compute: provide local copy of functions from
ac_binary.c")
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-17 16:57:18 +01:00
Emil Velikov
1f958c1337 radeonsi: include ac_binary.h for struct ac_shader_binary
The header embeds the struct so it needs the header inclusion instead of
the dummy forward declaration.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Tom Stellard <tstellar@redhat.com>
Fixes: 32206c5e56 ("radeonsi: Add radeon_shader_binary member to struct
si_shader")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:38:02 +01:00
Emil Velikov
7e1c42cf89 r600, radeon: move radeon_shader_binary_{init,clean} back to radeon
Those are used by r600 and radeonsi, so moving them within the former
was a bad idea.

Fixes: d96a210842 ("r600g,compute: provide local copy of functions
from ac_binary.c")
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Cc: Aaron Watry <awatry@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:37:58 +01:00
Brian Paul
e3f5b8ac16 svga: add new num-failed-allocations HUD query
This counter is incremented if we fail to allocate memory for
vertex/index/const buffers, textures, etc.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 17:04:08 -06:00
Brian Paul
b27281c110 gallium/hud: support GALLIUM_HUD_DUMP_DIR feature on Windows
Use a dummy implementation of the access() function.  Use \ path separator.
Add a few comments.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 17:04:02 -06:00
Brian Paul
d6cb912d65 svga: add a few minor comments
Trivial.
2017-06-16 17:03:01 -06:00
Tim Rowley
a6237e4b7f swr/rast: Fix read-back of viewport array index
Binner/clipper read viewport array index from the vertex header as needed.
Move viewport state to BACKEND_STATE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
9b448da60f swr/rast: Refactor includes to limit simdintrin.h usage
Reduces the files rebuilt after modifying simdintrin.h from
84 to 64.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
08a466aec0 swr/rast: Fix read-back of render target array index
The last FE stage can emit render target array index. Currently we only
check to see if GS is emitting it. Moved the state to BACKEND_STATE and
plumbed the driver to set it.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
17cdd1e796 swr/rast: Adjust cast for gcc warning
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
bea00a7b6e swr/rast: Don't transition hottile resolved->dirty during store tiles
Fixes crash when dumping render targets and RT surface has been deleted.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
5c08bfbd17 swr/rast: gen_llvm_types.py support for SIMD256/SIMD512
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
21baadfe58 swr/rast: Properly size GS stage scratch space
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
3695c8ec1e swr/rast: Fix early z / query interaction
For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED
query was providing erroneous results because we were counting the number
of samples passed before the fragment shader, which did not work if the
fragment shader contained a discard.

Account properly for discard and early z, by anding the zpass mask with
the post fragment shader active mask, after the fragment shader.

Fixes the following piglit tests:
    - occlusion-query-discard
    - occlusion_query_meta_fragments

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
b7eb86c617 swr/rast: Share vertex memory between VS input/output
Removes large simdvertex stack allocation.

Vertex shader must ensure reads happen before writes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
7f3be3f0b8 swr/rast: Add support for dynamic vertex size for VS output
Add support for dynamic vertex size for the vertex shader output.

Add new state in SWR_FRONTEND_STATE to specify the size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
8e5d11cd7b swr/rast: SIMD16 FE - improve calcDeterminantIntVertical
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
01eca81cd4 swr/rast: Add support to PA for variable sized vertices
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
b10cdb217a swr/rast: Rework attribute layout
Move fixed attributes to the top and pack single component SGVs.
WIP to support dynamically allocated vertex size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
36ac8ba511 swr/rast: Remove explicit primitive id slot in the vertex layout
- Remove any special casing in the PS stage when primitive ID is input.
  Treat as a normal attribute that must be set up properly in the FE linkage.
- Remove primitive id from the PS_CONTEXT and TRI_FLAGS

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
8716e0d8b4 swr/rast: Fix invalid 16-bit format traits for A1R5G5B5
Correctly handle formats of <= 16 bits where the component bits don't
add up to the pixel size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Tim Rowley
a25093de71 swr/rast: Implement JIT shader caching to disk
Disabled by default; currently doesn't cache shaders (fs,gs,vs).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-06-16 16:20:16 -05:00
Brian Paul
1c33dc77f7 gallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics
For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of postion values, swizzling, etc.  We basically
follow the DX10.1 conventions.

For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.

For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.

v2: use 'undef' for unused vector components.  Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.

v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-06-16 14:07:31 -06:00
Brian Paul
005c978c5a svga: add some missing SVGA_STATS_* enum values, prefix strings
To fix the build when VMX86_STATS is defined.
Also, some minor whitespace changes to match upstream code.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 14:06:53 -06:00
Bruce Cherniak
80b587ba27 swr: Don't crash when encountering a VBO with stride = 0.
The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made it
possible for VBO's with stride=0. This resulted in a divide by zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.

This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27b.

There are several VTK tests that still crash and need proper handling of
vertex_buffer_index.  This will come in a follow-on patch.

v2: Correctly update all parameters for VBO constants (stride = 0).
    Also fixes the remaining crashes/regressions that v1 did
    not address, without touching vertex_buffer_index.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-06-16 13:45:24 -05:00
Christian Gmeiner
82db591155 etnaviv: add rs-operations sw query
It could be useful to get the number of emited resolve operations when
doing driver optimizations.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-16 15:28:12 +02:00
Lucas Stach
5065549e2a etnaviv: advertise correct max LOD bias
The maximum LOD bias supported is the same as the max texture level
supported.

Fixes piglit: ext_texture_lod_bias

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
8644b59b5d etnaviv: mask correct channel for RB swapped rendertargets
Now that we support RB swapped targets by using a shader variant, we
must derive the color mask from both the blend state and the bound
framebuffer.

Fixes piglit: fbo-colormask-formats

Fixes: 7f62ffb68a ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
d6aa2ba2b2 etnaviv: replace translate_clear_color with util_pack_color
This replaces the open coded etnaviv version of the color pack with the
common util_pack_color.

Fixes piglits:
arb_color_buffer_float-clear
fcc-front-buffer-distraction
fbo-clearmipmap

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
6633880e7e etnaviv: remove bogus assert
etna_resource_copy_region handles resources with multiple samples
by falling back to the software path. There is no need to kill the
application there.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00