92185 commits

Author SHA1 Message Date
Roland Scheidegger
db7e786a25 llvmpipe: (trivial) minimally simplify mask construction
SIMD instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need an all-zero as well as an all-one (for the pxor) reg.

Also add code to avoid scalar expansion of i1 values, which we definitely
shouldn't do. There are problems with how this interacts with llvm select,
though, so it's disabled (basically, using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not; I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Lionel Landwerlin
a8eeb089c0 anv: fix multiple creation with internal failure
Section 9.4 of the specification says:

   When an application attempts to create many pipelines in a single
   command, it is possible that some subset may fail creation. In that
   case, the corresponding entries in the pPipelines output array will
   be filled with VK_NULL_HANDLE values. If any pipeline fails
   creation (for example, due to out of memory errors), the
   vkCreate*Pipelines commands will return an error code. The
   implementation will attempt to create all pipelines, and only
   return VK_NULL_HANDLE values for those that actually failed.

Fixes :

   dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline

v2: C is hard let's go shopping (Lionel)

v3: Remove unnecessary condition in for loops (Lionel)

v4: Document why we return on first failure (Eduardo)
    Move i declaration inside for() (Eduardo)

v5: Move array cleanup out of loop (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-05 21:09:09 +00:00
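
The specification text quoted in the anv commit above can be illustrated with a small sketch. This is not the anv code (the v4 note says the driver returns on the first failure); it only models the quoted wording, where every pipeline is attempted and failed entries become VK_NULL_HANDLE. The function `create_pipelines` and the `create_one` callback are hypothetical names, and `None` stands in for VK_NULL_HANDLE.

```python
# Stand-ins for the Vulkan constants; None models VK_NULL_HANDLE.
VK_NULL_HANDLE = None
VK_SUCCESS = 0
VK_ERROR_OUT_OF_HOST_MEMORY = -1


def create_pipelines(create_infos, create_one):
    """Attempt every pipeline; failed entries become VK_NULL_HANDLE.

    create_one is a hypothetical callback returning (status, handle).
    Returns (result, handles); result is the first error seen, if any.
    """
    handles = []
    result = VK_SUCCESS
    for info in create_infos:
        status, handle = create_one(info)
        if status != VK_SUCCESS:
            handle = VK_NULL_HANDLE      # never leak a partial handle
            if result == VK_SUCCESS:
                result = status          # remember the first error
        handles.append(handle)
    return result, handles


def fake_create(info):
    """Toy creator for demonstration: the entry named "bad" fails."""
    if info == "bad":
        return VK_ERROR_OUT_OF_HOST_MEMORY, "garbage"
    return VK_SUCCESS, "pipe-" + info


result, handles = create_pipelines(["a", "bad", "c"], fake_create)
print(result, handles)
```

Which error code to report when several entries fail is a choice made here for illustration; the quoted text only requires that an error is returned.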
Tim Rowley
33fa4c99f7 swr: [rasterizer core/common/jitter] gl_double support
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-05 14:10:36 -06:00
Fredrik Höglund
b6670157d7 dri3: Fix MakeCurrent without a default framebuffer
In OpenGL 3.0 and later it is legal to make a context current without
a default framebuffer.

This has been broken since DRI3 support was introduced.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 20:52:01 +01:00
Marek Olšák
e16245b339 radeonsi: turn SDMA IBs into de-facto preambles of GFX IBs
Draw calls no longer flush SDMA IBs. r600_need_dma_space is
responsible for synchronizing execution between both IBs.

Initial buffer clears and fast clears will stay unflushed in the SDMA IB
(up to 64 MB) as long as the GFX IB isn't flushed either.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
cba9d59362 radeonsi: implement SDMA-based buffer clearing for SI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
29d6a367a6 radeonsi: do all math in bytes in SI DMA code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
9e1aa81dfe gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_space
Call r600_dma_emit_wait_idle only when there is a possibility of
a read-after-write hazard. Buffers not yet used by the SDMA IB don't
have to wait.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
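
A rough model of the idea in the commit above: a wait is only needed when the pending SDMA IB already uses one of the buffers involved. The class `SdmaQueue`, its fields, and the string buffer names are hypothetical; the real driver logic is C and is not reproduced here. The check is deliberately conservative, treating any earlier use of either buffer as a possible hazard.

```python
class SdmaQueue:
    """Hypothetical model of a pending (unflushed) SDMA command buffer."""

    def __init__(self):
        self.referenced = set()  # buffers used by commands still in the IB
        self.waits = 0           # number of wait-idle packets emitted

    def need_dma_space(self, dst, src=None):
        """Reserve space for one DMA command touching dst (and src)."""
        buffers = {dst} if src is None else {dst, src}
        # Buffers not yet used by this IB cannot have a pending hazard,
        # so only stall when there is an overlap with earlier commands.
        if self.referenced & buffers:
            self.waits += 1
        self.referenced |= buffers


queue = SdmaQueue()
queue.need_dma_space("a")        # first use of "a": no wait
queue.need_dma_space("b", "a")   # reads "a" written above: wait
queue.need_dma_space("c")        # untouched buffer: no wait
print(queue.waits)
```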
Marek Olšák
3be8336440 gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_space
r600_dma_emit_wait_idle is going away in its current form.
The only difference is that the moved code is executed before DMA calls
instead of after them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
973d7cd90a radeonsi: inline cik_sdma_do_copy_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
067a3237b9 radeonsi: also wait for SDMA in the clear_buffer CPU fallback
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
f6a1c2d883 radeonsi: simplify r600_resource typecasts in si_clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
a31a92e7ef radeonsi: always use SDMA for big buffer clears and first buffer uses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
69f489dfa1 radeonsi: use SDMA in rvid_buffer_clear on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
9a3296bf1c radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
d4c0ad4de8 radeonsi: implement SDMA-based buffer clearing for CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
431742dbba gallium/hud: increase the vertex buffer size for text
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
6d54cd75a8 gallium/hud: add an option to sort items below graphs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
80b8b9c8a4 gallium/hud: add an option to reset the color counter
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
a57e071e9e gallium/hud: allow more data sources per pane
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
e8bb97ce30 gallium/hud: add an option to rename each data source
useful for radeonsi performance counters

v2: allow specifying both : and =

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
d995115b17 gallium: remove TGSI_OPCODE_SUB
It's redundant with the source modifier.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
a4ace98a97 gallium: remove TGSI_OPCODE_ABS
It's redundant with the source modifier.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
09d09b219e st/nine: Remove all usage of ureg_SUB in nine_shader
This is required to drop gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
67cda68bba st/nine: Remove all usage of ureg_SUB in nine_ff
This is required to remove gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
caf93f5311 st/nine: Do not map SUB and ABS to their gallium equivalent.
This is required for gallium SUB and ABS to be removed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Eric Anholt
dbe0dd11b9 configure: Fix another bashism.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 09:24:28 -08:00
Marek Olšák
3477f67057 st/mesa: fix a segfault when prog->sh.data is NULL
Broken by:
   st/mesa: get Version from gl_program rather than gl_shader_program

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 17:11:03 +01:00
Emil Velikov
37f9262064 docs: add news item and link release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-05 16:07:53 +00:00
Emil Velikov
934792b846 docs: add sha256 checksums for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c8ece92ded)
2017-01-05 16:07:53 +00:00
Emil Velikov
5cd9660302 docs: add release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bec04114d2)
2017-01-05 16:07:53 +00:00
Nayan Deshmukh
ee4b4791ab st/va: fix incorrect argument in vl_compositor_cleanup
This fixes the mistake introduced in commit
b6737a8bcd

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-05 16:40:06 +01:00
Tim Rowley
68ddcc6c28 swr: remove unneeded llvm version check
The old test caused breakage with llvm-svn (4.0.0svn) and is not needed, as
the minimum required llvm version for swr is 3.6.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-01-05 07:31:19 -06:00
George Kyriazis
36ad826548 swr: fix windows build break
Wrap lp_bld_type.h in extern "C".
Windows decorates global variables, so .cpp files need to refer to the
undecorated version.

Also, removed related and unneeded code from swr_screen.cpp

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-05 07:30:18 -06:00
Marek Olšák
3753dc896d radeonsi: update clip_regs if clip_disable changes to fix a hang
This seems to fix the GPU hangs caused by:

commit ed3190b3f3
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sun Nov 13 18:41:43 2016 +0100

    radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 14:01:18 +01:00
Marek Olšák
c7affbf687 st/mesa: enable GLSLOptimizeConservatively for drivers that want it
GLSL compilation now takes 24% less time with the Gallium noop driver.
I used my shader-db for the measurement. The difference for the whole
radeonsi driver can be ~10%.

The generated TGSI is mostly the same. For example, the compilation success
rate with a TGSI->GCN bytecode converter without any optimizations is
the same. Note that glsl_to_tgsi does its own copy propagation and simple
register allocation.

shader-db GCN report:
- Talos spills fewer SGPRs.
- DOTA 2 spills more SGPRs.
- The average shader-db score is better, but it's just due to randomness.

29045 shaders in 17564 tests
Totals:
SGPRS: 1325929 -> 1325017 (-0.07 %)
VGPRS: 1010808 -> 1010172 (-0.06 %)
Spilled SGPRs: 1432 -> 1399 (-2.30 %)
Spilled VGPRs: 93 -> 92 (-1.08 %)
Private memory VGPRs: 688 -> 688 (0.00 %)
Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread
Code Size: 39336732 -> 39342936 (0.02 %) bytes
Max Waves: 217937 -> 217969 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
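
The percentages in the shader-db totals above are relative changes, (after - before) / before. A minimal sketch that recomputes a few of them from the numbers in the report; the helper name `percent_change` is made up for this example.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# A few (before, after) pairs copied from the report above.
totals = {
    "SGPRS": (1325929, 1325017),
    "Spilled SGPRs": (1432, 1399),
    "Scratch size": (2540, 2484),
    "Code Size": (39336732, 39342936),
}

for name, (before, after) in totals.items():
    print(f"{name}: {before} -> {after} ({percent_change(before, after):.2f} %)")
```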
Marek Olšák
96fe8834f5 glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
0a5018c1a4 mesa: add gl_constants::GLSLOptimizeConservatively
to reduce the amount of GLSL optimizations for drivers that can do better.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
e51baeb6c1 gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY
Drivers with good compilers don't need aggressive optimizations before TGSI.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
d3cb79e043 glsl: run do_lower_jumps properly in do_common_optimizations
so that backends don't have to run it manually

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Kenneth Graunke
7c6b714cd0 i965: Print VS output VUE map in Vulkan too.
We need to move this to the shared layer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:55:27 -08:00
Kenneth Graunke
480d6c1653 i965: Fix last slot calculations
If the VUE map has slots at the end which the shader does not write,
then we'd "flush" (constructing an URB write) on the last output it
actually wrote.  Then, we'd construct another SEND with EOT, but with
no actual payload data.  That's not legal.

For example, SSO programs have clip distance slots allocated no matter
what, but the shader may not write them.  If it doesn't write any user
defined varyings, then the clip distance slots will be the last ones.

Found while debugging
dEQP-VK.tessellation.shader_input_output.gl_position_vs_to_tcs_to_tes

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:54:52 -08:00
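
One way to picture the problem described in the commit above: the "last slot" has to be the last slot the shader actually writes, not the last slot in the VUE map, so that the final write (the one carrying EOT) still has payload. The sketch below is an assumption-laden illustration, not the i965 code; `last_written_slot` and the slot names are hypothetical.

```python
def last_written_slot(vue_slots, written):
    """Index of the last VUE slot the shader writes, or -1 if none."""
    for index in range(len(vue_slots) - 1, -1, -1):
        if vue_slots[index] in written:
            return index
    return -1


# An SSO-style layout: clip distance slots are allocated at the end
# even though this shader never writes them.
vue_slots = ["header", "position", "clip_dist0", "clip_dist1"]
written = {"header", "position"}

print(last_written_slot(vue_slots, written))
```

With the slot list above, treating the final map entry as the last slot would leave the closing message with nothing to send, which is the illegal case the commit describes.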
Iago Toral Quiroga
8dc92a5613 docs: Mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as done for i965/hsw+
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 09:34:36 +01:00
Iago Toral Quiroga
580c503ca2 docs: add GL_ARB_gpu_shader_fp64 and OpenGL 4.0 support for Intel Haswell.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-01-05 09:34:14 +01:00
Iago Toral Quiroga
a98f2e53e1 i965: add a kernel_features bitfield to intel screen
We can use this to track various features that may or may not be supported
by the hw / kernel. Currently, we usually do this by checking the generation
and supported command parser versions in various places throughout the driver
code. With this patch, we centralize all these checks in just one place at
screen creation time, then we just query the bitfield wherever we need to
check if a particular feature is supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
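
The pattern described in the commit above (detect once at screen creation, query a bitfield afterwards) might look roughly like this. Everything here is hypothetical: the flag names, the generation and command parser thresholds, and the `Screen` class are invented for illustration and do not come from the driver.

```python
from enum import IntFlag


class KernelFeature(IntFlag):
    """Hypothetical feature bits; the real driver defines its own."""
    NONE = 0
    FEATURE_A = 1 << 0
    FEATURE_B = 1 << 1


def detect_kernel_features(gen, cmd_parser_version):
    """Run all hw/kernel checks in one place (made-up thresholds)."""
    features = KernelFeature.NONE
    if gen >= 7:
        features |= KernelFeature.FEATURE_A
    if gen >= 8 or cmd_parser_version >= 5:
        features |= KernelFeature.FEATURE_B
    return features


class Screen:
    def __init__(self, gen, cmd_parser_version):
        # Computed once, at screen creation time.
        self.kernel_features = detect_kernel_features(gen, cmd_parser_version)

    def has_feature(self, feature):
        # Callers query the bitfield instead of re-checking versions.
        return bool(self.kernel_features & feature)


screen = Screen(gen=7, cmd_parser_version=2)
print(screen.has_feature(KernelFeature.FEATURE_A))
print(screen.has_feature(KernelFeature.FEATURE_B))
```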
Iago Toral Quiroga
e3123c8ca2 i965/gen7: Enable OpenGL 4.0 in Haswell when supported
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Iago Toral Quiroga
1f1b8def48 i965: get rid of brw->can_do_pipelined_register_writes
Instead, check the screen field directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Chris Wilson
02a44484f0 i965: Move the pipelined test for SO register access to the screen
Moving the test to the screen places it alongside the other global HW
feature tests that want to be shared between contexts.

Also, we need to know if we support pipelined register writes at
screen creation time so that we can tell if we can expose OpenGL 4.0
in gen7.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Samuel Iglesias Gonsálvez
ab1ec7de93 i965/disasm: remove printing hstride and width in align16 DF source regions
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 07:29:23 +01:00
Samuel Iglesias Gonsálvez
301fdfd838 vec4: use DIM instruction when loading DF immediates in HSW
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 07:29:13 +01:00