Commit graph

32240 commits

Author SHA1 Message Date
Marek Olšák
7b616f7b71 radeonsi: PIPE_BIND_SHARED should allow inter-process sharing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-18 17:47:49 +02:00
Nicolai Hähnle
f0233ac82d freedreno: compile fix
Fixes: 3f6b3d9db ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Reported-by: Jan Vesely <jan.vesely@rutgers.edu>
2017-09-18 17:39:20 +02:00
Jan Vesely
30741187c1 clover: add missing include to compat.h
Fixes build issues with llvm-3.6
Fixes: 3115687f9b (clover: Fix build after
LLVM r313390)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-18 16:32:09 +01:00
Jan Vesely
fdf0f1db22 clover: Query and export half precision support
v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
    has_halfs -> has_halves

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-09-18 10:45:02 -04:00
Jan Vesely
7b2c5547c3 gallium: Add PIPE_SHADER_CAP_FP16
Denotes native half precision float operations capability
v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
    fix indentation

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-18 10:45:02 -04:00
Benedikt Schemmer
c302f8fa7c nvc0: fix compile error
Fixes: 3f6b3d9db ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Signed-off-by: Benedikt Schemmer <ben@besd.de>
Previously-pointed-out-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-18 15:31:35 +02:00
Nicolai Hähnle
7a62f8621a radeonsi: allow out-of-order rasterization in commutative blending cases
We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.

Still, there may be some applications can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:20 +02:00
Nicolai Hähnle
8c56c45cd4 radeonsi: add drirc option "radeonsi_assume_no_z_fights"
This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).

This optimization can lead to incorrect results when an applications
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle
aab134cfa5 radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs
This does not take commutative blending into account yet.

R600_DEBUG=nooutoforder disables it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle
66d03d0e3e gallium/radeon: pass old_(perfect_)enable to set_occlusion_query_state
The callee can derive the current enable state itself.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle
3f6b3d9db7 gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE
To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.

This patch goes through all drivers, having them treat the two
query types identically, except:

1. radeon incorrectly enabled conservative mode on
   PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
   on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.

Fixes dEQP-GLES31.functional.fbo.no_attachments.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
6772452e4c amd/common: remove has_ds_bpermute argument from ac_build_ddxy
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
3db86d86ed amd/common: add chip_class to ac_llvm_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
e0af3bed2c amd/common: round cube array slice in ac_prepare_cube_coords
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.

Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
6fb0c1013b radeonsi: workaround for gather4 on integer cube maps
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").

Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:17 +02:00
Jan Vesely
3115687f9b clover: Fix build after LLVM r313390
v2: pass llvm context reference instead of a pointer

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-09-15 21:39:54 -04:00
Gurkirpal Singh
6a8aa11c20 st/omx_bellagio: Rename state tracker and option
Changes --enable-omx option to --enable-omx-bellagio

Signed-off-by: Gurkirpal Singh <gurkirpal204@gmail.com>
Reviewed-and-Tested-by: Julien Isorce <julien.iso...@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-09-15 14:28:36 +02:00
Dave Airlie
1b163238f5 r600: add .gitignore for egd_tables.h 2017-09-15 13:55:01 +10:00
Timothy Arceri
a70a401f52 radeonsi: enable STD430 packing of UBOs by default
Before this change we were defaulting to STD140 which is slightly
less efficient at packing arrays.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-15 11:42:55 +10:00
Timothy Arceri
c96e45ebf0 gallium: introduce PIPE_CAP_LOAD_CONSTBUF
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-15 11:42:55 +10:00
Timothy Arceri
b4401cc104 radeonsi: make use of LOAD for UBOs
v2: always set can_speculate and allow_smem to true

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-15 11:42:55 +10:00
Timothy Arceri
6fa60b5e40 gallium: add CONSTBUF type to tgsi_file_type
This will be use to distinguish between load types when using
the TGSI_OPCODE_LOAD opcode.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-15 11:42:54 +10:00
Dave Airlie
b6f6ead198 virgl: drop const dimensions on first block.
The virgl protocol version of tgsi doesn't handle this yet,
transform it back to the old ways.

Thanks to Nicolai Hähnle <nicolai.haehnle@amd.com>
for also writing nearly the same patch.

Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D
Tested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-15 10:33:14 +10:00
Samuel Pitoiset
f0d09d9012 radeonsi: move si_get_wave_info() to AMD common code
This will allow us to use it from radv.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Eric Engestrom
396d2dbce4 swr: use ARRAY_SIZE macro
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-14 09:36:01 +01:00
Denis Pauk
74d2456491 gallium/{r600, radeonsi}: Fix segfault with color format (v2)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552

v2: Patch cleanup proposed by Nicolai Hähnle.
    * deleted changes in si_translate_texformat.

Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-09-14 00:59:24 +02:00
Nicolai Hähnle
e4af4433fc radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers
The GLSL rules for interpolateAtSample are unfortunate:

   "Returns the value of the input interpolant variable at
    the location of sample number sample. If
    multisample buffers are not available, the input
    variable will be evaluated at the center of the pixel.
    If sample sample does not exist, the position used to
    interpolate the input variable is undefined."

This fix will fallback to monolithic shader compilation when
interpolateAtSample is used without multisampling.

One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.

Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:45 +02:00
Nicolai Hähnle
92c4277990 radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.

Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:41 +02:00
Nicolai Hähnle
792724a337 radeonsi: remove SET_PREDICATION workaround on newer firmware
We need to keep the workaround for older firmware, though.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:08 +02:00
Nicolai Hähnle
b8c6e88848 amd/common: get ME/PFP/CE firmware feature versions as well
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:06 +02:00
Nicolai Hähnle
8d8f1ef573 radeonsi: rename variable to clarify its meaning
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:24:18 +02:00
Nicolai Hähnle
48b3364b5b radeonsi: make si_init_shader_selector_async static
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:24:18 +02:00
Nicolai Hähnle
7e4344151f radeonsi: fix segfault in descriptor dumping
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:24:18 +02:00
Nicolai Hähnle
81f398dcb1 ddebug: write out final driver log messages with GALLIUM_DDEBUG=always
If the last operation happens to be a non-draw, such as a
transfer_map that triggers a decompress blit, there may be
interesting messages left in the driver log.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:24:18 +02:00
Tim Rowley
000e2958f5 swr/rast: Fetch compile state changes
Add InstanceStrideEnable field and rename InstanceDataStepRate to
InstanceAdvancementState in INPUT_ELEMENT_DESC structure.

Add stubs for handling InstanceStrideEnable in FetchJit::JitLoadVertices()
and FetchJit::JitGatherVertices() and assert if they are triggered.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:54 -05:00
Tim Rowley
ead0dfe31e swr/rast: adjust linux cpu topology identification code
Make more robust to handle strange strange configurations like a vmware
exported 4-way numa X 1-core configuration.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:47 -05:00
Tim Rowley
1ccf9ad280 swr/rast: Missed conversion to SIMD_T
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:41 -05:00
Tim Rowley
c0ce5c4422 swr/rast: whitespace changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:35 -05:00
Tim Rowley
6b9e801832 swr/rast: add graph write to jit debug putput
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:30 -05:00
Tim Rowley
6f0fcec07a swr/rast: Migrate memory pointers to gfxptr_t type
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:24 -05:00
Tim Rowley
ae2412dbbd swr/rast: Remove hardcoded clip/cull slot from clipper
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:18 -05:00
Tim Rowley
5471f65976 swr/rast: Start to remove hardcoded clipcull_dist vertex attrib slot
Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the
start of the clip/cull section of the vertex header.  Removed use of
hardcoded slot from binner.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:11 -05:00
Tim Rowley
9669972692 swr/rast: Move clip/cull enables in API
Moved from from SWR_RASTSTATE to SWR_BACKEND_STATE.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:09:04 -05:00
Tim Rowley
f5031fb952 swr/rast: Add new API SwrStallBE
SwrStallBE stalls the backend threads until all work submitted before
the stall has finished.  The frontend threads can continue to make
forward progress.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 10:08:46 -05:00
Marek Olšák
4ba20c9473 Revert "winsys/amdgpu: disable local BOs on Raven"
This reverts commit 1cda9a2fee.

It works now.
2017-09-12 22:44:02 +02:00
Marek Olšák
6eade342eb radeonsi: optimize TCS epilog when invocation 0 writes tess factors
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.

In one shader, it removes 17 * 4 bytes from the shader binary.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 19:02:02 +02:00
Marek Olšák
386d165d8d tgsi/scan: add a new pass that analyzes tess factor writes (v2)
The pass tries to deduce whether tess factors are always written by
all shader invocations.

The implication for radeonsi is that it doesn't have to use a barrier
near the end of TCS, and doesn't have to use LDS for passing the tess
factors to the epilog.

v2: Handle barriers and do the analysis pass for each code segment
    surrounded by barriers separately, and AND results from all
    such segments writing tess factors. The change is trivial in the main
    switch statement.

    Also, the result is renamed to "tessfactors_are_def_in_all_invocs"
    to make the name accurate.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 19:02:02 +02:00
Marek Olšák
a2a326e8f8 winsys/amdgpu: use the new raw CS API
This also cleans things up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 16:29:52 +02:00
Marek Olšák
3824ca7610 radeonsi: implement pipe_context::fence_server_sync
This will be more useful once we have sync_file support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 16:29:52 +02:00
Marek Olšák
8843bf6dfd winsys/amdgpu: factor out some fence dependency code into separate functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 16:29:52 +02:00