Commit graph

20494 commits

Author SHA1 Message Date
Roland Scheidegger
71e630753e r600: set DX10_CLAMP for compute shader too
I really intended to set this for all shader stages by
3835009796 but missed it for compute shaders
(because it's in a different source file...).

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-23 02:28:38 +01:00
Gert Wollny
799d350870 r600/shader: Fix all warnings issed with "-Wall -Wextra"
- fix a number of -Wsign-compare warnings
- fix two warnings for -Woverride-init because TGSI_OPCODE_CEIL == 83, and
  the according field was defined two times.

[airlied: don't use -1 with unsigned type,
fix whitespace]

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-22 22:50:18 +00:00
Gert Wollny
1d076aafbc r600: Emit EOP for more CF instruction types
So far on pre-cayman chipsets the CF instructions CF_OP_LOOP_END,
CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction
was added to add the EOP flag, even though this is not actually
needed, because all these instrutions support the EOP flag.

This patch removes the fixup code, adds setting the EOP flag for the
according instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.

This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.

[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-22 22:39:42 +00:00
Eric Anholt
6a78416dab broadcom/vc5: Fix BASE_LEVEL handling with txl.
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.

Fixes piglit tex-miplevel-selection *Lod 2D.
2017-11-22 10:56:31 -08:00
Eric Anholt
c55813c22e broadcom/vc5: Fix array texture layer count setup.
Fixes piglit array-texture.
2017-11-22 10:56:31 -08:00
Eric Anholt
ad1521d708 broadcom/vc5: Don't increment primitive queries while they're paused.
Fixes ext_transform_feedback-generatemipmap prims_generated
2017-11-22 10:56:31 -08:00
Eric Anholt
1214c2ea2a broadcom/vc5: Fix incorrect padding of TF outputs.
After the first output, we were padding by an extra size of the previous
output.  Fixes piglit ext_transform_feedback-output-type mat4x3[2] and
friends.
2017-11-22 10:56:31 -08:00
Eric Anholt
b18840ac6e broadcom/vc5: Fix UIF surface size setup for ARB_fbo's mismatched sizes.
The HW was computing an implicit height for the surface based on the image
size, but that may be smaller than the surface with ARB_fbo mismatched
sizes.  In that case, we need to tell it about the pad, either with the
little 4-bit field in the RT config, or the extended field in
CLEAR_COLORS_PART3.

Fixes piglit arb_framebuffer_object-mixed-buffer-sizes.
2017-11-22 10:56:31 -08:00
Wladimir J. van der Laan
9f162fa107 etnaviv: Put HALTI level in specs
The HALTI level is an indication of the gross architecture of the GPU.
It determines for significant part what feature level the GPU has, what
state (especially frontend state) is there, and where it is located.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-11-22 14:42:06 +01:00
Wladimir J. van der Laan
391c958f08 etnaviv: Const-correctness etnaviv_emit.h
The relocation structure is never changed by submitting it.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-11-22 14:42:00 +01:00
Roland Scheidegger
b5957cee92 llvmpipe: fix snorm blending
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not just all src/dst values are
between [0,1], but the tests are crafted in a way that the results
are between [0,1] too.)

v2: some formatting fixes, and fix a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
blend optimization would think it could simplify the blend equation
if the blend factors were complementary, however was using the
completely unrelated rgb blend factors instead of the alpha ones...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-11-21 04:06:29 +01:00
Dave Airlie
464c2d8083 r600: add cull distance support
This passes all the tests in piglit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-21 09:00:52 +10:00
Eric Anholt
494effd242 broadcom/vc5: Align 1D texture miplevels to 64b.
Fixes tex-miplevel-selection GL2:texture() 1D
2017-11-20 13:54:45 -08:00
Eric Anholt
9d5972da80 broadcom/vc5: Clamp min lod to the last level.
Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order.  The actual HW seems to have clamped to
the max anyway.
2017-11-20 13:52:33 -08:00
Eric Anholt
2c8913e224 broadcom/vc5: Increase simulator memory for tex-miplevel-selection.
We were overflowing, because of all the little 4k allocations for CLs that
were getting expanded to 128kb in the simulator due to the GMP alignment.
2017-11-20 13:52:33 -08:00
Tim Rowley
34838c2212 swr/rast: Repair simd8 frontend code rot
Keep non-default simd8 frontend code running for comparison purposes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:10 -06:00
Tim Rowley
005d937e15 swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader
Disabled for now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:06 -06:00
Tim Rowley
2e244c7168 swr/rast: Simplify GATHER* jit builder api
General cleanup, and prep work for possibly moving to llvm masked
gather intrinsic.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:01 -06:00
Tim Rowley
44025def06 swr/rast: Add alignment to transpose targets
Needed to ensure alignment for avx512.

Fixes address sanitizer crash.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:56 -06:00
Tim Rowley
bc356b0fc0 swr/rast: Cache eventmanager
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:51 -06:00
Tim Rowley
395a298fa5 swr/rast: Enable AVX-512 targets in the jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:45 -06:00
Tim Rowley
37bb69fb88 swr/rast: Points with clipdistance can't go through simplepoints path
Fixes piglit glsl-1.20:vs-clip-vertex-primitives and
glsl-1.30:vs-clip-distance-primitives.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:38 -06:00
Tim Rowley
d9de8f3122 swr/rast: Code style change (NFC)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:29 -06:00
Tim Rowley
08512c52de swr/rast: Widen fetch shader to SIMD16
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide utility EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:23 -06:00
Tim Rowley
e612231f20 swr/rast: Support flexible vertex layout for DS output
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:49:59 -06:00
Nicolai Hähnle
3f17d3c017 gallium/u_threaded: avoid syncing in threaded_context_flush
We could always do the flush asynchronously, but if we're going to wait
for a fence anyway and the driver thread is currently idle, the additional
communication overhead isn't worth it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:15 +01:00
Nicolai Hähnle
bc65dcab3b radeonsi: avoid syncing the driver thread in si_fence_finish
It is really only required when we need to flush for deferred fences.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:11 +01:00
Nicolai Hähnle
3db1ce01b1 radeonsi: recompute the relative timeout after waiting for ready fence
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:06 +01:00
Nicolai Hähnle
f5ea8d18ff ddebug: fix the hang detection timeout calculation
Fixes: c9fefa062b ("ddebug: rewrite to always use a threaded approach")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:03 +01:00
Nicolai Hähnle
16f8da2997 ddebug: fix use-after-free of streamout targets
Fixes: b47727a83a ("ddebug: implement pipelined hang detection mode")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:00 +01:00
Nicolai Hähnle
df5ebe0c26 radeonsi/gfx9: fix VM fault with fetched instance divisors
We need to account for SGPR locations in merged shaders.

This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations

Fixes: 79c2e7388c ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 16:26:10 +01:00
Eric Anholt
514db90448 broadcom/vc5: Fix up integer texture handling.
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats.  Now there are proper formats for them.

Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
2017-11-19 10:12:30 -08:00
Eric Anholt
65ae4527d9 broadcom/vc5: Fix simulator assertion failures about color RT clears.
When we tried to clear color while storing depth, it assertion failed
about basically not having enough information to decide which color RT to
clear.  It turns out the STORE_GENERAL picks the buffer according to the
color buffer being stored, or all of them if NONE.  If you're doing depth,
it doesn't know which to pick.
2017-11-19 10:12:30 -08:00
Rob Clark
ae44845aff freedreno/ir3: add texture gather support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-18 13:39:39 -05:00
Lucas Stach
f5d477f447 etnaviv: enable full overwrite when no color buffer is present
The OVERWRITE bit disables destination fetches, which is exactly what
we want when there is no valid color buffer bound.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-11-18 12:33:49 +01:00
Eric Anholt
8d5994098f broadcom/vc5: Set up the padded height at surface creation time.
This centralizes the calculation in the surface, instead of in each
load/store.
2017-11-17 16:09:55 -08:00
Eric Anholt
c40ac132e4 broadcom/vc5: Fix clear color for swap_color_rb render targets.
Fixes dEQP-GLES3.functional.depth_stencil_clear.depth.*
2017-11-17 16:09:55 -08:00
Eric Anholt
52f3e9e43c broadcom/vc5: Fix pasteo in front stencil ref value setup.
Fixes piglit masked-clear.
2017-11-17 16:09:55 -08:00
Eric Anholt
b63dd626b7 broadcom/vc5: Fix colormasking when we need to swap r/b colors.
Fixes part of piglit masked-clear.
2017-11-17 16:09:55 -08:00
Eric Anholt
2daf941a58 broadcom/vc5: Enable the Z min/max clipping planes. 2017-11-17 16:09:55 -08:00
Eric Anholt
c259bf686c broadcom/vc5: Fix driver for new PIPE_SHADER_CAP_MAX_HW_ATOMIC_*. 2017-11-17 16:09:54 -08:00
Brian Paul
2b5b6bebac r300: add PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTER* switch cases
To silence compiler warnings.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-17 16:09:59 -07:00
Boyuan Zhang
3b7fd35d01 radeon/video: enable encode support for raven
Enable h.264 encode for vcn hardware (raven)

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
549a41ed9d radeonsi: enable vcn encode
Enable vcn encode by creating radeon_encoder for vcn.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
fe50797d93 radeon/vcn: add create encoder
Add implementation for create_encoder interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
3c53fbbc87 radeon/vcn: add encode get feedback
Add implementation for get_feedback interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
bc9644460d radeon/vcn: add encode destroy
Add implementation for destroy interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
3f83c24366 radeon/vcn: add encode end frame
Add implementation for end_frame interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
47443bc9f0 radeon/vcn: add encode bitstream
Add implementation for encode_bitstream interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00
Boyuan Zhang
f40fe728a1 radeon/vcn: add encode begin frame
Add implementation for begin_frame interface for vcn encode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-11-17 12:25:47 -05:00