Commit graph

32177 commits

Author SHA1 Message Date
Roland Scheidegger
dcf2feadc3 gallivm: fix gather implementation a bit
gather is defined in terms of bilinear filtering, just without the filtering
part. However, there's actually some subtle differences required in our
implementation, because we use some tricks to simplify coord wrapping for the
two coords per direction.
For bilinear filtering, we don't care if we end up with an incorrect
texel, as long as the filter weight is 0.0 for it. Likewise, the order of
the texels doesn't actually matter (as long as they still have the correct
filter weight).
But for gather, these tricks lead to incorrect results.
Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions
which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix
right now, and noone really seems to care...).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-09-09 03:06:10 +02:00
Charmaine Lee
57d9222ef2 svga: abort shader translation upon indirect indexing of temporaries
This patch aborts shader translation upon indirect indexing of temporary
register on non-vgpu10 device. This prevents non-supported feature
sending to the device.

Tested wth MTT-piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-08 13:58:38 -06:00
Eric Engestrom
f77d06fb28 gallium/tests: use ARRAY_SIZE macro
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-08 10:29:40 +01:00
Eric Engestrom
db8c5ae853 r300: use ARRAY_SIZE macro
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-08 10:29:40 +01:00
Connor Abbott
b8a51c8c4b radeonsi: move the guts of ARB_shader_group_vote emission to ac
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:12:49 +01:00
Connor Abbott
bd73b89792 radeonsi: move si_emit_ballot() to ac
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:12:42 +01:00
Connor Abbott
ac27fa7294 radeonsi: move emit_optimization_barrier() to ac
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:06:47 +01:00
Connor Abbott
c181d4f2b7 radeonsi: move llvm_get_type_size() to ac
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:04:16 +01:00
Leo Liu
6e8ef53837 Revert "st/va: add enviromental variable to disable interlace"
This reverts commit 10dec2de2d.

The environment variable is no longer needed with the previous change

Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Leo Liu
15d4d44d9b st/va: move YUV content to deinterlaced buffer when reallocated for encoder
v2: use deinterlace common function
v3: make sure deinterlace only

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Leo Liu
cadeb73f6b st/va: reallocate the buffer if the layout isn't supported
So that it makes more clear for buffer reallocation based
on buffers layout for both decoder and encoder.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Leo Liu
78ec7400c5 vl/compositor: make vl_compositor_set_yuv_layer() static
Since it's no longer being called outside of compositor

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Leo Liu
9f32078c20 st/omx: use vl/compositor helper function for YUV deinterlacing
v2: separate helper function in different patch

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Leo Liu
a6da7e6c3a vl/compositor: make a helper function for YUV deinterlacing
The similar function is in OMX, and only used by OMX. Now have it
moved to vl/compositor for other state tracker to use later.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-07 13:32:36 -04:00
Marek Olšák
4bd2bdbb3c ac/surface: add radeon_surf::has_stencil for convenience
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 17:59:37 +02:00
Marek Olšák
7ec64bd88c radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR
Same as before, writing TCS outputs to LDS is rare.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Marek Olšák
07fe10c75d radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR
TCS outputs are usually not written to LDS, so no stats here.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Marek Olšák
89bf8668c2 radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS
-44 bytes in a monolithic LS-HS binary.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Marek Olšák
f974bb768b radeonsi: don't read the LS output vertex stride from an SGPR in LS
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.

-20 bytes in one shader binary. (having only 1 output)

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Marek Olšák
22f5dfd300 radeonsi: don't read the number of TCS out vertices from an SGPR in TCS
-16 bytes in one shader binary.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Marek Olšák
17dd4856a6 radeonsi: don't always apply the PrimID instancing bug workaround on SI
It looks like commit 391673af7a that should
have fixed the perf regression didn't really change much if anything.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:06 +02:00
Marek Olšák
a0823df148 radeonsi: remove 2 callbacks from si_shader_context
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:06 +02:00
Marek Olšák
1cda9a2fee winsys/amdgpu: disable local BOs on Raven
It hangs with a high degree of reproducibility.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 12:57:48 +02:00
Roland Scheidegger
6d9d6071ee llvmpipe, tgsi: hook up dx10 gather4 opcode
Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
red channel.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-09-07 03:32:01 +02:00
Roland Scheidegger
de6810d9be llvmpipe, draw: increase shader cache limits
We're not particularly concerned with memory usage, if the tradeoff is
shader recompiles. And it's common for apps to have a lot of shaders
nowadays (and, since our shaders include a LOT of context state of course
we may create quite a bit more shaders even).
So quadruple the amount of shaders draw will cache (from 128 to 512).
For llvmpipe (fs shaders) quadruple the number of instructions, keep the
number of variants the same for now (only with very simple, non-texturing
shaders the variant limit could really be reached), and simplify the
definition, it's probably easier to just have one different definition
per branch...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-09-07 03:32:01 +02:00
Leo Liu
e1e3c0384b radeon/uvd: fix the assertion check for YUYV format
Fixes:7319ff87("radeon/uvd: add YUYV format support for target buffer")

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-06 15:53:18 -04:00
Tim Rowley
dad32fc61c swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:36 -05:00
Tim Rowley
1ebf6fc865 swr/rast: Remove use of C++14 template variable
SWR rasterizer must remain C++11 compliant.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:29 -05:00
Tim Rowley
9df5691fff swr/rast: SIMD16 FE remove templated immediates workaround
Fixed properly in gcc-compatible fashion.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:23 -05:00
Tim Rowley
404ac6da9e swr/rast: SIMD16 PA - rename Assemble_simd16 to Assemble
For consistency and to support overloading.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:17 -05:00
Tim Rowley
6cb20c9f3a swr/rast: FE/Binner - unify SIMD8/16 functions using simdlib types
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:12 -05:00
Tim Rowley
6afdc8732c swr/rast: Removed some trailing whitespace caught during review
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:02:06 -05:00
Tim Rowley
4edc5d8305 swr: set caps for VB 4-byte alignment
Needed to compensate for change to fetch jit requiring
alignment.

Fixes regressions in piglit: vertex-buffer-offsets and about
another hundred of the vs-input*byte* tests.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:01:59 -05:00
Tim Rowley
4475583f5e swr/rast: Allow gather of floats from fetch shader with 2-4GB offsets
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-06 11:01:39 -05:00
Nicolai Hähnle
45c5c44451 radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.

According to the hardware team, this will be fixed in future chips,
so take that into account already.

Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.

v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 10:02:49 +02:00
Nicolai Hähnle
274f1dace7 amd/common: pass chip_class to ac_dump_reg
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 09:59:17 +02:00
Nicolai Hähnle
34124e412f radeonsi/gfx9: always flush DB metadata on framebuffer changes
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 09:57:08 +02:00
Charmaine Lee
c12ef63b69 svga: move index buffer bind flag assertion
The buffer bind flags can be promoted in svga_buffer_handle(), so
move the assertion after it. This has already been done for
vertex buffer in commit 6b4bf7e8be, but it misses the one for
index buffer.

Fixes assertion running WarThunder.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-09-05 10:31:18 -06:00
Charmaine Lee
98badd7f6e svga: avoid emitting redundant SetShaderResources and SetVertexBuffers
Minor performance improvement in avoiding binding the same shader resource
or the same vertex buffer for the same slot.

Tested with MTT glretrace.

v2: Per Brian's suggestion, add a helper function to do vertex buffer
    comparision.
v3: Change the helper function to vertex_buffers_equal().

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-05 10:31:18 -06:00
Marek Olšák
c3ebac6890 radeonsi/gfx9: implement primitive binning
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák
51e10c2770 radeonsi: add more state flags into si_state_dsa
3 flags for primitive binning, 2 flags for out-of-order rasterization
(but that will be done some other time)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák
0797eea758 radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák
fb7ba68f6c radeonsi: eliminate PS color outputs when colormask kills them
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 15:10:39 +02:00
Marek Olšák
468c131033 gallium/radeon: sort DBG shader flags according to pipe_shader_type
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 15:10:39 +02:00
Nicolai Hähnle
50283109aa radeonsi: ensure cache flushes happen before SET_PREDICATION packets
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.

Fixes: 0fe0320dc0 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 13:50:57 +02:00
Nicolai Hähnle
097cfe9fde radeonsi: fix ARB_transform_feedback_overflow_query on <= VI
The result written by the shader workaround needs to be written back, or
the CP may read stale data.

Fixes: 78476cfe07 ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 13:50:54 +02:00
Nicolai Hähnle
55df3d2286 radeonsi: fix compute shader state dumping
Fixes: 420c438589 ("radeonsi: log draw and compute state into log context")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 13:50:47 +02:00
Nicolai Hähnle
30a2f0dfd4 radeonsi: add an assertion that only two-dimensional constant references are used
v2: remove some redundant checks

Acked-by: Roland Scheidegger <sroland@vmware.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-09-04 13:44:09 +02:00
Nicolai Hähnle
3e4dff4f00 gallium/radeon: always use two-dimensional constant references
Acked-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-09-04 13:44:06 +02:00
Nicolai Hähnle
83923a1f17 gallium/tests: always use two-dimensional constant references
Acked-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-09-04 13:44:04 +02:00