Gather is defined in terms of bilinear filtering, just without the filtering
part. However, there are some subtle differences required in our
implementation, because we use some tricks to simplify coord wrapping for the
two coords per direction.
For bilinear filtering, we don't care if we end up with an incorrect
texel, as long as the filter weight is 0.0 for it. Likewise, the order of
the texels doesn't actually matter (as long as they still have the correct
filter weight).
But for gather, these tricks lead to incorrect results.
Fix this for CLAMP_TO_EDGE, and add comments to the other wrap functions
that look broken (the 3 mirror_clamp modes plus mirror_repeat). Those are too
complex to fix right now, and no one really seems to care...
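To illustrate the kind of difference described above, here is a minimal
sketch, not the actual llvmpipe code. It assumes a 1D texture, an
unnormalized coordinate u, and a shortcut that clamps the float coordinate
once and derives the second texel from the first. All function names are
hypothetical.

```c
#include <assert.h>

/* Floor for small floats without pulling in libm. */
int ifloor(float x)
{
   int i = (int)x;
   return ((float)i > x) ? i - 1 : i;
}

int clampi(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Assumed shortcut: clamp the float coord once, then pick i1 from i0.
 * At the edge this can select a wrong second texel, but with weight 0. */
void clamp_to_edge_shortcut(float u, int size, int *i0, int *i1, float *w)
{
   assert(size > 0);
   float c = u - 0.5f;
   float max = (float)(size - 1);
   c = c < 0.0f ? 0.0f : (c > max ? max : c);
   *i0 = ifloor(c);
   *i1 = clampi(*i0 + 1, 0, size - 1);
   *w = c - (float)*i0;
}

/* Exact variant: clamp each integer coord independently, which is what
 * gather needs because it returns the texels themselves. */
void clamp_to_edge_exact(float u, int size, int *i0, int *i1, float *w)
{
   assert(size > 0);
   float c = u - 0.5f;
   int f = ifloor(c);
   *w = c - (float)f;
   *i0 = clampi(f, 0, size - 1);
   *i1 = clampi(f + 1, 0, size - 1);
}
```

For u = 0.2 and size = 4, the shortcut picks texels (0, 1) with weight 0.0
for texel 1, which is fine for filtering. The exact variant picks (0, 0),
which is what gather has to return.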
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch aborts shader translation upon indirect indexing of a temporary
register on a non-vgpu10 device. This prevents an unsupported feature from
being sent to the device.
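A minimal sketch of such a check, assuming a simplified register
description. The struct and function names below are hypothetical and are
not the svga driver's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified source register description. */
struct src_reg_desc {
   bool is_temporary;   /* register lives in the temporary file */
   bool indirect;       /* indexed through an address register */
};

/* Returns false when translation must be aborted because the device
 * cannot handle the register access. */
bool can_translate_src(const struct src_reg_desc *reg, bool have_vgpu10)
{
   assert(reg != NULL);
   if (reg->is_temporary && reg->indirect && !have_vgpu10)
      return false;
   return true;
}
```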
Tested with MTT-piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 10dec2de2d.
The environment variable is no longer needed after the previous change.
Reviewed-by: Christian König <christian.koenig@amd.com>
v2: use deinterlace common function
v3: make sure deinterlace only
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This makes buffer reallocation based on the buffer layout clearer for both
the decoder and the encoder.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
It is no longer called outside of the compositor.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
A similar function exists in OMX and is only used by OMX. Move it to
vl/compositor so that other state trackers can use it later.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Same as before, writing TCS outputs to LDS is rare.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
TCS outputs are usually not written to LDS, so no stats here.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
-20 bytes in one shader binary. (having only 1 output)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It looks like commit 391673af7a, which should have fixed the perf
regression, didn't really change much, if anything.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
the red channel.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We're not particularly concerned with memory usage if the tradeoff is
shader recompiles. It's also common for apps to have a lot of shaders
nowadays, and since our shaders include a LOT of context state, we may
create quite a few more shaders on top of that.
So quadruple the number of shaders draw will cache (from 128 to 512).
For llvmpipe (fs shaders), quadruple the number of instructions but keep the
number of variants the same for now (the variant limit could only really be
reached with very simple, non-texturing shaders). Also simplify the
definition; it's probably easier to just have one different definition
per branch...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Needed to compensate for the change to the fetch jit, which now requires
alignment.
Fixes regressions in piglit: vertex-buffer-offsets and about a hundred of
the vs-input*byte* tests.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Work around this by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.
According to the hardware team, this will be fixed in future chips,
so take that into account already.
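The shift can be modelled on the CPU as below. This is only a sketch: the
real workaround is emitted as shader code in the LS prolog, and the
function name, the register array, and the num_ls_inputs parameter are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* LS inputs are expected to start at v2 (taken from the text above). */
#define LS_VGPR_OFFSET 2

/* Model of the prolog fixup: if the HS wave was empty, the LS inputs
 * landed at v0.., so move them up to v2.. where the shader expects them. */
void fixup_ls_vgprs(uint32_t *vgpr, unsigned num_vgprs,
                    unsigned num_ls_inputs, bool hs_wave_empty)
{
   assert(num_ls_inputs + LS_VGPR_OFFSET <= num_vgprs);
   if (!hs_wave_empty)
      return;
   /* Source and destination overlap, so memmove rather than memcpy. */
   memmove(&vgpr[LS_VGPR_OFFSET], &vgpr[0],
           num_ls_inputs * sizeof(vgpr[0]));
}
```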
Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround with one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The buffer bind flags can be promoted in svga_buffer_handle(), so
move the assertion after it. This has already been done for the
vertex buffer in commit 6b4bf7e8be, but that commit missed the one for
the index buffer.
Fixes an assertion failure when running WarThunder.
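The ordering issue can be sketched as follows. The types, flag values, and
the buffer_handle() stand-in are hypothetical; they only mimic the idea
that fetching the handle may promote the bind flags, and do not reflect
the real svga_buffer_handle() signature.

```c
#include <assert.h>

#define BIND_VERTEX 0x1u   /* hypothetical flag values */
#define BIND_INDEX  0x2u

struct fake_buffer {
   unsigned bind_flags;
   int handle;
};

/* Stand-in for the handle lookup: promotes the bind flags as a side
 * effect before returning the handle. */
int buffer_handle(struct fake_buffer *buf, unsigned required_flags)
{
   buf->bind_flags |= required_flags;
   return buf->handle;
}

int bind_index_buffer(struct fake_buffer *buf)
{
   int handle = buffer_handle(buf, BIND_INDEX);
   /* Checking the flag before the call above could fail for a buffer
    * that was created as a vertex buffer only. */
   assert(buf->bind_flags & BIND_INDEX);
   return handle;
}
```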
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Minor performance improvement from not rebinding the same shader resource
or the same vertex buffer to the same slot.
Tested with MTT glretrace.
v2: Per Brian's suggestion, add a helper function to do vertex buffer
comparison.
v3: Change the helper function to vertex_buffers_equal().
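A minimal sketch of what such a comparison helper and the redundant-bind
check might look like. Only the name vertex_buffers_equal() comes from this
message; the struct layout, the signature, and bind_vertex_buffer() are
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vertex buffer binding. */
struct vbuf_binding {
   const void *resource;
   unsigned buffer_offset;
   unsigned stride;
};

/* Compare field by field so struct padding never affects the result. */
bool vertex_buffers_equal(const struct vbuf_binding *a,
                          const struct vbuf_binding *b)
{
   assert(a != NULL && b != NULL);
   return a->resource == b->resource &&
          a->buffer_offset == b->buffer_offset &&
          a->stride == b->stride;
}

/* Updates the slot only if the binding changed. Returns true when a
 * (simulated) device bind was issued. */
bool bind_vertex_buffer(struct vbuf_binding *slot,
                        const struct vbuf_binding *vb,
                        unsigned *num_binds)
{
   if (vertex_buffers_equal(slot, vb))
      return false;
   *slot = *vb;
   (*num_binds)++;
   return true;
}
```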
Reviewed-by: Brian Paul <brianp@vmware.com>
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform; hopefully not worse.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
3 flags for primitive binning, 2 flags for out-of-order rasterization
(but that will be done some other time)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.
Fixes: 0fe0320dc0 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The result written by the shader workaround needs to be written back, or
the CP may read stale data.
Fixes: 78476cfe07 ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>