Fixes build issues with llvm-3.6
Fixes: 3115687f9b (clover: Fix build after
LLVM r313390)
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.
Still, there may be some applications can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).
This optimization can lead to incorrect results when an applications
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This does not take commutative blending into account yet.
R600_DEBUG=nooutoforder disables it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.
This patch goes through all drivers, having them treat the two
query types identically, except:
1. radeon incorrectly enabled conservative mode on
PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.
Fixes dEQP-GLES31.functional.fbo.no_attachments.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.
Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").
Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: pass llvm context reference instead of a pointer
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The virgl protocol version of tgsi doesn't handle this yet,
transform it back to the old ways.
Thanks to Nicolai Hähnle <nicolai.haehnle@amd.com>
for also writing nearly the same patch.
Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D
Tested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will allow us to use it from radv.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The GLSL rules for interpolateAtSample are unfortunate:
"Returns the value of the input interpolant variable at
the location of sample number sample. If
multisample buffers are not available, the input
variable will be evaluated at the center of the pixel.
If sample sample does not exist, the position used to
interpolate the input variable is undefined."
This fix will fallback to monolithic shader compilation when
interpolateAtSample is used without multisampling.
One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.
Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.
Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the last operation happens to be a non-draw, such as a
transfer_map that triggers a decompress blit, there may be
interesting messages left in the driver log.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add InstanceStrideEnable field and rename InstanceDataStepRate to
InstanceAdvancementState in INPUT_ELEMENT_DESC structure.
Add stubs for handling InstanceStrideEnable in FetchJit::JitLoadVertices()
and FetchJit::JitGatherVertices() and assert if they are triggered.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Make more robust to handle strange strange configurations like a vmware
exported 4-way numa X 1-core configuration.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the
start of the clip/cull section of the vertex header. Removed use of
hardcoded slot from binner.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
SwrStallBE stalls the backend threads until all work submitted before
the stall has finished. The frontend threads can continue to make
forward progress.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.
In one shader, it removes 17 * 4 bytes from the shader binary.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The pass tries to deduce whether tess factors are always written by
all shader invocations.
The implication for radeonsi is that it doesn't have to use a barrier
near the end of TCS, and doesn't have to use LDS for passing the tess
factors to the epilog.
v2: Handle barriers and do the analysis pass for each code segment
surrounded by barriers separately, and AND results from all
such segments writing tess factors. The change is trivial in the main
switch statement.
Also, the result is renamed to "tessfactors_are_def_in_all_invocs"
to make the name accurate.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>