This reverts commit d8d6091a84.
Heap allocations may be only 8-byte aligned on 32-bit system, and so having
members with 16-byte alignment (such as in the case where pipe_blend_color is
embedded in radeonsi's si_context) is undefined behavior which indeed causes
crashes when compiled with gcc -O3.
Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96835
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Acked-by: Chuck Atkins <chuck.atkins@kitware.com>
(cherry picked from commit 29f53d7937)
v2: use abort(), describe which LLVM version is affected
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit d227dbe272)
The size of the driver constant buffer for each stage should be 2048
and not 512 because it has been increased recently for buffers/images.
While we are at it, do the same change for indirect draws.
This fixes all ARB_shader_draw_parameters tests on GM107.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 31a615677b)
This fixes the following piglits:
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 19d0450b27)
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.
As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit d938b8c0bf)
With commit f41f78cda1 ("radeonsi: drop the DRAW_PREAMBLE packet on
Polaris") we failed to attribute that the separate current/prev
radeon_winsys_cs_chunk(s) are not applicable/available in branch.
The latter of which introduced with commit 89ba076de4 ("radeon/winsys:
introduce radeon_winsys_cs_chunk").
Just drop "current." from the respective places to get things up and
running again.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96864
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
SVGA_3D_CMD_DX_GENRATE_MIPMAP & SVGA_3D_CMD_DX_SET_PREDICATION commands
are not presents in fedora 24 kernel module. Because of this
reason application like supertuxkart are not running.
v2: Add few comments and code modifications suggested by Brian P.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 7988513ac3)
Since the function is exported like any other
public api function and put in the header
as if you could link against it, export it also
from shared objects.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 13affe0d3f)
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.
So add driconf param to force uninitialized variables to default to zero.
This issue was observed with rust, from steam store. But has surfaced
elsewhere in the past.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f78a6b1ce3)
Devices with smaller GMEM size need more tiles. On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen. Lets bump it up to
512. (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 7295428e41)
Otherwise things will fail to build, if the builder is using another
version of LLVM.
v2: annotate all the dependencies of builder_gen.h
v3: clean the generated files as needed
v4: comment cleanups (Tim)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com> (v2)
Reported-by: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 744d0d8f3b)
Considering how hard/annoying it was for many peoples' workflow to
properly generate the macro, it will be demoted to conditionally
available with follow-up commits.
v2: Kill off gracious blank line (Vedran).
Cc: mesa-stable@lists.freedesktop.org
Cc: Vedran Miletić <vedran@miletic.net>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Vedran Miletić <vedran@miletic.net>
(cherry picked from commit f98530b739)
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f3b9fff3c3)
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cb828b7b18)
This aligns the 4-element color float array to 16 byte boundaries. This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.
v2: Fixed indentation and added a lengthy comment explaining the
reason for the alignment.
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit d8d6091a84)
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.
This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.
v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__. Additional defines suchas
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t thier availability.
v4: Fix C++11 flags being added globally and add more logic to
swr_require_cxx_feature_flags
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@Intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
(cherry picked from commit c1bf6692be)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes a h265 video corruption bug which caused by uvd fw interface changes.
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit b928ff6f62)
Because Polaris uvd fw interface changes, the driver need to check fw version
to apply right interface. This change is to add uvd fw version.
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 28f85eab49)
[Emil Velikov: resolve trivial s/bool/boolean/ conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/radeon/radeon_winsys.h
Rely on the existence of a second destination when emitting a setcond
flag is dangerous, because this doesn't mean that the flag has been
correctly set. Instead rely on flagsDef like what emitX() does
for flagsSrc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc97b6a34a)
This was missing.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c1dbc563f4)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7b9b096775)
LOP only allows to emit 19-bits immediates.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 83a4f28dc2)
These need to be passed from the host in caps structure if they
are larger, this fixes a bunch of tests on Intel hw, that I'd
put the limits too high for.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c7cc264ca9)
MOV only allows to emit 19-bits immediates. This is similar to the
previous fix I did for IMUL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c7fa3c92f8)
IMUL only allows to emit 19-bits immediates. This is similar to
d30768025a which fixed the same thing
for the GK110 emitter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b84c97587b)
ported from Vulkan (and no source explains why this is needed)
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 28d0d0c5b4)
We were ignoring the incoming box parameters, and were providing totally
bogus stride/layer stride, and other bits, for when a non-full-surface
map was requested.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b433cb51e5)
It will be removed from the firmware for the Polaris.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 0da890e62c)
The non-MULTI variants will be removed in Polaris firmware.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2aa0485902)
The start instance is applied as an offset into the buffer directly,
ignoring the divisor, not as an instance id offset that respects the
divisor.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f4bca798d)
The generic version gets this right already, but this was using an
incorrect formula in SSE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 5b0d64886d)
The old calculation treated too many RBs as disabled.
Cc: 11.0 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit c95175581e)
The old limit, introduced in commit afa752d3f0,
was exceeded by 4 SE configurations which hit si_write_harvested_raster_configs.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6c2e636982)
Apparently, these are deprecated. There's some AutoUpgrade feature which
is supposed to promote these to cmp/select, which apparently doesn't work
with jit code. It is possible it's not actually even meant to work (see
the bug filed against llvm which couldn't provide an answer neither)
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So, just use the fallback code (which should be cmp/select,
we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages
to optimize this back to pmin/pmax in the end).
This addresses https://llvm.org/bugs/show_bug.cgi?id=28176
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
(cherry picked from commit b0cf99165a)
This makes the check match up what we do on nv50 as well - there's no
point in switching over the push path if everything's in managed
buffers. This can happen when a shader uses a vertex without an enabled
array - we end up passing it a constant attribute.
This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However by not
switching over to the push path, draw 383068 is correct.
Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 154c0a42a2)
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check if uses dominate each other they are dominated by the tex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1804aa0b80)
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.
This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a64c7cd2ba)
Back-ported from commit a64c7cd2ba:
- include util/u_format.h
- code was extracted to si_set_shader_image in master, move it back
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
--
src/gallium/drivers/radeonsi/si_descriptors.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f895caba0)
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.
To illustrate suppose we have two graphics IB's 1 and 2, which are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.
The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.
Fix this by always restoring the entire CE RAM.
v2: - Just load all descriptor set buffers instead of load and store the entire
CE RAM.
- Leave the ce_ram_dirty tracking in place for the non-preamble case.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note: This commit differs from the one in master - 54f755fa0f
("radeonsi: Reinitialize all descriptors in CE preamble.")
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.
v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 4825264f75)
Squashed with commit
clover: Include generated sources in AM_CPPFLAGS
git_sha1.c is generated in $(top_builddir)/src.
Fixes out-of-tree builds since 4825264f75.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96516
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit fafe026dbe)
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7f257abc1b)
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.
Note: some versions of libva can detect the gallium name and use the
backend. Although that behaviour seems inconsistent since it only works
for some platforms/backends.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 0c0f841e5d)
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.
Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcb5a75a66)
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 466b320163)