With the Android build system changes to ninja/kati, the use of
.SECONDEXPANSION is no longer supported. Fix this by avoiding rule specific
variables and using $(transform-generated-source).
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 794221fbb7)
This function is currently broken for BE. I assume it's because of
util_pack_color(). Until I fix this path, I prefer to disable it so users
would be able to see correct colors on their desktop and applications.
Together with the two following patches:
- gallium/r600: Don't let h/w do endian swap for colorformat
- gallium/radeon: remove separate BE path in r600_translate_colorswap
it fixes BZ#72877 and BZ#92039
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a640ad15e1)
Since the rework on gallium pipe formats, there is no more need to do
endian swap of the colorformat in the h/w, because the conversion between
mesa format and gallium (pipe) format takes endianess into account (see
the big #if in p_format.h).
v2: return ENDIAN_NONE only for four 8-bits components
(V_0280A0_COLOR_8_8_8_8)
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e3dfc0e095)
After further testing, it appears there is no need for
separate BE path in r600_translate_colorswap()
The only fix remaining is the change of the last if statement, in the 4
channels case. Originally, it contained an invalid swizzle configuration
that never got hit, in LE or BE. So the fix is relevant for both systems.
This patch adds an additional 120 available visuals for LE and BE,
as seen in glxinfo
v2:
Tested for regressions by running piglit gpu.py with CAICOS (r600g) on
x86-64 machine. No regressions found.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9559071ed6)
It appears that it actually needs to be aligned to the datum size, so it
was 1 when testing with R8, but it can be as high as 16 with RGBA32.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit aa3b85fd18)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/nouveau/nv50/nv50_screen.c
Since commit d1314de293 we ignore
damage passed to SwapBuffersWithDamage.
Wayland 1.10 now has functionality that allows us to properly
process those damage rectangles, and a way to query if it's
available.
Now we can use wl_surface.damage_buffer and interpret the incoming
damage as being in buffer co-ordinates.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
(cherry picked from commit d085a5dff5)
Will allow us to catch/prevent issues, like the one in mesa 11.2.0-rc1.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 51c65a4c48)
It's not correct to CSE these multiplies
mul.sat dst1, -a, b
mul.sat dst2, a, b
by emitting a negated MOV from dst1 to dst2:
mul.sat dst1, -a, b
mov dst2, -dst1
Take 2.0*2.0 for example. The first multiply would produce 0.0 and the
second would produce 1.0.
Fixes bad generated code in 18 to 22 shaders:
instructions in affected programs: 432 -> 464 (7.41%)
helped: 4
HURT: 18
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 1567da1e28)
Because I changed the swizzle check, I also need to adapt the return
values for each check.
It's basically almost the same as before, we just cross between STD and
STD_REV, and cross between ALT and ALT_REV
This fixes the rgba test in gl-1.0-readpixsanity (piglit) and also
fixes tri-flat (mesa demos).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 439b5b008f)
Two things were broken here:
- The depth/stencil surface dimensions were broken for MSAA.
- Sample count was programmed incorrectly.
Result was the depth resolve didn't work correctly on MSAA surfaces, and
so sampling the surface later produced garbage.
Fixes the new piglit test arb_texture_multisample-sample-depth, and
various artifacts in 'tesseract' with msaa=4 glineardepth=0.
Fixes freedesktop bug #76396.
Not observed any piglit regressions on Haswell.
v2: Just set brw_hiz_op_params::dst.num_samples rather than adding a
helper function (Ken).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
v3: moved the alignment needed for hiz+msaa to brw_blorp.cpp, as
suggested by Chad Versace (Alejandro Piñeiro on behalf of Chris
Forbes)
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 43d23e879c)
Nominated-by: Ben Widawsky <benjamin.widawsky@intel.com> (over IRC)
The EXT spec has been updated to:
- logically combine the es2_profile and es_profile exts
- allow any legal version to be requested
dEQP tests request a specific ES version when using GLX, so this allows
dEQP upstream to run against GLX with the appropriate X server patch
(which had similar disabling logic).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Adam Jackson <ajax@redhat.com> (v3)
v1 -> v2:
- distinguish between DRI_API_GLES{,2,3}
- add GLX_EXT_create_context_es_profile client-side support
v2 -> v3:
- fix error in computing mask
(cherry picked from commit 5ac7f0433b)
Nominated-by: Emil Velikov <emil.velikov@collabora.com>
This patch fixes a bug when building a pack instruction.
For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.
This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale
I've also corrected some coding style issues in the function.
v2: Returned else statements to vmware style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 679a654a77)
The current code in r600_translate_colorswap uses the swizzle information
to determine which colorswap to use.
This works for BE & LE when the nr_channels is <4, but when nr_channels==4
(e.g. PIPE_FORMAT_A8R8G8B8_UNORM), this method can not be used for both BE
and LE, because the swizzle info is the same for both of them.
As a result, r600g doesn't support 24bit color formats, only 16bit, which
forces the user to choose 16bit color in X server.
This patch fixes this bug by separating the checks for LE and BE and
adapting the swizzle conditions in the BE part of the checks.
Tested on an Evergreen GPU (Cedar GL FirePro 2270) running inside POWER7
Big-Endian Machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "11.2" "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 4b7e219e61)
A nomination unadorned with a specific version is now interpreted as
being aimed at the 11.1 branch, which was recently opened.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
See commit 9db2098d for the i965 version of this.
This fixes depth in a bunch of dEQP EXT_texture_border_clamp tests. And
probably other ones as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0b10ec1086)
A return value of '-1' means that there was error during swap with a
window drawable, in this case we set error as EGL_BAD_NATIVE_WINDOW.
v2: coding style cleanup, better commit message
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e6f1a44d14)
Null-check on "*value" is currently done in _eglGetSyncAttrib, which is
after eglGetSyncAttribKHR dereferences it.
Move the check a layer up (in the beginning of eglGetSyncAttribKHR) to
avoid segfaults.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Emil Velikov: tweak commit message, add stable tag]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d1e1563bb6)
If the destination is a renderbuffer, dst_tex_image will be NULL. This
fixes the *to_renderbuffer dEQP copy image tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit b697400a97)
The Iris part is left unbranded because we did not have these with original SKL.
v2: 0x192d is gt3, not gt4
v3: Forgot to update the temporary brand string when I did v2.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 644c8a5151)
Even though it's a no-op, it's important to keep track of the type so
that we can pick the properly-signed op later on.
This fixes dEQP-GLES3.functional.shaders.precision.uint.highp_div_fragment,
which ended up using IDIV instead of UDIV.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 49c67926c7)
When there's a predicate, it just goes onto the sources list. If the
quadop only has a single regular source, we will end up thinking that
the predicate is the second source. Check explicitly for the predSrc so
that we don't accidentally emit the wrong thing.
This fixes a bunch of dEQP-GLES3.functional.shaders.derivate.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ca23c8081f)
Without the check, unsuccessful xcb_dri2_get_buffers_reply(...) causes
segmentation fault in dri2_get_buffers.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
(cherry picked from commit 5d87a7c894)
Fixes several of the
"dEQP-GLES31.functional.image_load_store*load_store*single_layer" dEQP
tests that use image formats we implement using untyped surface
messages.
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9e30d66b7c)
This fixes two issues. First, we had a use-after-free in the case where
the instruction got deleted and we tried to return mov->dest.write_mask.
Second, in the case where we are doing a self-mov of a register, we delete
those channels that are moved to themselves from the write-mask. This
means that those channels aren't reported as being handled even though they
are. We now stash off the write-mask before remove unneeded channels so
that they still get reported as handled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 70dff4a55e)
This fixes an assertion failure in [at least] one of the Unreal Engine Linux
demo/games that uses DXT1 compression. Specifically, the "Vehicle Game".
At some point, the game ends up trying to blit mip level whose size is 2x2,
which is smaller than a DXT1 block. As a result, the assertion in the blit path
is triggered. It should be safe to simply make sure we align the width and
height, which is sadly an example of compression being less efficient.
NOTE: The demo seems to work fine without the assert, and therefore release
builds of mesa wouldn't stumble over this. Perhaps there is some unnoticeable
corruption, but I had trouble spotting it.
Thanks to Jason for looking at my backtrace and figuring out what was going on.
v2: Use NPOT alignment to make sure ASTC is handled properly (Ilia)
Remove comment about how this doesn't fix other bugs, because it does.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93358
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 088280e022)
The fast path for Intel's ReadPixels() unintentionally omits clipping
the specified area to a valid one. Rather than clip in various
corner-cases, perform this operation in the API validation stage.
The bug in intel_readpixels_tiled_memcpy() showed itself when the winsys
ReadBuffer's height was smaller than the one specified by ReadPixels().
yoffset became negative, which was an invalid input for tiled_to_linear().
v2: Move clipping to validation stage (Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Marta Löfstedt <marta.lofstedt@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 605832736a)
v2: Use gl_renderbuffer::{Width,Height} (Jason)
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 55d56d34e0)
Ilia Mirkin found/fixed the mistake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93813
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit fe14110f35)
The builtin data can get released with a glReleaseShaderCompiler call.
We're careful everywhere to clone everything that comes out of builtins
except here, where we accidentally return the signature belonging to the
builtin version, rather than the locally-cloned one.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 88519c6087)
The builtin function shader is part of the builtin state, released
when glReleaseShaderCompiler is called. We must ensure that the
builtins have been (re)initialized before attempting to link with the
builtin shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ac57577e29)
The vec4 backend, at the end, does this:
if (inst->is_3src()) {
for (int i = 0; i < 3; i++) {
if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
assert(brw_is_single_value_swizzle(inst->src[i].swizzle));
So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.
Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.
Co-authored-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9f2e22bf34)
GLsync objects had a race condition when used from multiple threads
(which is the main point of the extension, really); it could be
validated as a sync object at the beginning of the function, and then
deleted by another thread before use, causing crashes. Fix this by
changing all casts from GLsync to struct gl_sync_object to a new
function _mesa_get_and_ref_sync() that validates and increases
the refcount.
In a similar vein, validation itself uses _mesa_set_search(), which
requires synchronization -- it was called without a mutex held, causing
spurious error returns and other issues. Since _mesa_get_and_ref_sync()
now takes the shared context mutex, this problem is also resolved.
Fixes bug #92757, found while developing Nageru, my live video mixer
(due for release at FOSDEM 2016).
v2: Marek: silence warnings, fix declaration after code
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit feb53912f8)
Fixup to commit 03b3eb90d - the number of buffers could be larger than
the number of elements, in which case we'd pass a negative argument to
PUSH_SPACE, which would be bad. While we're at it, merge it with the
other PUSH_SPACE at the top of the function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 2065e380b2)
nvc0_vbo has explicit push space checking enabled, so we must run
PUSH_SPACE by hand. A few spots missed that.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 03b3eb90d7)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
The spill logic will insert convert ops when moving between files. It
seems like the emission logic wasn't quite ready for these converts.
Tested on fermi, and visually looked at nvdisasm output for maxwell.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1a0fde1f52)
64-bit Pentium 4 CPUs don't have the 3DNow prefetch instructions
which results in an Illegal instruction crash.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Timothy Arceri <t_arceri@yahoo.com.au>
https://bugs.freedesktop.org/show_bug.cgi?id=27512
(cherry picked from commit 9c78cfd547)
When a fragment shader is used that has no outputs but does conditional
discard (KILL_IF), all fragments are killed without this patch.
By comparing various register settings, my conclusion is that the exec mask
is either not properly forwarded to the DB by NULL exports or ends up being
unused, at least when there is _only_ a NULL export (the ISA documentation
claims that NULL exports can be used to override a previously exported exec
mask).
Of the various approaches I have tried to work around the problem, this one
seems to be the least invasive one.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to tell the address generation functions about the dimensionality of
the texture to correctly implement the part of Section 3.8.1 (Texture Image
Specification) of the OpenGL 2.1 specification which says:
"For the purposes of decoding the texture image, TexImage2D is
equivalent to calling TexImage3D with corresponding arguments
and depth of 1, except that
...
* UNPACK SKIP IMAGES is ignored."
Fixes a low impact bug that was found by chance while browsing the spec and
extending piglit tests.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
(cherry picked from commit 4a448a63ad)
The scaling list should be filled out with zig zag scan
v2: integrate zig zag scan for list 4x4 to vl(Christian)
v3: move list determination out from the loop(Ilia)
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 6ad2e55a14)
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 4f598f2173)
We use this logic to detect live ranges and then do plain renaming
across the whole codebase. As such, to prevent WaW hazards, we have to
treat a write as if it were also a read.
For example, the following sequence was observed before this patch:
13: UIF TEMP[6].xxxx :0
14: ADD TEMP[6].x, CONST[6].xxxx, -IN[3].yyyy
15: RCP TEMP[7].x, TEMP[3].xxxx
16: MUL TEMP[3].x, TEMP[6].xxxx, TEMP[7].xxxx
17: ADD TEMP[6].x, CONST[7].xxxx, -IN[3].yyyy
18: RCP TEMP[7].x, TEMP[3].xxxx
19: MUL TEMP[4].x, TEMP[6].xxxx, TEMP[7].xxxx
While after this patch it becomes:
13: UIF TEMP[7].xxxx :0
14: ADD TEMP[7].x, CONST[6].xxxx, -IN[3].yyyy
15: RCP TEMP[8].x, TEMP[3].xxxx
16: MUL TEMP[4].x, TEMP[7].xxxx, TEMP[8].xxxx
17: ADD TEMP[7].x, CONST[7].xxxx, -IN[3].yyyy
18: RCP TEMP[8].x, TEMP[3].xxxx
19: MUL TEMP[5].x, TEMP[7].xxxx, TEMP[8].xxxx
Most importantly note that in the first example, the second RCP is done
on the result of the MUL while in the second, the second RCP should have
the same value as the first. Looking at the GLSL source, it is apparent
that both of the RCP's should have had the same source.
Looking at what's going on, the GLSL looks something like
float tmin_8;
float tmin_10;
tmin_10 = tmin_8;
... lots of code ...
tmin_8 = tmpvar_17;
... more code that never looks at tmin_8 ...
And so we end up with a last_read somewhere at the beginning, and a
first_write somewhere at the bottom. For some reason DCE doesn't remove
it, but even if that were fixed, DCE doesn't handle 100% of cases, esp
including loops.
With the last_read somewhere high up, we overwrite the previously
correct (and large) last_read with a low one, and then proceed to decide
to merge all kinds of junk onto this temp. Even if that weren't the
case, and there were just some writes after the last read, then we might
still overwrite a merged value with one of those.
As a result, we should treat a write as a last_read for the purpose of
determining the live range.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 047b917718)
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do
a, b = tex()
if () c = a
else c = b
which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.
This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3ca941d60e)
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.
This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.
As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3ca2001b53)