This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.
v2: Fix compiler warning (false positive on possibly-uninitialized variable)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 9.1 branch.
(cherry picked from commit 9f43b84928)
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bc0e1591f6)
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it. Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 2f41a60145)
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8c694dfe64)
We weren't inserting it into the list, so it did nothing. This line was
replaced by the MOV/MUL block above.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 59e858861c)
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().
This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".
NOTE: This is a candidate for the 9.1 branch.
Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit bc0cc2944f)
Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms. Fixes Solaris build.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 95df2b2883)
Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.
Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistant values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.
Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 79f786f936)
This is roughly a backport of Eric's commit 0967c362.
We avoided assigning a slot in the VUE map for gl_ClipVertex, but left
the bit set in outputs_written, producing horrible confusion further
down the pipe.
Mostly fixes rendering in source games, and probably in Freespace 2 SCP.
No Piglit regressions on Ironlake.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
V2: Mask out the bit, not its index. Strangely, the game still worked
with that wrong, but rendering of pretty much anything else was
completely trashed.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't support pre-2.6 kernels anyway - the install docs say 2.6.28
for DRI - and apparently this confuses ld.so's sorting when multiple
libGLs are installed. Just remove it.
Note: this is a candidate for the stable branches.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 904b03824b)
There are too many cases were we end up with lockups.
Once we sort out the remaining issues on master, they
can be backported and hyperz can be re-enabled on 9.1
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a bug introduced in commit 258453716f and
triggered whenever "rb" is NULL.
Fixes at least one cause bug #59445:
[SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) segfault
https://bugs.freedesktop.org/show_bug.cgi?id=59445
(Though segfaults are still possible in that test case, but they have been
present since before commit 258453716f which is what's being fixed here.)
Reviewed-by: Eric Anholt <eric@anholt.net>
[jordan.l.justen@intel.com: fixes Anomaly Warzone Earth crash at title screen]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Since the case was missing bec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit c770faea0a)
Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.
This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.
Note: this is a candidate for the 9.1 branch
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92855bcc95)
Since half of ir_validate uses asserts() (the other using printf() then
abort()), there's not much use to calling it in a release build. Cuts
6.3% of the startup time of TF2.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 712bac1f41)
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit b2a4573c14)
NOTE: This is a candidate for the 9.1 branch.
Fixes piglit's texture-immutable-levels test.
Reported-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 12dc4be8a6)
Fixes random failures with piglit glsl-max-varyings.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
(cherry picked from commit 032e5548b3)
Commit 33599433c7 began setting the texture swizzle mode to XYZ1 for
RED, RG, and RGB textures in order to force alpha to 1.0 in case we
actually stored the texture as RGBA.
This had a unforseen performance implication: the shader precompile
assumes that the texture swizzle mode will be XYZW for non-shadow
sampler types. By setting it to XYZ1, this means every shader used with
a RED, RG, or RGB texture has to be recompiled. This is a very common
case.
Unfortunately, there's no way to improve the precompile, since RGBA
textures still need XYZW, and there's no way to know by looking at
the shader source what texture formats might be used.
However, we only need to smash alpha to 1.0 if the texture's memory
format actually has alpha bits. If not, the sampler already returns 1.0
for us without any special swizzling. XRGB8888, for example, is a very
common case where this occurs.
This partially fixes a performance regression since commit 33599433c7.
More work is required to fully fix it in all cases. This at least helps
Warsow.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d86efc075e)
NOTE: This is a candidate for the 9.1 branch.
Tested-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
(cherry picked from commit 7c3d8301af)
If we're in some conditional or loop we must not return, or the code
after the condition is never executed.
(v2): And, we also can't just continue as nothing happened, since the
mask update code would later check if we actually have a mask, so we
need to remember that there was a return in main where we didn't exit
(to illustrate this, a ret in a if clause would cause a mask update
which is still ok as we're in a conditional, but after the endif the
mask update code would drop the mask hence bringing execution back to
pixels which should have their execution mask set to zero by the ret).
Thanks to Christoph Bumiller for figuring this out.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=62357.
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 5af7b45986)
v2: Andreas Boll <andreas.boll.dev@gmail.com>
- Fix formatting - use one CFLAG per line
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59238
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
(cherry picked from commit f70c385351)
Fast depth clears have the same depth/stencil alignment requirements
as other drawing operations. Therefore, we need to call
brw_workaround_depthstencil_alignment() from both the clear and
drawing paths.
Without this fix, we get image corruption if the following conditions
hold: (a) the first ever drawing operation to a depth miplevel (or the
first drawing operation after having used the texture for sampling) is
a clear, (b) the depth miplevel has a size that is eligible for fast
depth clears, and (c) the depth miplevel has an offset within the
miptree that isn't 8x8 aligned.
Fixes piglit "depthstencil-render-miplevels" tests with size 273.
NOTE: This is a candidate for stable branches
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit c5d5827951)
Untyped Atomic Operation messages are illegal for non-RAW formats. The
IVB hardware proceeds happily (after all, who cares what the format of the
surface is if you're doing untyped ops on it?), but later hardware
apparently doesn't. The simulator for gen7 does complain, though.
v2: Rebase against updates to previous patches. (by anholt)
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 91df4d746b)
This is basically a copy and paste of gen7_create_constant_surface, but
with the parameters filled in to offer a simpler interface.
It will diverge shortly.
I didn't bother adding it to the vtable for now since shader time is only
exposed on Gen7+.
v2: Replace tabs in the new code (by anholt)
Add back dropped memset() and add a comment about HSW channel selects.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 125b34cffb)
Haswell's "Data Cache" data port is a single unit, but split into two
SFIDs to allow for more message types without adding more bits in the
message descriptor.
Untyped Atomic Operations are now message 0010 in the second data cache
data port, rather than 6 in the first.
v2: Use the #defines from the previous commit. (by anholt)
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
(cherry picked from commit f27a220cad)
We were sparsely using some of these message types, but I'll just fill
them all in now. It will be used for fixing shader_time on HSW.
v2: Add missing MEDIA_BLOCK_READ.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a2d08f170a)
Framebuffer blitting operation should be skipped if any of the
dimensions (width/height) of src/dst rect is zero.
V2: Move the dimension check after error checking in _mesa_BlitFramebuffer.
Fixes: fbblit(negative.nullblit.zeroSize) in Intel oglconform
https://bugs.freedesktop.org/show_bug.cgi?id=59495
Note: Candidate for all the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
(cherry picked from commit d78dcdf103)
This is a squash-commit of the two commits listed below. The first
introduced a 'make check' failure, and the second fixed it.
mesa,gallium,egl,mapi: One definition of C99 inline/__func__ to rule them all.
We were in four already...
NOTE: Candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 70fe7c6d3e)
And:
tests: Add $(top_srcdir)/include to AM_CPPFLAGS.
Fixes this build error with make check.
CC collision.o
In file included from ../../../../../src/mesa/main/hash_table.h:34:0,
from collision.c:31:
../../../../../src/mesa/main/compiler.h:51:53: fatal error: c99_compat.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit a6bb7a9495)
We were handling the the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs. Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, ovewrite the other. This was occurring in
XBMC's GLSL shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4dc7e6dcbf)