If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do
a, b = tex()
if () c = a
else c = b
which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.
This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3ca941d60e)
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.
This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.
As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3ca2001b53)
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 438d421f8b)
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more. Prevents up to
seconds of lag with window movement in X with xcompmgr -c. There may
be useful tuning to do in the future, but for now this gets us
usability.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3fba517bdd)
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 2a449ce7c9)
When RA fails, and we spill, we have to clean everything up before doing
RA again. We were forgetting to reset the hi/lo linked lists - at
least the hi list is guaranteed to still have pointers to now-deleted
RIG nodes.
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 19ae5de981)
When setting the conservative thread counts, I halved everything. That isn't
correct for the wm, which has nothing to do with actual thread counts. I suck.
BXT only has 1 slice, and there is some ambiguity about subslices, so just
reserve the max possible for now. It looks like this might fix:
piglit.spec.glsl-1_50.execution.variable-indexing.gs-output-array-vec4-index-wr.bxtm64.
I kind of question why that is, but it is what Jenkins says.
Mark is current running some of the other blacklisted tests on this patch. (it
effects anything requiring scratch space).
Cc: mesa-stable <mesa-stable@lists.freedesktop.org>
Cc: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit a443b5b732)
_mesa_texture_parameteriv is used because (the more obvious)
_mesa_texture_parameteri just stuffs the parameter in an array and calls
_mesa_texture_parameteriv. This just cuts out the middleman.
As a side bonus we no longer need check that ARB_stencil_texturing is
supported. The test doesn't allow non-supporting implementations to
avoid any work, and it's redundant with the value-changed test.
Fix bug #93717 because the state restore commands at the bottom of
_mesa_meta_GenerateMipmap no longer depend on the bound state.
Fixes piglit arb_direct_state_access-generatetexturemipmap with the
changes recently sent to the piglit mailing list. See the bugzilla
entry for more info.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93717
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 2542871387)
Commit c246828c added the code to save and restore the stencil
texturing mode. The restore, however, was erroneously inside the
'target != GL_TEXTURE_RECTANGLE' block.
Fixes piglit test 'arb_stencil_texturing-blit_corrupts_state
GL_TEXTURE_RECTANGLE'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 18b0ba340b)
This fixes a VM fault and possible lockup in high memory pressure situations.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
(cherry picked from commit 1067e6eb55)
Specifically, when the API switches from using a GS to not using a GS and then
back to using the same GS again, we do not have to re-send all the GS state,
but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part
of the VS state.
This fixes a rendering bug in Dolphin, but surely other applications are
affected as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 004fcd4230)
Normally there's a producer and consumer, and the producer var gets
picked. In both the vertex->gs and tes->gs cases, that's the un-arrayed
version.
In the SSO case, however, there is no producer. So we picked the arrayed
GS variable, and as a result, used more slots than we should. More
critically, these slots would also no longer line up with the producer's
calculation. To fix this, we need to fix up the type of the variable
based on stage no matter what.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93650
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit dac2964f3e)
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
(cherry picked from commit 5907a02ab6)
Otherwise the user has no way of using it, and we'll try to access the
linear one.
v2:
- Bail out when KHR_gl_colorspace is missing and srgb is set (Marek)
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Fixes: c2c2e9ab604(egl: implement EGL_KHR_gl_colorspace (v2))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91596
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
(cherry picked from commit 54702c2fa1)
By using whole static libraries the android buildsystem provides
whole-archive (alike) solution. This means that we don't need to worry
about the order of the static libraries and any reverse, recursive or
circular dependencies that they have between one another.
Without this the linker will discard any unused hunks of one library
and we'll end up with unresolved symbols as those are required by
another static library. This issue has become more prominent with the
introduction of pipe-loader.
Whole static libraries has been used in i915/i965 for a very long
time, so we might do the same.
v2:
- Better commit message (Ilia)
- Keep external dependencies as [normal] static libs (Mauro)
Cc: mesa-stable@lists.freedesktop.org
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit f29a772a7e)
With an earlier commit we've spit the flags parsing to a separate
function, but forgot to update all the dri modules to use it.
Noticed when we've enabled KHR_debug for every dri module - fdo#93048
Fixes: 38366c0c6e "dri_util: Don't assume __DRIcontext->driverPrivate
is a gl_context"
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 72fda2b710)
The buffers are referenced from r600_update_driver_const_buffers()
-> r600_set_constant_buffer() -> u_upload_data(), but nothing
ever releases the reference. Similar case with driver_consts.
Found using valgrind.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
[Emil Velikov: resolve trivial conflicts]
(cherry picked from commit 0153ff8379)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...
Fixes a bug in Kodi with radeonsi reported by Christian König.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e6281a2850)
Print the stream value not the pointer to the expression,
also use the unsigned format specifier.
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit d018619d7f)
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).
this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 119bef9543)
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:
"The command
void UniformSubroutinesuiv(enum shadertype, sizei count,
const uint *indices);
will load all active subroutine uniforms for shader stage
shadertype with subroutine indices from indices, storing
indices[i] into the uniform at location i. The indices for
any locations between zero and the value of
ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
used will be ignored."
V2: simplify NULL check suggested by Jason.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.orghttps://bugs.freedesktop.org/show_bug.cgi?id=93731
(cherry picked from commit 86677f1016)
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions. Implementations of
current language revisions may or may not perform them.
This patch makes Mesa apply implicti conversions even on current
language versions. Applications appear to expect this behavior,
and there's really no downside to doing so.
Fixes shader compilation in Shadow of Mordor.
Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d54a70aa18)
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers. However, in compute,
geometry, and tessellation stages, the hardware is not so nice. In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15. As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask. This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled. Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 61b0cfd84e)
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
[Emil Velikov: drop not applicable TES changes]
(cherry picked from commit 9870f798be)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_fs_generator.cpp
src/mesa/drivers/dri/i965/brw_shader.cpp
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."
Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 0a6811207f)
I'm not sure about the consequences of this bug, but it's definitely
dangerous.
This applies to SI, CIK, VI.
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit dc96a18d24)
Currently, opt_vectorize() tries to combine:
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ir_quadop_bitfield_insert opcode, which operates on
ivec4s. However, GLSL IR's opcodes currently require the bits and
offset parameters to be scalar integers. So, this breaks.
We want to be able to vectorize this eventually, but for now, just
chicken out and make opt_vectorize() bail by marking all the bitfield
insert/extract related opcodes as horizontal. This is a relatively
uncommon case today, so we'll do the simple fix for stable branches,
and fix it properly on master.
Fixes assertion failures when compiling Shadow of Mordor vertex shaders
on i965 in vec4 mode (where OptimizeForAOS enables opt_vectorize()).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: resolve trivial conflicts]
(cherry picked from commit 5e3edd4b28)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/glsl/ir.h
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 051603efd5)
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 1b74c02e83)
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 8882b46226)
This is more future-proof than the current code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1c2187b1c2)
gl_buffer_object has grown more complicated and requires cleanup. Using this
function from drivers will be more future-proof.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 6aed083b93)
Add PCI IDs for the Intel Kabylake platforms. The IDs are taken
directly from the Linux kernel patches, which are under review:
http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.htmlhttp://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2
The Kabylake PCI IDs taken from the kernel are rearranged to be in order
of GT type, then PCI ID.
Please note that if this patch is backported, the following fixes will
need to be added before this patch:
commit 28ed1e08e8 "i965/skl: Remove early platform support"
commit c1e38ad370 "i965/skl: Use larger URB size where available."
Thanks to Ben for fixing a bug around setting urb.size, and being
patient with my questions about what the various fields mean.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (KBL-GT2)
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 39c41be50d)
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache. And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.
This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit commit 36c93a6fae.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit c28d72a347)
The nir_opt_algebraic rule
(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7295f4fcc2)
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b16c9be4a5)
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b5f2f7073f)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.
v2: don't remove st_upload_constants calls, clarify why they're needed
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 36c93a6fae)
The next commit will use this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 294ed5cd13)
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c1d14c6817)
[Emil Velikov: resolve trivial conflict]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c