It was setting XYWZ swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.
Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the later ones was wrong.
The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.
v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We are going to add a packing feature to reduce the usage of the push
constant buffer. One of the consequences is that 'nr_params' would be
modified by vec4_visitor's run call, so we need to restore it if one of
them failed before executing the fallback ones. Same thing happens to the
uniforms values that would be reordered afterwards.
Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when
the dvec4 alignment and packing patch is applied.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.
Cc: mesa-stable@lists.freedesktop.org
Before: DrawElements (16 VBOs) w/ no state change: 4.34 million/s
After: DrawElements (16 VBOs) w/ no state change: 8.80 million/s
This inefficiency was uncovered by Timothy Arceri's no_error work.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Together with some fixes to xdriinfo this fixes xdriinfo not working
with glvnd.
Since apps (xdriinfo) expect GetDriverConfig to work without going to
need through the dance to setup a glxcontext (which is a reasonable
expectation IMHO), the dispatch for this ends up significantly different
then any other dispatch function.
This patch gets the job done, but I'm not really happy with how this
patch turned out, suggestions for a better fix are welcome.
Cc: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
This change fixes the build break with llvm-svn.
r301981 of llvm-svn made add/remove of function attributes
use AttrBuilder instead of AttributeList.
Tested with llvm-3.9, llvm-4.0, llvm-svn.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
unconditionally adds libz as a dependency to all shared libraries.
That is unnecessary.
Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
So only the shared libraries that use libmesa_util need libz.
Fix Android Lollipop build by adding the include path of zlib to
libmesa_util explicitly instead of getting the path implicitly
from zlib since it doesn't export the include path in Lollipop.
Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
This reduces duplication between the dsa and non-dsa function
and will also be used in the following commit to add
KHR_no_error support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To be used to add KHR_no_error support while sharing code between
the DSA and non-DSA OpenGL function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All combined depth stencil buffers (even those with just stencil)
require a 4x4 alignment on Sandy Bridge. The only depth/stencil buffer
type that requires 4x2 is separate stencil.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The Ivy Bridge PRM provides a nice table that handles most of the
alignment cases in one place. For standard color buffers we have a
little freedom of choice but for most depth, stencil and compressed it's
hard-coded. Chad's original functions split halign and valign apart and
implemented them almost entirely based on restrictions and not the
table. This makes things way more confusing than they need to be. This
commit gets rid of the split and makes us implement the exact table
up-front. If our surface isn't one of the ones in the table then we
have to make real choices.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The reasoning Chad gave in the comment for choosing a valign of 4 is
entirely bunk. The fact that you have to multiply pitch by 2 is
completely unrelated to the halign/valign parameters used for texture
layout. (Not completely unrelated. W-tiling is just Y-tiling with a
bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled
chunks so it makes everything easier if miplevels are always aligned to
8x8.) The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet
doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you
can't do stencil texturing anyway.
v2 (Jason Ekstrand):
- Delete most of Chad's comment and add a more descriptive commit
message.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>