The existing code uses SSSE3, and because it isn't compiled in a
separate file compiled with that, it is usually not used (that, of
course, could be fixed...), whereas SSE2 is always present with 64-bit
builds. This should be pretty much as fast as the pshufb version,
albeit those code paths aren't really used on chips without llc in any
case.
v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments
v4: [mattst88] Rebase
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replaces four byte loads and four byte stores with a load, bswap,
rotate, store; or a movbe, rotate, store.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This was triggered by
dEQP-GLES3.functional.vertex_array_objects.all_attributes
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move the buffer resource extraction code out into its own function.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
It is incorrect to assume that pixel format is always in BGR byte order.
We need to check bitmask parameters (such as |redMask|) to determine whether
the RGB or BGR byte order is requested.
v2: reformat code to stay within 80 character per line limit.
v3: just fix the byte order problem first and investigate SRGB later.
v4: rebased on top of the GLES3 sRGB workaround fix.
v5: rebased on top of the GLES3 sRGB workaround fix v2.
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes
dEQP-GLES31.functional.uniform_location.negative.atomic_fragment
dEQP-GLES31.functional.uniform_location.negative.atomic_vertex
Both of which have lines like
layout(location = 3, binding = 0, offset = 0) uniform atomic_uint uni0;
The ARB_explicit_uniform_location spec makes a very tangential mention
regarding atomic counters, but location isn't something that makes sense
with them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The XMesaVisual instances freed in the visuals table on display close
are being freed with a free() call, instead of XMesaDestroyVisual(),
causing a memory leak.
Signed-off-by: John Sheu <sheu@google.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
GLvoid was used before in OpenGL but it has changed to just using void.
All GLvoids in mesa's state tracker has been changed to void in this patch.
Tested this with piglit and no problems were found. No compiler warnings.
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Now it follows the compatibility criteria listed in section 2.1 of
the GLX 1.4 specification.
This is needed for post-process effects in SW:KotOR.
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
- Check for unused blocks every few frames or every 64K draws
- Delete data unused since the last check if total unused data is > 20MB
Doesn't seem to cause a perf degridation
Acked-by: Brian Paul <brianp@vmware.com>
If the glDrawPixels size changed, we leaked the previously cached
texture, if there was one. This patch fixes the reference counting,
adds a refcount assertion check, and better handles potential malloc()
failures.
Tested with a modified version of the drawpix Mesa demo which changed
the image size for each glDrawPixels call.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It's the same as radeonsi. This adds guard band support to r600g.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Guard band clipping speeds up rasterization for primitives that are
partially off-screen. This change in particular results in small
framerate improvements in a wide range of games.
Started by Grigori Goronzy <greg@chown.ath.cx>.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
and clamp it right before emitting. This is a prerequisite for computing
the guard band.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Just for consistency. This should have no effect, because OpenGL textures
always go to VRAM.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This makes Tonga with vramlimit=128 2x faster in Heaven.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>