The only mismatch between the two is that we have to clear the
destination's alpha to 1.0. Fixes WOW performance on my Ironlake,
from a few frames a second to almost playable.
Before, we were going off of a couple of known (hopeful) matches
between internalFormats and the cpp of the read buffer. Instead, we
can now just look at the gl_format of the two to see if they match.
We should avoid bad blits that might have been possible before, but
also allow different internalFormats to work without having to
enumerate each one.
The blit that follows appears in the command stream so it's serialized
with previous rendering. Any queued vertices in the tnl layer were
already flushed up in mesa/main/.
Create a constant int pointer to the C function, then cast it to the
function's type. This avoids using trampoline code which seem to be
inadvertantly freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.
NOTE: This is a candidate for the 7.10 branch
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
This is the hack for input interactivity of frontbuffer rendering
(like we do for backbuffer at intelDRI2Flush()) by waiting for the n-2
frame to complete before starting a new one. However, for an
application doing multiple contexts or regular rebinding of a single
context, this would end up lockstepping the CPU to the GPU because
every unbind was considered the end of a frame.
Improves WOW performance on my Ironlake by 48.8% (+/- 2.3%, n=5)