This will at least allow us to make the initial gains to get decent
vertex performance much more quickly & with higher confidence of getting
it right.
At some later point can look again at code-generating all the
fetch/cliptest/viewport extras in the same block as the vertex shader.
For now, just need to get some decent baseline performance.
Makes state emission into a 2 phase, prepare sets things up and accounts
the size of all referenced buffer objects. The emit stage then actually
does the batchbuffer touching for emitting the objects.
There is an assert in dri_emit_reloc if a reloc occurs for a buffer
that hasn't been accounted yet.
So with compiz on Intel hw with fake bufmgr, opening 4 firefox windows at 1680x1050 and hitting alt-tab, could cause the batchbuffer to try and reference more than the 32MB of RAM allocated.
Fix 1:
Fix 1 is to pre-verify the list of buffers against the current batchbuffer and if it can't possibly fit in the aperture to flush the batchbuffer to the hardware
and try again. If the buffers still can't fit well then you are hosed as I'm not sure there is a nice way to tell anyone.
Fix 2:
Next problem was that even with a simple check for total < aperture, we ran
into fragmentation issues, this meant that half way down a set of buffers,
we would fail as no blocks were available. Fix this by nuking the memory
manager from orbit and letting it start again and relayout the blocks in a
manner that fits.
Fix 3:
Finally the initial problem we were seeing was a memcpy to a NULL backing store.
We seem to end up with a texture at some point that never gets mapped but ends up with data in it. compiz al-tab icons have this property. So I created a card dirty bit that memcpy's any buffer that is !static and is written to back to memory. This probably is wrong but it makes compiz work for now.
Caveats:
965 support is still fail.