Since each one is only 64b, and kernel allocations are a page anyway, this
lets us reduce buffer allocation by packing many CURBEs into one buffer, for
each batchbuffer submitted. Improves openarena performance by around 10%.
The draw module's vbuf stage builds buffers of post-transformed vertices
and issues draw-elements calls to render them. We'll pass the vertex and
index buffers to the SPUs...
The previous change gave us only two modes, one which looped over the batch
per cliprect (3d drawing) and one that didn't (state updeast).
However, we really want 4:
- Batch doesn't care about cliprects (state updates)
- Batch needs DRAWING_RECTANGLE looping per cliprect (3d drawing)
- Batch needs to be executed just once (region fills, copies, etc.)
- Batch already includes cliprect handling, and must be flushed by unlock time
(copybuffers, clears).
All callers should now be fixed to use one of these states for any batchbuffer
emits. Thanks to Keith Whitwell for pointing out the failure.
softpipe_map_surfaces get call several time but softpipe_unmap_surfaces
get call only once. So to make sure stuff are properly unmap when
softpipe_unmap_surfaces get call we map surfaces only one time in
softpipe_map_surfaces.
This may allow better concurrency (noop in openarena performance now), but is
also important for the previous commit -- otherwise, we may end up with
BufferData, draw_prims, BufferData and the draw_prims would use the new VBO
data instead of old. This could still occur with user-supplied VBOs and poor
use of MapBuffer without BufferData.
The comment about (vbo)_exec_api.c appeared to be stale, as the VBO code seems
to only use non-named VBOs (not actual VBOs) or freshly-allocated VBO data.
This brings a 2x speedup to openarena, because we can submit nearly-full
batchbuffers instead of many 450-byte ones.
This allows us to avoid re-emitting some state when validate_state happens
multiple times per batchbuffer. Even though we flush batch per primitive
currently, that may still happen already if the primitive changed (this should
probably be fixed as well).
Both drivers have ended up relying on lost_hardware being called after each
batch buffer, so update the name. This removes one of the calls on 965 whic
h was outside of the batchbuffer handling code and just duplicating what had
already happened through batchbuffer handling.
In particular, batch buffers are no longer flushed when switching from
CLIPRECTS to NO_CLIPRECTS or vice versa, and 965 just uses DRM cliprect
handling for primitives instead of trying to sneak in its own to avoid the
DRM stuff. The disadvantage is that we will re-execute state updates per
cliprect, but the advantage is that we will be able to accumulate larger
batch buffers, which were proving to be a major overhead.