This allows us to avoid re-emitting some state when validate_state happens
multiple times per batchbuffer. Even though we flush batch per primitive
currently, that may still happen already if the primitive changed (this should
probably be fixed as well).
Both drivers have ended up relying on lost_hardware being called after each
batch buffer, so update the name. This removes one of the calls on 965 whic
h was outside of the batchbuffer handling code and just duplicating what had
already happened through batchbuffer handling.
In particular, batch buffers are no longer flushed when switching from
CLIPRECTS to NO_CLIPRECTS or vice versa, and 965 just uses DRM cliprect
handling for primitives instead of trying to sneak in its own to avoid the
DRM stuff. The disadvantage is that we will re-execute state updates per
cliprect, but the advantage is that we will be able to accumulate larger
batch buffers, which were proving to be a major overhead.
Each array element is now a BUFFER_x token rather than a BUFFER_BIT_x bitmask.
The number of active color buffers is specified by _NumColorDrawBuffers.
This builds on the previous DrawBuffer changes and will help with drivers
implementing GL_ARB_draw_buffers.
These fields are no longer indexed by shader output. Now, we just have
a simple array of renderbuffer pointers.
If the shader writes to gl_FragData[i], send those colors to the N
_ColorDrawBuffers. Otherwise, replicate the single gl_FragColor (or
the fixed-function color) to the N _ColorDrawBuffers.
A few more changes and simplifications can follow from this...
Match behaviour of DRI driver. Fix fragment shader to find the other
parameters one slot further on. Will need more work to cope with FP's
that actually reference position.
By avoiding the repeated relocation buffer creation/map/unmap/destroy for each
new batch buffer, this improves OpenArena framerates by 30%. Caching batch
buffers themselves doesn't appear to be a significant performance win over
this change.
We have two consumers of relocations. One is static state buffers, which
want the same relocation every time. The other is the batchbuffer, which gets
thrown out immediately after submit. This lets us reduce repeated computation
for static state buffers, and clean up the code by moving relocations nearer
to where the state buffer is computed.
The cell "render_stage" (last in the "draw" pipeline) emits vertices into
a buffer which is pulled by the SPUs in response to a "RENDER" command.
This is pretty much temporary/scaffold code for now.