Pull constants require sending a request out to an overworked shared
unit and waiting for a response, while push constants are nicely loaded
in for us at thread dispatch time. Putting the things we access in every
VS invocation there improves ETQW performance by 2.5% +/- 1.6% (n=6).
Previously for frame throttling we would wait on the first batch after
a swap before emitting another swap, because we had no hook after a
swap was emitted. This meant that if an app managed to squeeze
everything it had for a frame into one batch, it would lock-step with
the GPU. With the swapbuffers changes, we now have the entrypoint we
want.
This takes the WoW intro screen from 25% GPU idle and visibly jerky to
4-5% GPU idle and rather smooth. Other apps such as OpenArena have
run into this problem as well.
Sun cc 5.9 and later (__SUNPRO_C >= 0x590) support __attribute__ calls
for aligned, always_inline, noinline, pure, const, and malloc.
This commit includes updates to files that were regenerated by gl_XML.py
after adding the __SUNPRO_C checks to it.
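
A minimal sketch of the kind of compiler check described above. The
MY_* macro names and the clamp_to_byte helper are hypothetical and are
not the names gl_XML.py generates; only the __SUNPRO_C >= 0x590 test
comes from this commit.

```c
/* Sketch only: MY_PURE / MY_NOINLINE are hypothetical names. */
#if defined(__GNUC__) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
#  define MY_PURE      __attribute__((pure))
#  define MY_NOINLINE  __attribute__((noinline))
#else
/* Unknown compiler: expand to nothing so the code still builds. */
#  define MY_PURE
#  define MY_NOINLINE
#endif

/* Pure: the result depends only on the argument, with no side effects. */
MY_PURE int clamp_to_byte(int v)
{
   if (v < 0)
      return 0;
   if (v > 255)
      return 255;
   return v;
}

/* Kept out of line, e.g. to keep a rarely used path out of hot code. */
MY_NOINLINE int clamp_sum(int a, int b)
{
   return clamp_to_byte(a + b);
}
```

On compilers that match neither test the macros expand to nothing, so
the attributes act purely as optional hints.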
Signed-off-by: Alan Coopersmith <alan.coopersmith@sun.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The new relocations for CB_COLOR0_FRAG & CB_COLOR0_TILE add 4
dwords to the default command stream. Increase the default
prediction size to take this into account.
The FRAG & TILE buffers are unused, but they still need
to be associated with a valid relocation so that
userspace can't try to abuse them to overwrite the
GART and then try to write anywhere in system
memory.
The dst and src rtype enums are different, so DST_REG_OUTPUT was read as
SRC_REG_CONSTANT in some shaders and produced invalid output or a hang.
As TEX output is always a temp register, always set the src to
SRC_REG_TEMPORARY.
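
A small sketch of why reusing a dst rtype as a src rtype goes wrong. The
numeric values below are hypothetical, chosen only so that the two enums
disagree in the way described. DST_REG_TEMPORARY, SRC_REG_INPUT and the
two helper functions are assumed names for illustration, not the
driver's code.

```c
/* Hypothetical values: the real driver's numbering may differ. */
enum dst_rtype {
   DST_REG_TEMPORARY = 1,
   DST_REG_OUTPUT    = 3
};

enum src_rtype {
   SRC_REG_TEMPORARY = 1,
   SRC_REG_INPUT     = 2,
   SRC_REG_CONSTANT  = 3
};

/* Buggy pattern: the dst enum value is reinterpreted as a src enum. */
enum src_rtype naive_src_rtype(enum dst_rtype dst)
{
   return (enum src_rtype)dst;
}

/* Fixed pattern: a TEX result always lives in a temp register, so the
 * src type does not depend on the dst type at all. */
enum src_rtype tex_result_src_rtype(void)
{
   return SRC_REG_TEMPORARY;
}
```

With these assumed values, naive_src_rtype(DST_REG_OUTPUT) yields
SRC_REG_CONSTANT, which is the mismatch this change avoids.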
4 samples should be enough for GLUT to be satisfied, and I think most
HW that does any MSAA can do it.
Note that any pipe that doesn't multisample can just ignore the
corresponding flag in pipe_rasterizer_state.
It uses a slow path to copy the render buffer of the surface to the
target pixmap. We might be able to create a pipe context for
EGLDisplay's use and use a blitter context for the purpose. That is left
for future consideration.
A validate call asks for the buffers of a native surface. Using a mask
to represent the buffers of interest is more intuitive. It also rules
out corner cases such as a single attachment being listed multiple
times.
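
A minimal sketch of the idea, assuming one bit per attachment. The enum
and function names here are hypothetical and are not the interface this
change introduces. Converting a list to a mask shows how a duplicated
entry collapses into a single bit.

```c
#include <stdbool.h>

/* Hypothetical attachment indices, one bit each in the mask. */
enum attachment {
   ATT_FRONT_LEFT,
   ATT_BACK_LEFT,
   ATT_FRONT_RIGHT,
   ATT_BACK_RIGHT,
   NUM_ATTACHMENTS
};

/* Fold a list of attachments into a mask; duplicates set the same bit. */
unsigned attachment_mask_from_list(const enum attachment *list, int count)
{
   unsigned mask = 0;
   for (int i = 0; i < count; i++)
      mask |= 1u << list[i];
   return mask;
}

/* True if the caller asked for this attachment. */
bool mask_has(unsigned mask, enum attachment att)
{
   return (mask & (1u << att)) != 0;
}

/* Number of distinct attachments requested. */
int mask_count(unsigned mask)
{
   int count = 0;
   while (mask) {
      count += (int)(mask & 1u);
      mask >>= 1;
   }
   return count;
}
```

A list such as { back-left, front-left, back-left } becomes a mask with
exactly two bits set, so the validate path never sees the same
attachment twice.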