This is implemented in nvfx_state_fb and fragtex but was missing
in nvfx_screen.
This avoids glCopyTexSubImage CPU fallbacks and makes Doom 3
much faster as a result.
This was proposed by Marek Olšák and no one objected, so just
pushing it.
The extension is currently not exposed, because the mechanism to
discover if the driver actually supports this is missing.
We probably should change is_format_supported to handle this too.
This allows Gallium drivers to be tested anyway in the meantime.
Based on work by Dave Airlie.
Changes by me:
1. Fix assertion in st
2. Change to use unpadded Gallium formats
Autobinding creates additional pushbuffer usage which callers may not
account for, and is also slow.
The next relocations patch depends on this for correctness.
Instead, assert that the objects are bound; binding should happen at
screen creation time.
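As a rough illustration of "bind once, assert later", the sketch below
keeps a table of bound objects per subchannel. All names here
(channel_sketch, bind_object, begin_method) are hypothetical and are not
taken from the driver; the point is only that the emission path checks a
binding instead of creating one, so it never consumes pushbuffer space
the caller did not reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical channel state: object handle bound on each subchannel. */
struct channel_sketch {
    uint32_t bound[8];   /* 0 means nothing is bound */
};

/* Done once, at screen creation time. */
static void bind_object(struct channel_sketch *chan, unsigned subc,
                        uint32_t handle)
{
    assert(subc < 8 && handle != 0);
    chan->bound[subc] = handle;
}

/* Pure check: does this subchannel currently hold this object? */
static bool is_bound(const struct channel_sketch *chan, unsigned subc,
                     uint32_t handle)
{
    return subc < 8 && chan->bound[subc] == handle;
}

/* Emission path: no autobinding, only an assertion that the
 * binding already exists. Returns the subchannel to use. */
static unsigned begin_method(const struct channel_sketch *chan,
                             unsigned subc, uint32_t handle)
{
    assert(is_bound(chan, subc, handle));
    return subc;
}
```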
RING_3D creates a method start for subchannel 7.
Bind the 3D engine to a fixed subchannel to make it work.
This is much faster than the old BEGIN_RING, since we don't need
to waste cycles trying to "autobind" stuff, when a fast static binding
is perfectly good.
Subchannel 7 is chosen because the kernel takes up the lowest ones.
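For illustration, a method start with a fixed subchannel could be packed
as below. This is a sketch, not the driver's macro: it assumes the
pre-Fermi header layout (dword count in bits 28:18, subchannel in bits
15:13, method offset in bits 12:2), and the names method_header,
ring_3d_header and SUBC_3D are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed subchannel for the 3D engine, as described in the text. */
#define SUBC_3D 7u

/* Pack a method header under the assumed layout:
 * count in bits 28:18, subchannel in bits 15:13, method in bits 12:2. */
static uint32_t method_header(uint32_t subc, uint32_t mthd, uint32_t count)
{
    assert(subc < 8);                  /* 3-bit subchannel field */
    assert(count < 2048);              /* 11-bit count field */
    assert((mthd & ~0x1ffcu) == 0);    /* dword-aligned 13-bit offset */
    return (count << 18) | (subc << 13) | mthd;
}

/* Static binding: the subchannel is a compile-time constant, so no
 * lookup or autobind step is needed when starting a 3D method. */
static uint32_t ring_3d_header(uint32_t mthd, uint32_t count)
{
    return method_header(SUBC_3D, mthd, count);
}
```

With these assumptions, ring_3d_header(0x0100, 1) yields 0x0004e100.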
Currently we miscalculate the space needed to push vertices, causing
flushes where they should not happen.
Use a much more conservative estimate to fix it.
It will be done better in the future (e.g. using the nv50 primitive
splitter).
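A conservative estimate of this kind might look like the sketch below.
The formula and the constant are assumptions made for illustration, not
the values used in the driver: every vertex is charged one extra header
dword, plus a fixed allowance for the begin/end methods, so the result
errs on the side of reserving too much space rather than too little.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed cost, in dwords, of the methods around a draw. */
#define DRAW_OVERHEAD_DWORDS 16u

/* Upper bound on the pushbuffer dwords needed to push the vertices.
 * Computed in 64 bits so large counts cannot wrap around. */
static uint64_t vertex_push_upper_bound(uint32_t vertex_count,
                                        uint32_t dwords_per_vertex)
{
    return (uint64_t)vertex_count * ((uint64_t)dwords_per_vertex + 1u)
           + DRAW_OVERHEAD_DWORDS;
}

/* True when the estimate does not fit in the remaining space, so the
 * caller should flush before starting the draw, not in the middle. */
static bool needs_flush_before_draw(uint64_t free_dwords,
                                    uint32_t vertex_count,
                                    uint32_t dwords_per_vertex)
{
    return vertex_push_upper_bound(vertex_count, dwords_per_vertex)
           > free_dwords;
}
```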
CFLAGS needs to be passed, as you already know.
Commit 3e17a5b047 broke this by adding a new link
command without CFLAGS.
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
This reverts commit bd09fce271. Török
Edwin sent the correct fix to the list a couple days ago in
<1270832747-15611-1-git-send-email-edwintorok@gmail.com>.
Otherwise, we read from VRAM...
Yes, again, it should be fixed to tell whether the buffer is in
VRAM or not and behave appropriately.
But this belongs in pipebuffer or another generic layer; revisit this
later too.
Currently we are relocating transfers to VRAM to use the blitter,
which is terrible.
Maybe for ->VRAM the blitter could be better, but we can't be
perfectly sure of that due to relocations.
In other words, just do the simple thing, and defer fine-tuning the
transfer hardware method to a later stage, while making it work
decently now.