The anv_block_pool data structure suffered from the exact same race as the
state pool: the uniqueness of the blocks handed out depends on the
next_block value increasing monotonically. However, this invariant did not
hold because of our block "return" concept.
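
To make the invariant concrete, here is a minimal sketch (illustrative only,
with made-up names; not the actual anv_block_pool code) of why uniqueness
hinges on next_block never moving backwards:

    #include <stdint.h>

    struct block_pool {
       uint32_t block_size;
       uint32_t next_block;   /* offset of the next block to hand out */
    };

    static uint32_t
    block_pool_alloc(struct block_pool *pool)
    {
       /* Each caller gets a distinct offset only if next_block grows
        * monotonically.
        */
       return __sync_fetch_and_add(&pool->next_block, pool->block_size);
    }

    static void
    block_pool_return_last(struct block_pool *pool)
    {
       /* Racy "return" of the most recently allocated block: rewinding
        * next_block breaks the monotonicity block_pool_alloc relies on, so
        * a later caller can be handed an offset another thread already owns.
        */
       __sync_fetch_and_sub(&pool->next_block, pool->block_size);
    }
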
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything. In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus. If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:
A: Get an offset (next == end)
B: Get an offset (next > end)
A: Resize the pool (now next < end by a lot)
C: Get an offset (next < end)
B: Return the over-allocated offset
D: Get an offset
in which case D will get the same offset as C. The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly, and threads that over-allocate don't return anything and
just futex-wait. Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.
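
As a rough illustration of that scheme, here is a hedged sketch (simplified,
with hypothetical names such as pool_grow; it assumes a little-endian layout
so the 64-bit fetch-and-add lands on the next field, and it is not the actual
anv allocator code):

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <unistd.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>

    /* "next" and "end" share one 64-bit word so they can be read and
     * updated together atomically.
     */
    union pool_state {
       struct {
          uint32_t next;   /* offset of the next free state */
          uint32_t end;    /* end of the current block */
       };
       uint64_t u64;
    };

    struct state_pool {
       union pool_state state;
       uint32_t state_size;
    };

    static void
    futex_wait(uint32_t *addr, uint32_t current)
    {
       syscall(SYS_futex, addr, FUTEX_WAIT, current, NULL, NULL, 0);
    }

    static void
    futex_wake(uint32_t *addr)
    {
       syscall(SYS_futex, addr, FUTEX_WAKE, INT32_MAX, NULL, NULL, 0);
    }

    /* Hypothetical helper: maps a fresh block and returns its [start, end). */
    void pool_grow(struct state_pool *pool, uint32_t *start, uint32_t *end);

    static uint32_t
    state_pool_alloc(struct state_pool *pool)
    {
       while (1) {
          union pool_state old;
          old.u64 = __sync_fetch_and_add(&pool->state.u64, pool->state_size);

          if (old.next + pool->state_size <= old.end) {
             /* Common case: the offset we grabbed fits in the current block. */
             return old.next;
          } else if (old.next == old.end) {
             /* We crossed the boundary first: grow the pool and publish the
              * new next/end directly.  Nothing is ever "returned".
              */
             union pool_state fresh;
             pool_grow(pool, &fresh.next, &fresh.end);
             uint32_t offset = fresh.next;
             fresh.next += pool->state_size;
             __sync_lock_test_and_set(&pool->state.u64, fresh.u64);
             futex_wake(&pool->state.end);
             return offset;
          } else {
             /* Someone else hit next == end and hasn't published the new
              * block yet: wait for end to change, then retry.
              */
             futex_wait(&pool->state.end, old.end);
          }
       }
    }
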
We have pools, so we should be using them. Also, I think this will help
keep valgrind from getting confused when we would otherwise end up fighting
with system allocations such as those from malloc/free and mmap/munmap.
Jason started the task by creating anv_cmd_buffer.c and anv_cmd_emit.c.
This patch finishes the task by renaming all other files except
gen*_pack.h and glsl_scraper.py.