The GPU does most of this for us as long as we set up tight bounds for
the buffers, which we do. Additionally, we range-check dynamic buffers
in the shader. With that, it's safe to turn on robustBufferAccess.
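A minimal sketch of the kind of range check meant here, with
illustrative names (not anv's); clamping the index satisfies
robustBufferAccess, which only requires out-of-bounds reads to return a
defined value from within the buffer, or zero:

    #include <stdint.h>

    static uint32_t
    robust_load(const uint32_t *buffer, uint32_t num_elements, uint32_t index)
    {
        if (num_elements == 0)
            return 0;                 /* returning zero is permitted */
        if (index >= num_elements)
            index = num_elements - 1; /* clamp the index into bounds */
        return buffer[index];
    }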
We can't use the more fine-grained load and store fence commands (lfence
and sfence), since clflush is only guaranteed to be ordered with respect
to mfence.
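A sketch of the flushing pattern this implies, assuming 64-byte cache
lines and using the SSE2 intrinsics (the helper name is illustrative,
not the driver's):

    #include <emmintrin.h>  /* _mm_clflush, _mm_mfence */
    #include <stddef.h>
    #include <stdint.h>

    #define CACHELINE_SIZE 64

    static void
    clflush_range(void *start, size_t size)
    {
        char *p = (char *)((uintptr_t)start & ~((uintptr_t)CACHELINE_SIZE - 1));
        char *end = (char *)start + size;

        while (p < end) {
            _mm_clflush(p);
            p += CACHELINE_SIZE;
        }
        /* clflush is only ordered against mfence, so use a full fence. */
        _mm_mfence();
    }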
As far as I can tell, this patch sets all pipeline multisample state
except:
- alpha to coverage
- alpha to one
- the dispatch count for per-sample dispatch
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address. The way the packing functions are currently
set up, we need to provide them with (offset >> 6) because they just shove
the bits in where the PRM says they go and aren't really aware that it's an
address.
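For illustration, a hypothetical helper (not the real genxml packer)
showing why the caller pre-shifts:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t
    border_color_pointer_field(uint32_t offset)
    {
        /* offset is relative to dynamic state base address and must be
         * 64-byte aligned; the packer writes the value bits verbatim. */
        assert((offset & 63) == 0);
        return offset >> 6;
    }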
These are working as well as Broadwell and Cherryview. The recent merge
from mesa master brings in Kabylake device info, and that should be all
we need to enable it.
This is needed because compute push constant data is replicated per
invocation. For gen7, this replication factor can be up to 64. With a
push constant data max of 128 bytes, this is 8k of data. We need
additional space for local-id payloads, so we are going with 16k for now.
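The sizing arithmetic, spelled out with illustrative macro names:

    #define MAX_PUSH_CONSTANTS_SIZE 128   /* bytes per invocation */
    #define GEN7_MAX_REPLICATION     64   /* gen7 worst case */

    /* 128 B * 64 = 8 KiB of replicated push constant data ... */
    #define REPLICATED_PUSH_SIZE \
        (MAX_PUSH_CONSTANTS_SIZE * GEN7_MAX_REPLICATION)

    /* ... rounded up to 16 KiB to leave room for local-id payloads. */
    #define CS_PUSH_CONSTANT_ALLOC (16 * 1024)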
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This is not really a cache yet, but it allows us to share one state
stream for all pipelines, which means we can bump the block size without
wasting a lot of memory.
The kernel is going to give us whole pages anyway, so allocating part of a
page doesn't help. And this ensures that we can always work with whole
pages.
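A sketch of the rounding, assuming 4 KiB pages:

    #include <stdint.h>

    #define PAGE_SIZE 4096u

    static uint32_t
    round_to_whole_pages(uint32_t size)
    {
        /* The kernel hands back whole pages, so allocate in page units. */
        return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }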
As per the spec:
minMemoryMapAlignment is the minimum required alignment, in bytes, of
host-visible memory allocations within the host address space. When
mapping a memory allocation with vkMapMemory, subtracting offset bytes
from the returned pointer will always produce a multiple of the value of
this limit.
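A sketch of the guarantee the quoted text describes, using the standard
Vulkan entry points (the helper name is made up):

    #include <assert.h>
    #include <stdint.h>
    #include <vulkan/vulkan.h>

    static void
    check_map_alignment(VkDevice device, VkDeviceMemory memory,
                        VkDeviceSize offset, VkDeviceSize size,
                        VkDeviceSize min_map_alignment)
    {
        void *ptr;
        if (vkMapMemory(device, memory, offset, size, 0, &ptr) == VK_SUCCESS) {
            /* (ptr - offset) must be a multiple of minMemoryMapAlignment. */
            assert(((uintptr_t)ptr - (uintptr_t)offset) % min_map_alignment == 0);
            vkUnmapMemory(device, memory);
        }
    }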
anv_block_pool_init calls anv_block_pool_grow, which checks
device->info.has_llc to see if it needs to set caching parameters.
If we don't set device->info early enough, this reads an uninitialized
value, which is probably 0 and not what we want on LLC platforms.
Found with valgrind.
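A minimal sketch of the required ordering, with stand-in struct
definitions (the real ones live in the driver):

    #include <stdbool.h>

    struct gen_device_info { bool has_llc; };
    struct anv_device { struct gen_device_info info; };

    static void
    device_init(struct anv_device *device,
                const struct gen_device_info *info)
    {
        /* Populate device->info first, so that anv_block_pool_init ->
         * anv_block_pool_grow reads a defined device->info.has_llc when
         * choosing caching parameters. */
        device->info = *info;
        /* ... only now is it safe to initialize the block pools. */
    }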
We're required to expose a host-visible, coherent memory type. On big
core GPUs that share LLC, we can expose one such memory type that's
also cached. However, on non-LLC GPUs a memory type can't be both cached
and coherent. Thus, we expose both the required coherent type and the
cached but non-coherent combination.
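A sketch of the resulting memory types for a non-LLC GPU (heap index and
ordering are illustrative):

    #include <vulkan/vulkan.h>

    static const VkMemoryType non_llc_memory_types[] = {
        {
            /* The required type: host-visible and coherent, uncached. */
            .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                             VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
            .heapIndex = 0,
        },
        {
            /* Cached but non-coherent: needs explicit flush/invalidate. */
            .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                             VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
            .heapIndex = 0,
        },
    };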