We hash the input SPIR-V, the specialization constants, the entrypoint,
and the shader key using SHA1 to derive a unique identifier for the
combination. A VkPipelineCache is then a hash table mapping these
identifiers to the corresponding prog_data and kernel data.
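As a minimal sketch of the hashing step, assuming the util/mesa-sha1.h
wrappers and with hypothetical parameter names:

    #include <string.h>
    #include "util/mesa-sha1.h"

    static void
    hash_shader(unsigned char sha1_out[20],
                const void *spirv, size_t spirv_size,
                const char *entrypoint,
                const void *spec_data, size_t spec_size,
                const void *key, size_t key_size)
    {
       struct mesa_sha1 ctx;

       _mesa_sha1_init(&ctx);
       _mesa_sha1_update(&ctx, spirv, spirv_size);
       _mesa_sha1_update(&ctx, entrypoint, strlen(entrypoint));
       if (spec_data)
          _mesa_sha1_update(&ctx, spec_data, spec_size);
       _mesa_sha1_update(&ctx, key, key_size);
       _mesa_sha1_final(&ctx, sha1_out);
    }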
Apparently there are some issues with symbol resolution if an application
packages its own loader and you also have a system-installed one. I don't
really understand the details, but it's not onerous to add.
The immediate write from PIPE_CONTROL is 64 bits, at least on BDW. This
used to work on 64-bit archs because the compiler would align the following
anv_state struct up for us. However, in 32-bit builds, they overlap and it
causes problems.
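The shape of the fix, sketched with hypothetical field names: reserve a
full 64-bit slot for the immediate write so nothing can overlap it,
regardless of pointer size:

    #include <stdint.h>

    struct fence_layout_sketch {
       /* PIPE_CONTROL writes a full 64-bit immediate here. On 64-bit
        * builds the natural alignment of the next member hid the bug;
        * on 32-bit builds the two fields overlapped. */
       uint64_t pipe_control_write;
       /* stand-in for the anv_state that used to overlap */
       struct { uint32_t offset; void *map; } state;
    };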
We use the simple batch helper to submit a batch at driver startup time
which holds all the state that never changes. We don't have a whole lot
and once we enable tessellation there'll be even less. Even so, it's a
simple mechanism and reduces our steady state batch sizes a bit.
This is kind of silly. We *really* need to do a better job of making sure
all objects have all their default values set. We probably also want to,
eventually, put everything into the BO (to save memory) and, more
specifically, make the GPU write the "ready" flag. That way GetFenceStatus
won't ever have to call into the kernel.
The GPU does most of this for us as long as we set up tight bounds for
the buffers, which we do. Additionally, we range-check dynamic buffers in
the shader. With that, it's safe to turn on robustBufferAccess.
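The shader-side check amounts to something like the following, written
here as self-contained C rather than the NIR the driver actually emits
(names hypothetical):

    #include <stdint.h>
    #include <string.h>

    /* Out-of-bounds reads return zero instead of faulting, which is
     * what robustBufferAccess requires. */
    static uint32_t
    robust_load_u32(const uint8_t *buf, uint32_t buf_size, uint32_t offset)
    {
       if (buf_size < sizeof(uint32_t) || offset > buf_size - sizeof(uint32_t))
          return 0;

       uint32_t v;
       memcpy(&v, buf + offset, sizeof(v));
       return v;
    }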
We can't use the more fine-grained load and store fence commands (lfence
and sfence), since clflush is only guaranteed to be ordered with respect
to mfence.
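A sketch of the resulting flush helper, assuming a 64-byte cache line:

    #include <stddef.h>
    #include <stdint.h>
    #include <emmintrin.h>

    static void
    flush_range(void *start, size_t size)
    {
       uint8_t *p = (uint8_t *)((uintptr_t)start & ~(uintptr_t)63);
       uint8_t *end = (uint8_t *)start + size;

       /* Order prior stores against the flushes... */
       _mm_mfence();
       while (p < end) {
          _mm_clflush(p);
          p += 64;
       }
       /* ...and the flushes against whatever comes next. */
       _mm_mfence();
    }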
As far as I can tell, this patch sets all pipeline multisample state
except:
- alpha to coverage
- alpha to one
- the dispatch count for per-sample dispatch
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address. The way the packing functions are currently
set up, we need to provide them with (offset >> 6), because they just
shove the bits in where the PRM says they go and aren't aware that the
value is an address.
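Concretely, something along these lines (helper and field names
hypothetical):

    #include <assert.h>
    #include <stdint.h>

    /* Convert a dynamic-state-relative offset into the raw value the
     * pack function expects for the border color pointer field. */
    static uint32_t
    border_color_pointer(uint32_t offset)
    {
       /* The packet requires 64-byte alignment... */
       assert((offset & 63) == 0);
       /* ...and the pack function writes the bits verbatim, so we
        * pre-shift rather than hand it the offset itself. */
       return offset >> 6;
    }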
These are working as well as Broadwell and Cherryview. The recent merge
from mesa master brings in Kabylake device info, and that should be all
we need to enable it.
This is needed because compute push constant data is replicated per
invocation. For gen7, there can be up to 64 invocations. With a push
constant data max of 128 bytes, this comes to 8k of data. We need
additional space for
local-id payloads, so we are going with 16k for now.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
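For concreteness, the arithmetic above as constants; every name here is
hypothetical, only the numbers come from the message:

    #define GEN7_MAX_CS_INVOCATIONS  64
    #define MAX_PUSH_CONSTANTS_SIZE  128   /* bytes */

    /* 64 * 128 = 8k of replicated push constant data... */
    #define CS_PUSH_CONSTANT_DATA \
       (GEN7_MAX_CS_INVOCATIONS * MAX_PUSH_CONSTANTS_SIZE)

    /* ...plus room for local-id payloads, rounded up to 16k. */
    #define CS_PUSH_POOL_SIZE  (16 * 1024)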
This is not really a cache yet, but it allows us to share one state
stream for all pipelines, which means we can bump the block size without
wasting a lot of memory.
The kernel is going to give us whole pages anyway, so allocating part of a
page doesn't help. And this ensures that we can always work with whole
pages.
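The rounding itself is one line; a sketch with a hypothetical helper
name, assuming 4k pages:

    #include <stdint.h>

    #define PAGE_SIZE 4096u

    static inline uint32_t
    align_to_page(uint32_t size)
    {
       /* The kernel allocates whole pages regardless, so round up. */
       return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }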
As per the spec:
minMemoryMapAlignment is the minimum required alignment, in bytes, of
host-visible memory allocations within the host address space. When
mapping a memory allocation with vkMapMemory, subtracting offset bytes
from the returned pointer will always produce a multiple of the value of
this limit.
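What that guarantee means in practice, as a small self-contained check:

    #include <assert.h>
    #include <stdint.h>
    #include <vulkan/vulkan.h>

    /* For any offset, the returned pointer minus that offset must be a
     * multiple of minMemoryMapAlignment. */
    static void *
    map_and_check(VkDevice device, VkDeviceMemory memory,
                  VkDeviceSize offset, VkDeviceSize size,
                  VkDeviceSize min_map_alignment)
    {
       void *ptr = NULL;

       if (vkMapMemory(device, memory, offset, size, 0, &ptr) != VK_SUCCESS)
          return NULL;

       assert(((uintptr_t)ptr - offset) % min_map_alignment == 0);
       return ptr;
    }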
anv_block_pool_init calls anv_block_pool_grow which checks
device->info.has_llc to see if it needs to set caching parameters.
If we don't set device->info early enough, this reads an undefined value,
which is probably 0 and not what we want on LLC platforms.
Found with valgrind.
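Reduced to a self-contained sketch (structures simplified, names made
up), the bug and the fix look like this:

    #include <stdbool.h>

    struct dev_info { bool has_llc; };
    struct device   { struct dev_info info; };

    /* Reads dev->info.has_llc, so info must already be valid here. */
    static void
    block_pool_init(struct device *dev)
    {
       bool needs_caching_setup = !dev->info.has_llc;
       (void)needs_caching_setup;
    }

    static void
    device_init(struct device *dev, const struct dev_info *info)
    {
       dev->info = *info;     /* the fix: fill in info first... */
       block_pool_init(dev);  /* ...then create the block pools */
    }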