This commit switches block pools over to being allocated from the BO
cache rather than being allocated manually by the block pool.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start depending on the BO cache in the state and block
pools so we need them properly initialized for the tests to work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some of the tests were actually relying on some of those uninitialized
bits to be non-zero. In particular, a couple want use_softpin = true.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All block pools are allocated with the same flags. There's no good
reason why it needs to be configurable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes a number of changes to the current API:
1. Everything is renamed to anv_device_* instead of anv_bo_cache_*
because the BO cache is soon going to be the sole BO allocation path
and not some special case to make import/export work.
2. Drop the cache parameter. It's totally redundant with the device
and just annoying to keep typing.
3. Rework flags so that they go the convenient direction for usage in
ANV rather than whichever awkward way the i915 specified it to
maintain backwards compatibility. This also gives us the
opportunity to set some defaults.
4. Add flags for mapping and coherency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The growing algorithms for the softpin case and the userptr version are
almost entirely different. Having this weird join doesn't make the code
more comprehensible. This rework does a few things:
1. Move the comment about 48-bit addresses to anv_device_init where we
actually unset the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.
2. Separate the paths in anv_block_pool_expand_range so it's easier to
see what happens in the two different cases.
3. Use the anv_block_poo::bos array for storing all allocated BOs in
both paths rather than using the cleanup list in both paths. This
lets us make the cleanups array only used for mmaps of the memfd for
the userptr case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO. This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're not THAT strapped for space that we can't burn one extra bit for
a boolean. If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It has exactly one caller and we're about to change some of the dynamics
which would make this confusing as a separate function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We have to go through and rewrite them all anyway so it doesn't do us
any good to put them in the list in anv_reloc_list_add. Also, for state
pools the handles are likely wrong by the time vkQueueSubmit is called.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we would read the offset from the BO in anv_reloc_list_add
to generate the presumed offset and then again in the caller to compute
the 64-bit address to write into the buffer. However, if the offset
somehow changed between these two points, the presumed offset would no
longer match the written offset. This is unlikely to actually ever be a
problem in practice because the presumed offset gets recorded first and
so if the written address is wrong then the presumed offset is almost
certainly wrong and the relocation will trigger. However, it's much
safer to simply have anv_reloc_list_add return the 64-bit address.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us do less allocation because the anv_bo's are now embedded in
the sparse array and it also allows lock-free translation from GEM
handle to BO which will be useful in future commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The resulting locale is not used for Vulkan, and it is not reference
counted, giving issues when multiple instances are created.
CC: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too. When it didn't find
it, the script would fail with:
Traceback (most recent call last):
File "./report.py", line 351, in <module>
main()
File "./report.py", line 182, in main
before_count = before[p][m]
KeyError: 'sends'
Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reworks were combined into this patch:
* Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
* Francisco Jerez: intel/eu/validate/gen12: Disable
qword_low_power_no_depctrl eu_validate test.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as it's on the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
v2: Assert that image->planes[plane].offset is 4K aligned (Nanley)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.
enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.
Fixes: 0cabf93b80 "intel/blorp: Add an entrypoint for clearing depth and stencil"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The old relocation list code always allocated 256 relocations and a hash
set up-front without knowing whether or not we really need them. In
particular, in the softpin case, this is two fairly large allocations
that we don't need to be making. Also, for pipeline objects on haswell
where we don't have softpin, we don't need relocations unless scratch is
used so this is extra data per-pipeline. Instead, we should do it
on-demand. This shaves 3.5% off of a cpu-limited example running with
the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Add depth bounds testing to the list of supported
physical device features.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The bias for the 3DSTATE_DEPTH_BOUNDS instruction
should be 2 not 1.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
A few equations/programming changes for ICL.
v2: Fix a couple of issues in naming and floating/integer operations (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The anv_batch_bo contents are linked one to another, and when printing
we have to start with the first of those. Since in `u_vector` new
elements are added to the head, to get the first element we need the
vector's tail.
Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>