It's included in the declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
On FreeBSD x86 and aarch64, __u64 is a typedef of unsigned long, which
is the same size as unsigned long long.
Since we are explicitly specifying the format, cast the value
to the proper type.
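The pattern is simply to cast the value to the type the format string
expects, along these lines (a minimal sketch with illustrative names,
not the actual call sites touched by this patch):

    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t __u64;   /* stand-in for the kernel/libdrm typedef */

    static void
    print_offset(__u64 offset)
    {
       /* __u64 may be 'unsigned long' on FreeBSD, so cast explicitly to
        * match the "%llu" conversion. */
       printf("offset: %llu\n", (unsigned long long) offset);
    }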
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emmanuel <manu@FreeBSD.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3559>
The render cache hash table is now *mostly* redundant with the more
general seqno matrix-based cache tracking mechanism. Most hash table
operations are now gone except for the format mismatch checks done in
iris_cache_flush_for_render(). Redundant code removed as a separate
patch for bisectability.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The depth cache set is now redundant with the more general seqno
matrix-based cache tracking mechanism. Removed as a separate patch
for bisectability.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces a representation of the cache coherency status of the
GPU at any point in the batch. This is done by defining a matrix C of
synchronization sequence numbers such that at any point of batch
construction, a memory operation from domain i introduced into the
batch is guaranteed to be ordered after any memory operation from
domain j in a previous batch section with seqno n if the following
condition holds:
C_i_j >= n
This allows us to efficiently determine whether additional flushing
and/or invalidation is required in order to access a buffer object
from some arbitrary domain.
Except for batch buffer reset, which requires clearing the whole
matrix, all operations on the matrix are either O(n) or O(1) in the
number of caching domains (which is basically constant).
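A minimal sketch of the idea (the domain names and field layout below
are illustrative, not the exact iris definitions):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative caching domains; the driver defines its own set. */
    enum cache_domain {
       DOMAIN_RENDER_WRITE,
       DOMAIN_DEPTH_WRITE,
       DOMAIN_OTHER_WRITE,
       DOMAIN_OTHER_READ,
       NUM_DOMAINS,
    };

    struct coherency_matrix {
       /* c[i][j]: seqno of the last sync point after which memory
        * operations from domain j are guaranteed visible to domain i. */
       uint64_t c[NUM_DOMAINS][NUM_DOMAINS];
    };

    /* Accessing from domain i a buffer last touched by domain j at
     * seqno n requires additional flushing/invalidation iff
     * c[i][j] < n. */
    static bool
    needs_flush(const struct coherency_matrix *m,
                enum cache_domain i, enum cache_domain j, uint64_t n)
    {
       return m->c[i][j] < n;
    }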
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Probably the most annoying patch to review in the whole series --
mark every buffer object use as accessed through some caching domain,
with the sequence number of the current synchronization section of the
batch. The additional argument of iris_use_pinned_bo() ensures I would
have gotten a compile error if I had missed any buffer added to the
batch validation list.
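For reference, the call ends up shaped roughly like this (a sketch
from memory, not copied from the patch; the enum members shown are
illustrative except for IRIS_DOMAIN_NONE, mentioned in the v2 note
below):

    #include <stdbool.h>

    enum iris_domain {
       IRIS_DOMAIN_RENDER_WRITE,
       IRIS_DOMAIN_OTHER_READ,
       IRIS_DOMAIN_NONE,
       /* ... */
    };

    struct iris_batch;
    struct iris_bo;

    /* The new 'access' argument forces every call site to name a
     * caching domain explicitly, so a missed use fails to compile. */
    void iris_use_pinned_bo(struct iris_batch *batch, struct iris_bo *bo,
                            bool writable, enum iris_domain access);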
There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:
- Batch buffers: These are strictly read-only for the moment.
- BLORP buffer objects: Their seqnos are bumped manually at the end
of iris_blorp_exec() instead, in order to avoid plumbing domain
information through BLORP address combining.
- Scratch buffers: The contents of these are strictly thread-local.
- Shader images and SSBOs: Accesses of these buffers are explicitly
synchronized at the API level.
v2: Opt out of tracking more aggressively (Ken): In addition to the
above, surface states, binding tables, instructions, and most
dynamic states are now left untracked, which means a *lot* more BO
uses are marked IRIS_DOMAIN_NONE and need to be reviewed extremely
carefully, since the cache tracker won't be able to provide any
coherency guarantees for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces some minimalistic infrastructure which will be used in
order to partition the batch into a series of sections, each one with
a unique, monotonically-increasing sequence number. Section
boundaries will typically lie at points in the batch where the
execution and memory coherency status of some previous commands are
known, e.g. at batch buffer boundaries or PIPE_CONTROL commands.
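A rough sketch of the shape of this infrastructure (field and function
names here are illustrative):

    #include <stdint.h>

    struct batch_sections {
       /* Seqno that will identify the next section boundary. */
       uint64_t next_seqno;
    };

    /* Close the current section, e.g. at a batch buffer boundary or a
     * PIPE_CONTROL, and return the seqno identifying it. */
    static uint64_t
    mark_section_boundary(struct batch_sections *s)
    {
       return s->next_seqno++;
    }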
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This makes iris_batch_prepare_noop() return a boolean instead of
passing through the relevant set of dirty flags. It will make it
easier to change the representation of dirty flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
Rename iris_seqno to iris_fine_fence (a name borrowed from
si_fine_fence) to avoid confusion with the other seqnos used for
tracking pipelines.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5233>
A buffer added to all execbufs so that we can attribute a batch that
caused a hang to a particular driver.
v2: Reuse workaround BO
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3203>
We can use seqnos as a basis for fast userspace fences, where we can
check a value directly to test for fence completion without having to
query the kernel. To do so, we need to write a breadcrumb from the
batch and track those writes as the basis for our lightweight fences.
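The completion test then boils down to a plain memory read, roughly
(illustrative names, not the exact iris helpers):

    #include <stdbool.h>
    #include <stdint.h>

    /* The batch writes its seqno into a coherent BO as a breadcrumb;
     * checking the fence is just comparing against that value. */
    static bool
    fine_fence_signaled(const volatile uint32_t *breadcrumb, uint32_t seqno)
    {
       return *breadcrumb >= seqno;   /* no kernel round-trip required */
    }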
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
We're going to need it to create an uploader in the batch soon. We still
avoid storing it, to maintain the charade of separation, and make people
think twice about fetching random fields from there and intertwining
things even worse.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
This is just a refcounted wrapper around a drm_syncobj. There is
enough terminology going on in the area of synchronization (sync
objects, sync files, ...) that I'd rather not invent our own.
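The wrapper amounts to something along these lines (the struct name
matches the last_syncpt mentioned elsewhere in the series; the field
layout is a sketch):

    #include <stdint.h>
    #include "pipe/p_state.h"   /* struct pipe_reference */

    /* A refcounted handle to a kernel DRM sync object. */
    struct iris_syncpt {
       struct pipe_reference ref;
       uint32_t handle;         /* drm_syncobj handle */
    };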
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
Iris allocates GEM buffers using buckets of allocation sizes that are
page-aligned. We always ask for batch buffers of size BATCH_SZ +
BATCH_RESERVED, which is not page-aligned: we ask for 65552 bytes,
which ends up in the bucket of size 81920, resulting in 20% unused
space. Adjust things so there is no waste of space: BATCH_SZ +
BATCH_RESERVED is now 65536.
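With illustrative values (assuming BATCH_SZ was previously a full
64 KiB, which makes BATCH_RESERVED 16 bytes), the change amounts to:

    /* Before: BATCH_SZ was a full 64 KiB, so BATCH_SZ + BATCH_RESERVED
     * (65552 bytes) spilled into the 81920-byte bucket, wasting ~20%.
     * After: shrink BATCH_SZ so the sum is exactly 65536 and matches a
     * bucket size. */
    #define BATCH_RESERVED 16
    #define BATCH_SZ       (64 * 1024 - BATCH_RESERVED)   /* 65520 */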
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4561>
For the moment, everything is I915_EXEC_RENDER, so this isn't necessary.
But even should that change, I don't think we want to handle multiple
engines in this manner.
Nowadays, we have batch->name (IRIS_BATCH_RENDER, IRIS_BATCH_COMPUTE,
possibly an IRIS_BATCH_BLIT for blorp batches someday), which describes
the functional usage of the batch. We can simply check that and select
an engine for that class of work (assuming there ever is more than one).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3613>
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.
iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone. So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.
intel/dev also needs an ioctl wrapper, so let's share the same
implementation everywhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It can be useful to call the decoder on a single batch. But, that batch
may not contain STATE_BASE_ADDRESS, at which point the decoder will have
no idea how to find any buffers. We can initialize the two static bases
at the beginning of time, so it has them even if it never sees SBA.
Surface base address changes dynamically, possibly in the middle of a
batch. So we update it at the start of each batch, making it always
start at the value we inherited from the previous one. SBA commands
inside the batch can update it to a proper value.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
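The logging is along these lines conceptually (the condition and the
output format here are placeholders, not the exact iris output):

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Placeholder sketch: report the reason string and the flush bits
     * for every PIPE_CONTROL we emit, when logging is enabled. */
    static void
    log_pipe_control(bool enabled, const char *reason, uint64_t bits)
    {
       if (enabled)
          fprintf(stderr, "PIPE_CONTROL (%s): bits 0x%" PRIx64 "\n",
                  reason, bits);
    }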
Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned
out that we were sometimes placing variable length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing. So
this does more than improve the decoder output.
Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer. Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size. We don't need to catch every single place, however, since
it's only important to record variable length packets (like viewports
and binding tables).
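Conceptually the tracking is just a map from state address to size
that the decoder can consult; roughly (illustrative names, using
Mesa's util hash table as one possible store):

    #include <stdint.h>
    #include "util/hash_table.h"

    /* Remember how large a variable-length state object at a given
     * address is, so the decoder knows how far it may safely walk. */
    static void
    record_state_size(struct hash_table_u64 *sizes,
                      uint64_t address, unsigned size)
    {
       _mesa_hash_table_u64_insert(sizes, address,
                                   (void *)(uintptr_t) size);
    }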
State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either. (We also don't have a callback when an upload buffer is
released.) So, this tracking may space leak over time. That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.
The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris. However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.
We can always improve this, but for now it's better than what we had.
This provides a way for the application to query whether any resets have
happened, which lets us expose "robust" contexts. This also enables the
KHR_robust_buffer_access_behavior tests.
This mechanism lets the driver inform the state tracker about GPU
resets, say for destroying a robust API context and reporting a "device
lost" error to the application, so it can take action to deal with this.
The iris batch module now tries to detect that the kernel has banned
our GEM context, creates a new non-banned context, and informs the
iris context module that all assumptions about state are now invalid
and it needs to reinitialize the relevant state.
Based on Chris Wilson's work, but significantly rewritten by me.
Propagate the failure from GEM_EXECBUFFER2, clean up, then report the
failure if need be. We retain the current behaviour of calling abort() at the first sign
of trouble -- for a non-robustness context, arguably this is the right
thing to do as the client cannot recover, and the system state is lost.
How to properly integrate with KHR_robustness and reset-strategy is
left as a future exercise.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince
gcc to not inline __gen_uint and results in a lot of packing code ending
up out-of-line with lots of stack copying. To ameliorate this, only
insert the check inside the packer if DEBUG is defined and instead
perform the validation checking before submitting the batch to the
kernel. This should give accurate results if --track-origins=yes is
used, and failing that we can recompile in full debug mode to check on
insertion.
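The shape of the change is roughly this (a sketch; the real code lives
in the generated pack functions and the batch submission path):

    #include <stddef.h>
    #ifdef HAVE_VALGRIND            /* assumed build-time switch */
    #include <valgrind/memcheck.h>
    #endif

    /* Keep the expensive definedness check out of the inlined packing
     * path unless DEBUG is defined; a single check of the whole batch
     * before execbuf covers the release-with-valgrind case. */
    static inline void
    check_defined(const void *data, size_t len)
    {
    #if defined(HAVE_VALGRIND) && defined(DEBUG)
       VALGRIND_CHECK_MEM_IS_DEFINED(data, len);
    #else
       (void) data;
       (void) len;
    #endif
    }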
Improve drawoverhead baseline by 25% with a default build with
valgrind-dev installed (with effectively no loss of vg coverage).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the missing PIPE_CAP_CONTEXT_PRIORITY_MASK and parsing of the context
construction flags.
Testcase: piglit/egl-context-priority
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads. It may also be necessary for
the compiler in a few cases.
I inherited this from i965. It would be nice to track the state size
so INTEL_DEBUG=color,bat decoding can print the right number of e.g.
binding table entries or blend states, but...without a single point
of entry for state, it's a little tricky to get right. Punt for now,
and drop the dead code in the meantime.
i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in
restoring the constants from the context. Iris actually re-pins the
constant buffers properly across the batch, and avoids re-emitting the
constant packets unless it's necessary. So, we don't want ISP_DIS.
This is an awkward corner case. We create batches in order, each of
which creates and pins a BO. The other batches may not be set up yet,
so it may not be safe to ask whether they reference a BO.
Just avoid this for now. We could avoid it for other context-local BOs
too, but we currently don't have a flag for that (and I'm not certain
whether it's worth it).
When flushing a batch due to a data dependency, we need to not only
kick off the other batch's work, but stall our execution until it
completes. Just wait on last_syncpt after flushing it.
(adjusted by Ken to create the signalling sync object immediately on
batch reset, rather than at batch finish time. This will work better
with deferred flushes...)