In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the past, each query object had their own BO. Checking if the batch
referenced that BO was an easy way to check if commands were still
queued to compute the query value. If so, we needed to flush.
More recently (c24a574e6c), we started using an u_upload_mgr for query
objects, placing multiple queries in the same BO. One side-effect is
that iris_batch_references is a no longer a reasonable way to check if
commands are still queued for our query. Ours might be done, but a
later query that happens to be in the same BO might be queued. We don't
want to flush in that case.
Instead, check if the current batch's signalling syncpt is the one we
referenced when ending the query. We know the syncpt can't have been
reused because our query is holding a reference, so a simple pointer
comparison should suffice.
Removes all batch flushing caused by query objects in Shadow of Mordor.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
We don't execute any of the commands to record snapshots, so we can't
actually produce a real result. We do however need to avoid waiting
on a syncpt which will never be signalled. So, just return 0.
iris_draw_vbo is divided into two functions to remove unnecessary
operations from the loop. This implementation of ARB_indirect_parameters
takes into account NV_conditional_render by saving MI_PREDICATE_RESULT
at the start of a draw call and restoring it at the end also the result
of NV_conditional_render is taken into account when computing predicates
that limit draw calls for ARB_indirect_parameters in a similar way
to 1952fd8d in ANV.
v2: Optimize indirect draws (suggested by Kenneth Graunke)
v3: (by Kenneth Graunke)
- Fix an issue where indirect draws wouldn't set patch information
before updating the compiled TCS.
- Move some code back to iris_draw_vbo to avoid duplicating it.
- Fix minor indentation issues.
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to subtract the starting offset from the final offset before
dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142.
Not known to fix anything.
MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE
command's calculations - it holds the result of the subtraction when
the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual
result of the predication is MI_PREDICATE_RESULT, which is what we
want to copy from the render context to the compute context.
This function can be used to stall on the CPU and resolve the predicate
for the conditional render. It will convert ice->state.predicate from
IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or
IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query.
v2:
- return void (Ken)
- update the stored condition (Ken)
- simplify the code leading to resolve the predicate (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads. It may also be necessary for
the compiler in a few cases.
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0. This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???". This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed. We want to sleep.
Add an effectively infinite timeout so that we sleep.
Instead of allocating 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.
Having one BO for multiple query objects, we don't want to wait on all
of them, instead when we write last snapshot, we create a sync point, and
check syncpoints while waiting on particular object.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
1. Set a render condition. We emit it immediately on the render
engine, and stash q->bo as ice->state.compute_predicate in case
the compute engine needs it.
2. Clear the render condition. We were incorrectly leaving a stale
compute_predicate kicking around...
3. Dispatch compute. We would then read the stale compute predicate,
and try to load it into MI_PREDICATE_DATA. But q->bo may have been
freed altogether, causing us to try and use garbage memory as a BO,
adding it to the validation list, failing asserts, and tripping
EINVALs in execbuf.
Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it. Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.
I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type. Use that instead, fixing
pipeline statistics queries since the rebase.
We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result(). We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.
Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.
otherwise, get results may check q->map->snapshots_landed...before our
commands to initialize it to false have actually executed...so it'd get
some random garbage from the BO...