CCS_E can fall back to CCS_D with incompatible format views
CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...
We can safely assume that the given resource is depth, depth/stencil,
or stencil already. The stencil-only case is easily detectable with
a single format check, and all other cases are handled identically.
This saves some CPU overhead.
This just moves the code for dealing with pipe_shader_state /
pipe_compute_state / iris_uncompiled_shader to the end of the file.
Now that those do precompiles, they want to call the actual compile
functions. Putting them at the end eliminates the need for a bunch
of prototypes.
Caio noted that this is not necessary on Gen8+:
"Before Gen8, there was a historical configuration control field to
swizzle address bit[6] for in X/Y tiling modes. This was set in
three different places: TILECTL[1:0], ARB_MODE[5:4], and
DISP_ARB_CTL[14:13]. For Gen8 and subsequent generations, the
swizzle fields are all reserved, and the CPU's memory controller
performs all address swizzling modifications."
Since we don't support earlier hardware, we can skip it entirely.
The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.
(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct. But it shouldn't hurt to do so.)
I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads. It may also be necessary for
the compiler in a few cases.
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0. This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???". This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed. We want to sleep.
Add an effectively infinite timeout so that we sleep.
Instead of allocating 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.
Having one BO for multiple query objects, we don't want to wait on all
of them, instead when we write last snapshot, we create a sync point, and
check syncpoints while waiting on particular object.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>