Commit graph

108449 commits

Author SHA1 Message Date
Timur Kristóf
cacf84ed5f iris: implement clearing render target and depth stencil
v2 (Kenneth Graunke): split color/depthstencil cases, fix iris_clear
2019-02-21 10:26:12 -08:00
Kenneth Graunke
8ab82bd1fd iris: Drop XXX about checking for swizzling
Caio noted that this is not necessary on Gen8+:

   "Before Gen8, there was a historical configuration control field to
    swizzle address bit[6] for in X/Y tiling modes.  This was set in
    three different places: TILECTL[1:0], ARB_MODE[5:4], and
    DISP_ARB_CTL[14:13].  For Gen8 and subsequent generations, the
    swizzle fields are all reserved, and the CPU's memory controller
    performs all address swizzling modifications."

Since we don't support earlier hardware, we can skip it entirely.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
bf23e79629 iris: Set HasWriteableRT correctly
A bit of irritating state cross dependency here, but nothing too hard
2019-02-21 10:26:12 -08:00
Kenneth Graunke
d612cd1bf8 iris: Set 3DSTATE_WM::ForceThreadDispatchEnable
The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.

(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct.  But it shouldn't hurt to do so.)
2019-02-21 10:26:12 -08:00
Kenneth Graunke
27d751cdd8 iris: Drop XXX about alpha testing
I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.
2019-02-21 10:26:12 -08:00
Andre Heider
bffb65d28e iris: improve PIPE_CAP_VIDEO_MEMORY bogus value
-1 is a little too bogus for most games ;)

Signed-off-by: Andre Heider <a.heider@gmail.com>
2019-02-21 10:26:12 -08:00
Andre Heider
f89a578818 iris: fix build with gallium nine
Signed-off-by: Andre Heider <a.heider@gmail.com>
2019-02-21 10:26:12 -08:00
Kenneth Graunke
be49fb051d iris: Stop chopping off the first nine characters of the renderer string 2019-02-21 10:26:12 -08:00
Kenneth Graunke
15341778ba iris: rework num textures to util_lastbit 2019-02-21 10:26:12 -08:00
Kenneth Graunke
974229df46 iris: Add PIPE_CAP_MAX_VARYINGS 2019-02-21 10:26:11 -08:00
Kenneth Graunke
1cd001aa63 iris: Make a iris_batch_reference_signal_syncpt helper function.
Suggested by Chris Wilson.  More obvious what's going on.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
9376799bd6 iris: Use READ_ONCE and WRITE_ONCE for snapshots_landed
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads.  It may also be necessary for
the compiler in a few cases.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
18e31a9b31 iris: Fix accidental busy-looping in query waits
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0.  This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???".  This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed.  We want to sleep.

Add an effectively infinite timeout so that we sleep.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3b1ac8244e iris: Add a timeout_nsec parameter, rename check_syncpt to wait_syncpt
I want to be able to wait with a non-zero timeout from elsewhere.
2019-02-21 10:26:11 -08:00
Sagar Ghuge
c24a574e6c iris: Don't allocate a BO per query object
Instead of allocating 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.

Having one BO for multiple query objects, we don't want to wait on all
of them, instead when we write last snapshot, we create a sync point, and
check syncpoints while waiting on particular object.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
a1ebac3750 iris: Implement ALT mode for ARB_{vertex,fragment}_shader
Fixes gl-1.0-spot-light
2019-02-21 10:26:11 -08:00
Kenneth Graunke
732c3a90a4 iris: Fix bug in bound vertex buffer tracking
res might be NULL, at which point this is an unbind.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
4bfd12bbf7 iris: minor tidying 2019-02-21 10:26:11 -08:00
Kenneth Graunke
b1bacbf038 iris: Unreference some more things on state module teardown 2019-02-21 10:26:11 -08:00
Kenneth Graunke
e092ed9213 iris: Drop dead state_size hash table
I inherited this from i965.  It would be nice to track the state size
so INTEL_DEBUG=color,bat decoding can print the right number of e.g.
binding table entries or blend states, but...without a single point
of entry for state, it's a little tricky to get right.  Punt for now,
and drop the dead code in the meantime.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
6e41f1b459 iris: Drop comment about ISP_DIS
i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in
restoring the constants from the context.  Iris actually re-pins the
constant buffers properly across the batch, and avoids re-emitting the
constant packets unless it's necessary.  So, we don't want ISP_DIS.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
edd3ce5a63 iris: Enable PIPE_CAP_COMPACT_ARRAYS 2019-02-21 10:26:11 -08:00
Kenneth Graunke
1db394f46b iris: Remap stream output indexes back to VARYING_SLOT_*.
Previously I had a hack in st/mesa to make it stop remapping
VARYING_SLOT_* into the naively compacted slots, which aren't
what we want.  But that wasn't very feasible, as we'd have to
update all drivers, or add capability bits, and it gets messy fast.

It turns out that I can map back to VARYING_SLOT_* in about 5 LOC,
so let's just do that.  It removes the need for hacks, and is easy.

This also fixes KHR-GL46.enhanced_layouts.xfb_capture_struct, which
apparently with my hack was still getting the wrong slot info.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
5d3d757178 iris: Zero the compute predicate when changing the render condition
1. Set a render condition.  We emit it immediately on the render
   engine, and stash q->bo as ice->state.compute_predicate in case
   the compute engine needs it.

2. Clear the render condition.  We were incorrectly leaving a stale
   compute_predicate kicking around...

3. Dispatch compute.  We would then read the stale compute predicate,
   and try to load it into MI_PREDICATE_DATA.  But q->bo may have been
   freed altogether, causing us to try and use garbage memory as a BO,
   adding it to the validation list, failing asserts, and tripping
   EINVALs in execbuf.

Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it.  Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.
2019-02-21 10:26:11 -08:00
Caio Marcelo de Oliveira Filho
4fd1f70e62 iris: always include an extra constbuf0 if using UBOs
In st_nir_lower_uniforms_to_ubo() all UBO access in the shader have
its index incremented to open room for uniforms in constbuf0.  So if
we use UBOs, we always need to include the extra binding entry in the
table.

To avoid doing this checks both when compiling the shader and when
assigning binding tables, store the num_cbufs in iris_compiled_shader.

Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values.  Note that some tests fitting this criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
4801af2f26 iris: Do binder address allocations per-context, not globally.
iris_bufmgr allocates addresses across the entire screen, since buffers
may be shared between multiple contexts.  There used to be a single
special address, IRIS_BINDER_ADDRESS, that was per-context - and all
contexts used the same address.  When I moved to the multi-binder
system, I made a separate memory zone for them.  I wanted there to be
2-3 binders per context, so we could cycle them to avoid the stalls
inherent in pinning two buffers to the same address in back-to-back
batches.  But I figured I'd allow 100 binders just to be wildly
excessive/cautious.

What I didn't realize was that we need 2-3 binders per *context*,
and what I did was allocate 100 binders per *screen*.  Web browsers,
for example, might have 1-2 contexts per tab, leading to hundreds of
contexts, and thus binders.

To fix this, we stop allocating VMA for binders in bufmgr, and let
the binder handle it itself.  Binders are per-context, and they can
assign context-local addresses for the buffers by simply doing a
ringbuffer style approach.  We only hold on to one binder BO at a
time, so we won't ever have a conflicting address.

This fixes dEQP-EGL.functional.multicontext.non_shared_clear.

Huge thanks to Tapani Pälli for debugging this whole mess and
figuring out what was going wrong.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
0f33204f05 iris: Fix memzone_for_address for the surface and binder zones
We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS
lives at the very start of that zone.  However, IRIS_MEMZONE_SURFACE and
IRIS_MEMZONE_BINDER are normal zones.  They used to be a single zone
(surface) with a single binder BO at the beginning, similar to the
border color pool.  But when I moved us to multiple binders, I made them
have a real zone (if a small one).  So both zones should use >=.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3bcb1a7fcd iris: Don't whack SO dirty bits when finishing a BLORP op
Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero
offsets.  Plus, it's just not necessary - BLORP doesn't change these.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
b9697dd820 iris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits
INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting
3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write
offsets.  We would need to alter them to use 0xFFFFFFFF for the offset.

Also, have each upload function only flag bits relevant to its own
pipeline.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
61798e3c88 iris: CS stall on VF cache invalidate workarounds
See commit 31e4c9ce40 in i965.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
c81941f1e7 iris: Pay attention to blit masks
For combined depth/stencil formats, we may want to only blit one half.
If PIPE_BLIT_Z is set, blit depth; if PIPE_BLIT_S is set, blit stencil.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7837fec740 iris: Assert about blits with color masking
st/mesa never asks for this today, but in theory someone might, and we
don't support it.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
0f677b0d87 iris: Don't enable smooth points when point sprites are enabled
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_*.primitives.points
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3b336a1513 iris: Allow sample mask of 0
I think this was an attempt to work around various sample mask bugs I
had early on.  It's not correct.  A sample mask of 0 is legal and means
to disable all samples.

Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*
2019-02-21 10:26:11 -08:00
Kenneth Graunke
e17333ea1e iris: fail to create screen for older unsupported HW
loader shouldn't try, but let's be paranoid
2019-02-21 10:26:11 -08:00
Kenneth Graunke
1f91f688e8 iris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability
I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type.  Use that instead, fixing
pipeline statistics queries since the rebase.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
a23c06cabc iris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers. 2019-02-21 10:26:11 -08:00
Kenneth Graunke
5aef30b886 iris: Fix Broadwell WaDividePSInvocationCountBy4
We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result().  We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.

Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7f318bf2ac iris: Delete genx->bound_vertex_buffers
This is actually stored in ice->state, as it isn't gen-specific
2019-02-21 10:26:11 -08:00
Kenneth Graunke
02991e2878 iris: Drop a dead comment 2019-02-21 10:26:11 -08:00
Kenneth Graunke
572fad1e84 iris: Don't check other batches for our batch BO
This is an awkward corner case.  We create batches in order, each of
which creates and pins a BO.  The other batches may not be set up yet,
so it may not be safe to ask whether they reference a BO.

Just avoid this for now.  We could avoid it for other context-local BOs
too, but we currently don't have a flag for that (and I'm not certain
whether it's worth it).
2019-02-21 10:26:11 -08:00
Kenneth Graunke
8eda6f2288 iris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat
Various places in the transfer code need to know whether they must
read the existing resource's values.  Rather than checking both flags
everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag
PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can
discard a subrange, too.

Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE,
but eventually u_threaded_context should handle swapping out buffers
for new idle buffers, anyway.  In the meantime, this is at least better.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
bacc722d13 iris: Flush the render cache in flush_and_dirty_for_history
BLORP uses the render engine to write to buffers, and we need to flush
that data out to the actual surface (finishing the write).  Then, the
rest of this function invalidates any caches that might have stale data
which needs to be refetched.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7a9e87c224 iris: Implement multi-slice copy_region
I don't know if this is required - surprisingly, I haven't seen it
matter - but I'd like to use it for multi-slice transfer maps.  We may
as well do the right thing.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
307f3f9924 iris: Leave a comment about why Broadwell images are broken
There are a variety of ways to fix this, many of which are simple, but
I could use some advice on which ones other people prefer, and so we'll
punt until after the holidays.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7ed1383c0a iris: Fix surface states for Gen8 lowered-to-untype images
We have to use SURFTYPE_BUFFER and ISL_FORMAT_RAW for these.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
477e7d575b iris: Fill out brw_image_params for storage images on Broadwell 2019-02-21 10:26:11 -08:00
Kenneth Graunke
7e35333c73 iris: Don't make duplicate system values
We were relying on CSE/GVN/etc to coalesce all intrinsics that load the
same value, but that's a bad idea.  We might have a couple intrinsics
that reload the same value.  If so, we only want to set up the uniform
on the first one we see.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
bc3bb28645 iris: Don't enable push constants just because there are system values
System values are built-in uniforms.  We set them up as UBO values, and
might pull or push them.  UBO push analysis will take care of that.  We
only want to enable push constants if there's an actual range being
pushed.  Otherwise, we might get into a scenario where 3DSTATE_PS
enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.

This fixes GPU hangs in Broadwell image load store tests which have
unused image param system values but no other uniforms.  (We shouldn't
be making those anyway, but that's a separate fix...)
2019-02-21 10:26:11 -08:00
Kenneth Graunke
2ca0d913ea iris: Fix framebuffer layer count
cso_fb->layers is only valid for no-attachment framebuffers.  Use the
helper function to get the real value, then stash it so we don't have
to call the helper function on the old value for comparison, or at draw
time for Force Zero RTA Index setting.

This fixes Force Zero RTA Index being set even when attempting layered
rendering.
2019-02-21 10:26:11 -08:00