This is genxml, we can compile out this code.
Fixes: 2660667284 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Create separate SURFACE_STATE for render target read in order to support
non coherent framebuffer fetch on broadwell.
Also we need to resolve framebuffer in order to support CCS_D.
v2: Add outputs_read check (Kenneth Graunke)
v3: 1) Import Curro's comment from get_isl_surf
2) Rename get_isl_surf method
3) Clean up allocation in case of failure
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The DRI interface for modifiers with aux data treats the aux data as a
separate plane of the main surface.
When the dri layer requests the plane associated with the aux data, we
save the required information into the dri aux plane image.
Later when the image is used, the dri plane image will be available in
the pipe_resource structure's `next` field. Therefore in iris, we
reconstruct the aux setup from this separate dri plane image when the
image is used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is not currently required because the hiz buffer is in a separate
buffer, and therefore the offset is 0. If we combine the aux buffer
with the main surface buffer, then the hiz offset may become non-zero.
Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.
v2: Don't need to set the mask - it's mbo (Ken).
v3: Don't keep a reference to the resource used for emitting the table
(Ken).
With this commit, Iris will report that AMD_performance_monitor is
supported, and will allow the caller to query the available metrics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION
so let's actually implement it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where. Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data ends up in the wrong slots. In particular, running
SIMD32-only is broken but several other combinations are as well.
Fixes: 5445c176e2 "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if
back-to-back draws referred to the same index buffer. This improves
drawoverhead scores in the DrawElements cases by about 10%, by giving
us even more minimal batches.
Often times, the depth buffer is entirely disabled, but color render
targets change. For example, GenerateMipmaps will change the color
render target for each miplevel, but there is no depth buffer.
In the Civilization VI benchmark, this drops the median number of
3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits. This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.
Thanks to Clayton Craft for catching the regressions!
Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")
In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This will let us put the genxml boilerplate in one place, before we
expand genxml to more files shortly. Like i965/genX_boilerplate.h.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.
We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.
This reverts commit 9c421d6b47.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were failing to pipe_resource_unreference on the failure path due
to a non-renderable format. Instead of fixing this, just move the
checks earlier, before we even bother with refcounting or calloc.
We were failing to unreference the old image resource. Instead of open
coding this and doing it badly, just use the copier function which does
the right thing.
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.
v2:
- Update iris (Ken).
- Update anv.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Today, we stream the compute shader thread IDs simply because they're
(annoyingly) relative to dynamic state base address. We could upload
them once at compile time, but we'd need a separate non-streaming
uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it.
stream_state pins the buffer for use in the current batch, but also
returns a reference to the pipe_resource. We dropped this reference
on the floor, leaking a reference basically every time we dispatched
a compute shader after switching to a new one.
The reason it returns a reference is so that we can hold on to it and
re-pin it in iris_restore_compute_saved_bos, which we were also failing
to do. So if we actually filled up a batch with repeated dispatches to
the same compute shader, and flushed, then continued dispatching, we
would fail to pin it and likely GPU hang.
We were unconditionally uploading the new data, but then conditionally
using it with MEDIA_CURBE_LOAD. If we're not going to emit the command,
there's no point in uploading the data.
We only use push the compute shader thread IDs, not any actual constant
buffer data. So we should track the compute shader variant changing,
not constbuf changes.
MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address
field's docs say: "This bit specifies the 64-byte aligned address..."
And we were doing 32. Superfluous thread ID uploading was apparently
saving us from GPU hangs in most cases.
This commit moves the sysvals to a separate, new constant buffer
at the end (before the shader constants). It also allows us to
remove the special handling we had for cbuf0, and enables all
constant buffers to support user-specified resources and user
buffers.
v2: (by Kenneth Graunke)
- Rebase on the previous patch to fix system value uploading.
- Fix disk cache num_cbufs calculation
- Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs
- Change upload_sysvals to assert that num_cbufs > 0 when
num_system_values > 0.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I neglected to mark cbuf0_needs_upload = false after uploading it.
The obvious fix regressed user clip plane tests, because of a second
bug: we also forgot to mark that they may need re-uploading when
changing shader programs (which may have more or less system values).
Thanks to Timur Kristóf for catching the original issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Most vertex data lives in user VBOs in IRIS_MEMZONE_OTHER, which
typically have high bits set to 0xffff. The shader draw parameters were
being uploaded in IRIS_MEMZONE_DYNAMIC, which have high bets set to 0x2.
This was causing a lot of ping-ponging of high bits, leading to
unnecessary VF cache flushing.
Cuts 7.2% of the flushes in the Civizilation VI demo on Kabylake GT2.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
Don't have a separate mechanism for NIR constants to be removed from
the table. If unused, we will compact it away. The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those. Reducing unused bytes on those are valuable because we use a
reduced space for those tables in Iris.
The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions. The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.
The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.
v2: (all from Ken)
Use BITFIELD64_MASK helper. Improve comments.
Assert all group is marked as used when we have indirects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perfom the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using to index
shs->constbuf.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>