This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Create separate SURFACE_STATE for render target read in order to support
non coherent framebuffer fetch on broadwell.
Also we need to resolve framebuffer in order to support CCS_D.
v2: Add outputs_read check (Kenneth Graunke)
v3: 1) Import Curro's comment from get_isl_surf
2) Rename get_isl_surf method
3) Clean up allocation in case of failure
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used in next patches for supporting non coherent
framebuffer fetch on Broadwell.
v2: Fix comment (Kenneth Graunke)
v3: 1) Fix a few nits (Caio)
2) Add comment (Caio)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The DRI interface for modifiers with aux data treats the aux data as a
separate plane of the main surface.
When the dri layer requests the plane associated with the aux data, we
save the required information into the dri aux plane image.
Later when the image is used, the dri plane image will be available in
the pipe_resource structure's `next` field. Therefore in iris, we
reconstruct the aux setup from this separate dri plane image when the
image is used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reworks:
* If the aux-state is not ISL_AUX_STATE_AUX_INVALID, then use memset
even when memset_value is zero. The hiz buffer initial aux-state
will be set to invalid, and therefore we can skip the memset. But,
for CCS it will be set to ISL_AUX_STATE_PASS_THROUGH, and therefore
the aux data must be cleared to 0 with the memset. Previously we
would use BO_ALLOC_ZEROED with the CCS aux data, so this memset
wasn't required. Now, the CCS aux data may be part of the main
surface. We prefer to not use BO_ALLOC_ZEROED excessively, so the
memset is needed for the CCS case. (Nanley)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is not currently required because the hiz buffer is in a separate
buffer, and therefore the offset is 0. If we combine the aux buffer
with the main surface buffer, then the hiz offset may become non-zero.
Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.
v2: Don't need to set the mask - it's mbo (Ken).
v3: Don't keep a reference to the resource used for emitting the table
(Ken).
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct. This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is the first call that provides the iris context to the monitor
implementation. On the first call, use the iris context to initialize
the monitor context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this commit, Iris will report that AMD_performance_monitor is
supported, and will allow the caller to query the available metrics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v5: add patch
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The perf subsystem needs several macro definitions that were
duplicated in Iris and i965 headers. Place these macros within perf,
if the perf implementation contains the only references to the values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The DBG marco in brw_blorp.c ends up calling an android log function:
error: undefined reference to '__android_log_print'
v2: On suggestion from Lionel, hang the Android dependency onto a new
libintel_common dependency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION
so let's actually implement it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A while ago, we started deferring GEM object closure and VMA release
until buffers were idle. This had some unforeseen interactions with
external buffers.
We keep imported buffers in hash tables, so if we have repeated imports
of the same GEM object, we map those to the same iris_bo structure.
This is critical for several reasons. Unfortunately, we broke this
assumption. When freeing a non-idle external buffer, we would drop it
from the hash tables, then move it to the zombie list. If someone
reimported the same GEM object, we would not find it in the hash tables,
and go ahead and make a second iris_bo for that GEM object. But the old
iris_bo would still be in the zombie list, and so we would eventually
call GEM_CLOSE on it - closing a BO that should have still been live.
To work around this, we defer removing a BO from the hash tables until
it's actually fully closed. This has the strange effect that an
external BO may be on the zombie list, and yet be resurrected before
it can be properly cleaned up. In this case, we remove it from the
list so it won't be freed.
Fixes severe instability in Weston, which was hitting EINVALs and
ENOENTs from execbuf2, due to batches referring to a GEM object that
had been closed, or at least had its VMA torched.
Fixes: 457a55716e ("iris: Defer closing and freeing VMA until buffers are idle.")
Everybody importing an external buffer was looking it up in the hash
table, then referencing it. We can just do that in the helper instead,
which also gives us a convenient spot to stash extra code shortly.
The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where. Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data ends up in the wrong slots. In particular, running
SIMD32-only is broken but several other combinations are as well.
Fixes: 5445c176e2 "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This exposes the textureSamplesIdenticalEXT function in GLSL.
We enable it for iris and radeonsi, because their compilers already
have support for this. Tested on Intel Kabylake and AMD Vega 64.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.
intel/dev also needs an ioctl wrapper, so lets share the same
implementation everywhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if
back-to-back draws referred to the same index buffer. This improves
drawoverhead scores in the DrawElements cases by about 10%, by giving
us even more minimal batches.
Often times, the depth buffer is entirely disabled, but color render
targets change. For example, GenerateMipmaps will change the color
render target for each miplevel, but there is no depth buffer.
In the Civilization VI benchmark, this drops the median number of
3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits. This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.
Thanks to Clayton Craft for catching the regressions!
Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")