The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where. Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data end up in the wrong slots. In particular, running
SIMD32-only is broken, but so are several other combinations.
Fixes: 5445c176e2 "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This exposes the textureSamplesIdenticalEXT function in GLSL.
We enable it for iris and radeonsi, because their compilers already
have support for this. Tested on Intel Kabylake and AMD Vega 64.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.
intel/dev also needs an ioctl wrapper, so let's share the same
implementation everywhere.
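A minimal sketch of what such a shared wrapper can look like, assuming it
only needs to reissue the call when the kernel reports EINTR or EAGAIN
(which is what libdrm's drmIoctl does). The name retry_ioctl is
hypothetical and is not the name used in the tree.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical shared wrapper: reissue the ioctl while the kernel reports
 * EINTR or EAGAIN, and return the final result otherwise. */
static inline int
retry_ioctl(int fd, unsigned long request, void *arg)
{
   int ret;

   do {
      ret = ioctl(fd, request, arg);
   } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

   return ret;
}
```

Keeping it a static inline in a shared header would let each driver use it
without picking up a link-time dependency.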
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if
back-to-back draws referred to the same index buffer. This improves
drawoverhead scores in the DrawElements cases by about 10%, by giving
us even more minimal batches.
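The idea can be sketched as a small cache of the last index buffer that was
programmed. This is an illustration only: the struct, its fields, and the
function name are hypothetical, not the driver's actual state tracking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached description of the last index buffer we emitted. */
struct index_buffer_state {
   bool     valid;
   uint64_t address;
   uint32_t size;
   uint32_t index_size;
};

/* Returns true when a new packet must be emitted, and updates the cache.
 * Returns false when the draw reuses the buffer already programmed. */
static bool
index_buffer_needs_emit(struct index_buffer_state *last,
                        uint64_t address, uint32_t size, uint32_t index_size)
{
   if (last->valid && last->address == address &&
       last->size == size && last->index_size == index_size)
      return false;

   last->valid = true;
   last->address = address;
   last->size = size;
   last->index_size = index_size;
   return true;
}
```

A real driver would presumably also need to clear the valid flag whenever
previously emitted state can no longer be relied on, such as at the start
of a new batch.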
Often, the depth buffer is entirely disabled, but color render
targets change. For example, GenerateMipmaps will change the color
render target for each miplevel, but there is no depth buffer.
In the Civilization VI benchmark, this drops the median number of
3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32 bits. This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.
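The difference between the two copies can be illustrated on the CPU. This
is only an analogy with hypothetical function names; the actual fix is in
how the command streamer is programmed, not in C code like this.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit offset from memory and zero-extend it to 64 bits.
 * Copying 8 bytes here instead would drag in whatever follows the offset. */
static uint64_t
load_offset32(const void *src)
{
   uint32_t low;

   memcpy(&low, src, sizeof(low));
   return (uint64_t)low;   /* top 32 bits are zero */
}

/* The buggy variant, shown only for contrast: copies a full 64-bit value. */
static uint64_t
load_offset64(const void *src)
{
   uint64_t value;

   memcpy(&value, src, sizeof(value));
   return value;
}
```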
Thanks to Clayton Craft for catching the regressions!
Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")
It's kind of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input. It also makes zero sense because we have to
special-case it in the back-end.
Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This will let us put the genxml boilerplate in one place, before we
expand genxml to more files shortly. This mirrors i965/genX_boilerplate.h.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This lets us specify the prototypes once, instead of cut and pasting
them per generation. isl uses a similar approach (isl_genX_priv.h).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API. However, all GL requires is that
it's a uniform. Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.
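The GL policy described above can be sketched as a simple per-stage
selection. The enum and function are hypothetical; the maximum fragment
size and the compute dispatch width are passed in rather than assumed.

```c
/* Hypothetical stage enum for the illustration. */
enum shader_stage {
   STAGE_VERTEX,
   STAGE_TESS_CTRL,
   STAGE_TESS_EVAL,
   STAGE_GEOMETRY,
   STAGE_FRAGMENT,
   STAGE_COMPUTE,
};

/* max_size: the largest size a fragment shader may be dispatched with.
 * dispatch_width: the width actually chosen for a compute shader. */
static unsigned
gl_subgroup_size(enum shader_stage stage, unsigned max_size,
                 unsigned dispatch_width)
{
   switch (stage) {
   case STAGE_FRAGMENT:
      return max_size;
   case STAGE_COMPUTE:
      return dispatch_width;
   default:
      return 8;   /* geometry stages */
   }
}
```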
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
st_extensions.c sets const->MaxImageSamples (GL_MAX_IMAGE_SAMPLES) by
looping over [16, 15, .. 1x] MSAA modes, and RGBA/BGRA/ARGB/ABGR 8888
color formats, calling pipe->is_format_supported() for each, with
the usage set to PIPE_BIND_SHADER_IMAGE. If any are supported, it
selects that number of samples.
We were checking if sample_count <= 1, which meant that we were getting
a value of 1x MSAA, rather than the expected 0x (feature doesn't exist).
This only happened on Icelake, because Gen11 adds support for typed read
messages for R8G8B8A8_UNORM. The lack of typed read messages for these
formats tricked the check on Gen9 into correctly saying no. This caused some
Icelake conformance failures, because we don't implement this feature.
Just check for sample_count == 0 instead.
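The effect of the two checks on the query loop can be sketched as follows.
This is a simplified model with hypothetical function names: it ignores
the format dimension entirely and only looks at the sample count.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's shader-image check on a driver
 * that does not implement multisampled images. */
static bool
image_supported_old(unsigned sample_count)
{
   return sample_count <= 1;   /* accepts 1x, so the query reports 1 */
}

static bool
image_supported_new(unsigned sample_count)
{
   return sample_count == 0;   /* rejects every value the loop tries */
}

/* Mirrors the loop described above: try 16x down to 1x and report the
 * first supported count, or 0 if none is supported. */
static unsigned
max_image_samples(bool (*is_supported)(unsigned sample_count))
{
   for (unsigned samples = 16; samples >= 1; samples--) {
      if (is_supported(samples))
         return samples;
   }
   return 0;
}
```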
Until now we only supported fast clear colors on the first miplevel and
layer. The main reason for it is that we can't have different fast clear
values at different levels/layers, since the surface state only supports
one clear value.
We can, however, enable it if we make sure we only use the same value
for all levels/layers, and if one of them changes, we resolve all the
others. We already do that for depth fast clears so hopefully it will be
fine for color fast clears too.
v2: Add check for partial clear too (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It can be useful to call the decoder on a single batch. But, that batch
may not contain STATE_BASE_ADDRESS, at which point the decoder will have
no idea how to find any buffers. We can initialize the two static bases
at the beginning of time, so it has them even if it never sees SBA.
Surface base address changes dynamically, possibly in the middle of a
batch. So we update it at the start of each batch, making it always
start at the value we inherited from the previous one. SBA commands
inside the batch can update it to a proper value.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were failing to flag the program dirty when it changed. Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits. Only set it if needed.
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
PIPE_CAP_SM3 has always been an odd one out of all our caps. While most
other caps are fine-grained and single-purpose, this cap encodes several
features in one. And since OpenGL cares more about single features, it'd
be nice to get rid of this one.
As it turns out, this is now relatively simple. We only really care about
three features using this cap, and those already got their own caps. So
we can remove it, and make sure all current drivers just give the same
response to all of them.
The only place we *really* care about SM3 is in nine, and there we can
instead just re-construct the information based on the finer-grained
caps. This keeps DX9 semantics from needlessly leaking into all of the
drivers, most of which don't care a whole lot about DX9 specifically.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Set bit 15 (Disable Repacking for Compression) of the CACHE_MODE_0 register
if the gen attribute 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This can fail unexpectedly due to bugs, so it's good to provide feedback
when this occurs.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU. To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle. We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.
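A minimal sketch of the deferred-free idea, assuming a singly linked dead
list and a per-BO busy flag that stands in for asking the kernel whether
the buffer is still active. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer object: 'busy' stands in for a kernel busy query. */
struct dead_bo {
   struct dead_bo *next;
   bool busy;
};

/* Put a BO whose refcount hit zero on the dead list instead of freeing it. */
static void
dead_list_add(struct dead_bo **list, struct dead_bo *bo)
{
   bo->next = *list;
   *list = bo;
}

/* Free every idle BO on the list and keep the busy ones for a later sweep.
 * Returns the number of BOs released. */
static unsigned
dead_list_sweep(struct dead_bo **list)
{
   unsigned released = 0;
   struct dead_bo **link = list;

   while (*link) {
      struct dead_bo *bo = *link;
      if (bo->busy) {
         link = &bo->next;
      } else {
         *link = bo->next;
         free(bo);   /* real code would close the BO and return its VMA */
         released++;
      }
   }
   return released;
}
```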
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
In the future, some images will need to be aligned to a larger value
than 4096. Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.
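For illustration, honoring an alignment larger than 4096 comes down to
rounding an address up to a multiple of the requested value. The helper
below is a hypothetical sketch of that rounding, assuming power-of-two
alignments; it is not the allocator's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'value' up to the next multiple of 'alignment'.
 * 'alignment' must be a non-zero power of two. */
static uint64_t
align_u64(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}
```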
v2: Fix missing alignment in vma_alloc, caught by Caio!
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall). We also try to select the optimal batch.
Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO. Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
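The size check can be sketched like this, assuming one MI_COPY_MEM_MEM
command per DWord copied. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: true for copies of 4, 8, 12, or 16 bytes, the
 * sizes the commit chose to handle with per-DWord MI copies. */
static bool
is_small_dword_copy(unsigned size_bytes)
{
   return size_bytes != 0 && size_bytes % 4 == 0 && size_bytes <= 16;
}

/* Number of 4-byte copy commands needed for an eligible copy. */
static unsigned
small_copy_dwords(unsigned size_bytes)
{
   assert(is_small_dword_copy(size_bytes));
   return size_bytes / 4;
}
```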
SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.
We don't need to add this workaround in the Linux kernel, as the WA
description says it's fixed on the latest stepping.
This reverts commit 9c421d6b47.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were failing to pipe_resource_unreference on the failure path due
to a non-renderable format. Instead of fixing this, just move the
checks earlier, before we even bother with refcounting or calloc.
We were failing to unreference the old image resource. Instead of
open-coding this and doing it badly, just use the copier function which does
the right thing.
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.
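A rough sketch of such a conversion, assuming that going from pixels (or
samples) to surface elements means dividing each dimension by the format's
block size and rounding up. The struct and function names are hypothetical
and much simpler than the real isl types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced versions of an extent and a format block size. */
struct extent3d {
   uint32_t w, h, d;
};

/* Divide and round up, so a partial block still counts as one element. */
static uint32_t
div_round_up(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

/* Convert a level-0 extent in pixels (or samples) to surface elements,
 * given the width, height, and depth of one compression block. */
static struct extent3d
extent_px_to_el(struct extent3d px, struct extent3d block)
{
   struct extent3d el = {
      div_round_up(px.w, block.w),
      div_round_up(px.h, block.h),
      div_round_up(px.d, block.d),
   };
   return el;
}
```

For an uncompressed format the block is 1x1x1 and the conversion leaves the
extent unchanged.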
v2:
- Update iris (Ken).
- Update anv.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>