1089 commits

Author SHA1 Message Date
Jason Ekstrand
aebca3961b iris: Fix handling of SIMD32 fragment shaders
The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where.  Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA.  Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data ends up in the wrong slots.  In particular, running
SIMD32-only is broken but several other combinations are as well.

Fixes: 5445c176e2 "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-03 22:24:40 +00:00
Timothy Arceri
06ec14d692 iris: bump compat profile support to 4.6
All of the current piglit compat profile tests pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Kenneth Graunke
18c2e09dc7 gallium: Implement GL_EXT_shader_samples_identical via a new capability
This exposes the textureSamplesIdenticalEXT function in GLSL.

We enable it for iris and radeonsi, because their compilers already
have support for this.  Tested on Intel Kabylake and AMD Vega 64.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 23:38:54 -07:00
Mark Janes
49465f1330 iris/screen: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:48 -07:00
Mark Janes
7852fe5415 intel/common: provide common ioctl routine
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.

intel/dev also needs an ioctl wrapper, so let's share the same
implementation everywhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:38:40 -07:00
Timothy Arceri
2afedfaf9a iris: add support for gl_ClipVertex in tess eval shaders
Required for OpenGL compat support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:37 -07:00
Timothy Arceri
00b5bf2d72 iris: add support for gl_ClipVertex in geometry shaders
This will enable us to support the OpenGL compat profile.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:27 -07:00
Kenneth Graunke
b61f17d362 iris: Skip emitting 3DSTATE_INDEX_BUFFER if possible
We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if
back-to-back draws referred to the same index buffer.  This improves
drawoverhead scores in the DrawElements cases by about 10%, by giving
us even more minimal batches.
2019-07-31 15:14:10 -07:00
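The redundant-emit elision described in b61f17d362 can be sketched roughly as follows. This is an illustration only, not the driver's code: `count_index_buffer_emits` is a hypothetical helper, and it assumes the buffer address alone identifies the index buffer (a real driver would likely also compare size and other state).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from iris: counts how many times an
 * index buffer packet would be emitted for a sequence of indexed draws
 * when a draw that reuses the previous draw's buffer skips the packet. */
static size_t
count_index_buffer_emits(const uint64_t *draw_ib_addrs, size_t num_draws)
{
   size_t emits = 0;
   uint64_t last_addr = 0;
   int have_last = 0;   /* nothing has been emitted yet */

   for (size_t i = 0; i < num_draws; i++) {
      if (have_last && draw_ib_addrs[i] == last_addr)
         continue;      /* same buffer as the previous draw: skip */

      last_addr = draw_ib_addrs[i];
      have_last = 1;
      emits++;
   }
   return emits;
}
```

Back-to-back draws from the same buffer cost one packet under this scheme; the packet is only re-emitted when the buffer changes.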
Kenneth Graunke
3a22a8bf49 iris: Skip repeated depth buffer disables.
Oftentimes, the depth buffer is entirely disabled, but color render
targets change.  For example, GenerateMipmaps will change the color
render target for each miplevel, but there is no depth buffer.

In the Civilization VI benchmark, this drops the median number of
3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.
2019-07-30 19:47:41 -07:00
Sagar Ghuge
587a497529 iris: Enable EXT_texture_shadow_lod
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Kenneth Graunke
44e713eddb iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits.  This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.

Thanks to Clayton Craft for catching the regressions!

Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")
2019-07-29 16:38:19 -07:00
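A CPU-side analogy for the bug fixed in 44e713eddb, assuming the 32-bit offset sits next to unrelated data in memory. The real fix operates on GPU command-streamer registers, so both helper names below are hypothetical and only illustrate the difference between the two copies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy only the 32-bit offset and zero the top 32 bits. */
static uint64_t
load_offset_32(const void *src)
{
   uint32_t lo;
   memcpy(&lo, src, sizeof(lo));
   return (uint64_t)lo;          /* zero-extended */
}

/* Hypothetical: the accidental behaviour, copying a full 64-bit value
 * and so picking up whatever follows the offset in memory. */
static uint64_t
load_offset_64(const void *src)
{
   uint64_t v;
   memcpy(&v, src, sizeof(v));
   return v;
}
```

Any arithmetic done on the 64-bit result (such as deriving a vertex count) is only meaningful with the first variant.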
Jason Ekstrand
4bb6e6817e intel: Use a system value for gl_FragCoord
It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input.  It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Kenneth Graunke
0e24d10ff5 iris: Use gen_mi_builder to handle CS ALU operations.
In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
fe7ed6b057 iris: Make iris_query.c a genxml-compiled file.
This will let us use Jason's new MI-builder shortly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
975f7e4a59 iris: Move iris_resolve_conditional_render to the vtable.
It's going to be in genxml code shortly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
6c4c7b600d iris: Refactor genxml macros and inlines into iris_genx_macros.h.
This will let us put the genxml boilerplate in one place, before we
expand genxml to more files shortly.  Like i965/genX_boilerplate.h.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
204a3bb816 iris: Make an iris_genx_protos.h header for prototypes.
This lets us specify the prototypes once, instead of cut and pasting
them per generation.  isl uses a similar approach (isl_genX_priv.h).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Jason Ekstrand
c84b8eeeac intel/compiler: Be more conservative about subgroup sizes in GL
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API.  However, all GL requires is that
it's a uniform.  Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
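The GL policy described in c84b8eeeac might be summarised as below. This is a sketch, not the compiler's code: the enum, the function name, and both size parameters are assumptions made for illustration, and "geometry stages" is taken to mean the pre-rasterization stages.

```c
/* Hypothetical stage classification for illustration only. */
enum stage_kind {
   STAGE_KIND_GEOMETRY,   /* assumed: vertex, tessellation, geometry */
   STAGE_KIND_FRAGMENT,
   STAGE_KIND_COMPUTE,
};

/* Returns the subgroup size claimed to a GL shader.
 * max_size:      largest dispatch width the hardware could use (assumed input)
 * compiled_size: width the shader was actually compiled for (assumed input) */
static unsigned
claimed_gl_subgroup_size(enum stage_kind kind,
                         unsigned max_size, unsigned compiled_size)
{
   switch (kind) {
   case STAGE_KIND_GEOMETRY: return 8;
   case STAGE_KIND_FRAGMENT: return max_size;
   case STAGE_KIND_COMPUTE:  return compiled_size;
   }
   return max_size;   /* conservative fallback for unknown stages */
}
```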
Ilia Mirkin
0e30c6b8a7 gallium: switch boolean -> bool at the interface definitions
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).

This has been build-tested on amd64 with:

Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
                 vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st:      mesa xa xvmc xvmc vdpau va

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 22:13:51 -04:00
Kenneth Graunke
7cdde962c5 iris: Support storage images that have matching typed formats for reads
Even if we don't directly support typed reads on a format, we can often
translate them to a reasonable matching format.  Advertise those too.
2019-07-22 17:30:13 -07:00
Kenneth Graunke
2f1c7fae9e iris: Stop advertising MSAA storage images by mistake
st_extensions.c sets const->MaxImageSamples (GL_MAX_IMAGE_SAMPLES) by
looping over [16, 15, .. 1x] MSAA modes, and RGBA/BGRA/ARGB/ABGR 8888
color formats, calling pipe->is_format_supported() for each, with
the usage set to PIPE_BIND_SHADER_IMAGE.  If any are supported, it
selects that number of samples.

We were checking if sample_count <= 1, which meant that we were getting
a value of 1x MSAA, rather than the expected 0x (feature doesn't exist).

This only happened on Icelake, because Gen11 adds support for typed
read messages for R8G8B8A8_UNORM.  On Gen9, the lack of typed read
messages for these formats made the check say no, which happened to be
correct.  This caused some Icelake conformance failures, because we
don't implement this feature.

Just check for sample_count == 0 instead.
2019-07-22 17:30:13 -07:00
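A simplified model of the interaction described in 2f1c7fae9e. It ignores the loop over color formats and the typed-read condition, and every name in it is hypothetical; it only shows why accepting `sample_count <= 1` makes the state tracker's search settle on 1 rather than 0.

```c
#include <stdbool.h>

typedef bool (*image_support_fn)(unsigned sample_count);

/* Old driver-side check (simplified): accepts a sample count of 1. */
static bool
supports_image_old(unsigned sample_count)
{
   return sample_count <= 1;
}

/* New driver-side check (simplified): accepts only a sample count of 0. */
static bool
supports_image_new(unsigned sample_count)
{
   return sample_count == 0;
}

/* Simplified stand-in for the state tracker's search: try 16x down to
 * 1x and report the first supported count, or 0 if none is supported. */
static unsigned
max_image_samples(image_support_fn is_supported)
{
   for (unsigned samples = 16; samples >= 1; samples--) {
      if (is_supported(samples))
         return samples;
   }
   return 0;
}
```

With the old check the search reports 1; with the new check no count in the loop is accepted, so it reports 0, meaning the feature is not advertised.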
Timothy Arceri
80c2c17e1e iris: change last_vue_stage() to look at uncompiled shaders
This allows us to find the last vue stage before we have compiled
the shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Rafael Antognolli
393f659ed8 iris: Enable fast clears on other miplevels and layers than 0.
Until now we only supported fast clear colors on the first miplevel and
layer. The main reason for it is that we can't have different fast clear
values at different levels/layers, since the surface state only supports
one clear value.

We can, however, enable it if we make sure we only use the same value
for all levels/layers, and if one of them changes, we resolve all the
others. We already do that for depth fast clears so hopefully it will be
fine for color fast clears too.

v2: Add check for partial clear too (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-17 14:53:37 -07:00
Rafael Antognolli
8bbd4f32bf iris: Allow resolving clear color of CCS_D surfaces.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 14:53:16 -07:00
Kenneth Graunke
df4c2ec5e1 iris: Make iris_has_color_unresolved non-static
We want to use this in the transfer code and possibly for fast clears.
2019-07-17 13:43:04 -07:00
Kenneth Graunke
1d5ee31553 iris: Drop copy and pasted iris_timebase_scale
Lionel moved brw_timebase_scale to gen_device_info_timebase_scale a few
months ago, so we should just use that, and not our own copy in iris.
2019-07-16 17:22:48 -07:00
Kenneth Graunke
5e76c99923 iris: Better handle decoder base addresses
It can be useful to call the decoder on a single batch.  But, that batch
may not contain STATE_BASE_ADDRESS, at which point the decoder will have
no idea how to find any buffers.  We can initialize the two static bases
at the beginning of time, so it has them even if it never sees SBA.

Surface base address changes dynamically, possibly in the middle of a
batch.  So we update it at the start of each batch, making it always
start at the value we inherited from the previous one.  SBA commands
inside the batch can update it to a proper value.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-15 11:49:19 -07:00
Kenneth Graunke
712ac83033 iris: Simplify devinfo access in calculate_result_on_gpu()
We have devinfo, no need for screen->devinfo.
2019-07-12 00:33:19 -07:00
Kenneth Graunke
5445c176e2 iris: Disable SIMD32 when using a 16x MSAA framebuffer.
We weren't doing this documented workaround because it's sorta painful.
2019-07-11 11:34:21 -07:00
Kenneth Graunke
a01770b9c8 iris: Fix key->input_vertices for 8_PATCH TCS mode.
We were failing to flag the program dirty when it changed.  Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits.  Only set it if needed.
2019-07-11 01:18:24 -07:00
Kenneth Graunke
c58f52f0ef iris: Only set key->flat_shade if COL0/COL1 are written.
This was just laziness on my part, we already added similar checks in
the VS key handling.  Just need to do it here too.  Should improve cache
hits.
2019-07-11 00:12:50 -07:00
Kenneth Graunke
cb82d534a0 iris: Drop comment about var->data.binding not being set.
I refactored the sampler lowering passes a long time ago to ensure
that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.
2019-07-11 00:12:00 -07:00
Kenneth Graunke
38f9954208 iris: Drop comments about missing NOS
These stages don't need NOS.  If they do, we can add it - the
infrastructure is there if we need it someday.
2019-07-11 00:12:00 -07:00
Kenneth Graunke
2bd1234a77 iris: Drop a TODO comment
This is literally implemented two lines above.
2019-07-11 00:12:00 -07:00
Jason Ekstrand
14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Erik Faye-Lund
39e7fbf24a gallium: get rid of PIPE_CAP_SM3
PIPE_CAP_SM3 has always been an odd one out of all our caps. While most
other caps are fine-grained and single-purpose, this cap encodes several
features in one. And since OpenGL cares more about single features, it'd
be nice to get rid of this one.

As it turns out, this is now relatively simple. We only really care about
three features using this cap, and those already got their own caps. So
we can remove it, and make sure all current drivers just give the same
response to all of them.

The only place we *really* care about SM3 is in nine, and there we can
instead just re-construct the information based on the finer-grained
caps. This keeps DX9 semantics from needlessly leaking into all of the
drivers, most of which don't care a whole lot about DX9 specifically.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 15:50:51 +02:00
Dongwon Kim
6866765cb3 iris: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:38 -07:00
Jason Ekstrand
4633298fd6 iris: Use a uint16_t for key sizes
sizeof(struct brw_vs_prog_key) == 324.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-04 19:52:34 -05:00
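The arithmetic behind 4633298fd6 can be shown in a few lines, assuming the key size was previously held in an 8-bit field (the commit message gives only the new type and the 324-byte size). Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: what an 8-bit size field would hold for a given size.
 * Values above 255 wrap modulo 256. */
static uint8_t
key_size_as_u8(unsigned size)
{
   return (uint8_t)size;
}

/* Hypothetical: the same size stored in a 16-bit field. */
static uint16_t
key_size_as_u16(unsigned size)
{
   return (uint16_t)size;
}
```

For a 324-byte key the 8-bit field would hold 324 - 256 = 68, so anything that hashes or compares keys by the stored size would only look at the first 68 bytes.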
Kenneth Graunke
9ea67f0a79 iris: Fix MOCS for grid surface
Hardcoding 4 is bad; we have a function for this now.
2019-07-03 22:24:50 -07:00
Kenneth Graunke
10560f8506 iris: Minor tidying
2019-07-03 22:24:44 -07:00
Mike Blumenkrantz
e005470466 iris: assert isl_surf_init success in resource_from_handle
this can fail unexpectedly due to bugs, so it's good to provide feedback
when this occurs

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-07-02 15:39:44 -07:00
Kenneth Graunke
457a55716e iris: Defer closing and freeing VMA until buffers are idle.
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU.  To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle.  We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Kenneth Graunke
07f3455664 iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().
In the future, some images will need to be aligned to a larger value
than 4096.  Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.

v2: Fix missing alignment in vma_alloc, caught by Caio!

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Kenneth Graunke
9b1b971491 iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall).  We also try and select the optimal batch.

Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO.  Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.

I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
2019-07-01 13:59:49 -07:00
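The size test implied by 9b1b971491 could look something like this. It is a sketch under the assumption that only whole-DWord copies of 4, 8, 12, or 16 bytes qualify; the function name is hypothetical and the real code also has to choose a batch and insert a CS stall.

```c
#include <stdbool.h>

/* Hypothetical predicate: true when a copy is small enough to be done
 * one DWord at a time instead of starting a full BLORP buffer copy. */
static bool
is_tiny_dword_copy(unsigned size_bytes)
{
   return size_bytes >= 4 &&
          size_bytes <= 16 &&
          size_bytes % 4 == 0;
}
```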
Anuj Phogat
d96cba7754 Revert "iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch"
SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in the Linux kernel, as the WA
description says it's fixed on the latest stepping.

This reverts commit 9c421d6b47.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-28 14:02:13 -07:00
Kenneth Graunke
847ef8ee4f iris: Don't leak resources in iris_create_surface for incomplete FBOs
We were failing to pipe_resource_unreference on the failure path due
to a non-renderable format.  Instead of fixing this, just move the
checks earlier, before we even bother with refcounting or calloc.
2019-06-28 01:13:11 -07:00
Kenneth Graunke
bed305fb7a iris: Fix major resource leak in iris_set_shader_images
We were failing to unreference the old image resource.  Instead of open
coding this and doing it badly, just use the copier function which does
the right thing.
2019-06-27 19:08:46 -07:00
Nanley Chery
fb1350c76f intel: Add and use helpers for level0 extent
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.

v2:
- Update iris (Ken).
- Update anv.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-27 23:38:37 +00:00
Kenneth Graunke
3d3685d354 iris: Fix memory leak of SO targets
We need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00
Kenneth Graunke
d65819f054 iris: Fix memory leak for draw parameter resources
Need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00