Commit graph

117072 commits

Author SHA1 Message Date
Samuel Pitoiset
ef1787dbc9 radv: only enable VK_AMD_gpu_shader_{half_float,int16} on GFX9+
These two extensions are supported on GFX8, but the throughput
of 16-bit floats/integers is the same as 32-bit. shaderInt16
is only enabled on GFX9+ for the same reason, so be more consistent.

This fixes a crash with Wolfenstein II because it expects
shaderInt16 to be enabled when VK_AMD_gpu_shader_half_float is
exposed. Note that AMDVLK only enables these extensions on GFX9+.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-28 08:40:44 +02:00
Samuel Pitoiset
5d6d29ed5d radv: add si_emit_ia_multi_vgt_param() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-28 08:40:42 +02:00
Alexandros Frantzis
7da90a7cc9 virgl: Don't allow creating staging pipe_resources
Staging buffers are now created directly by the virgl_staging_mgr. We
don't need to support creating staging pipe_resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
5388be039b virgl: Use virgl_staging_mgr
Use an instance of virgl_staging_mgr instead of u_upload_mgr to handle
the staging buffer. This removes the need to track the availability
of the staging manager, since virgl_staging_mgr can handle concurrent
active allocations.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
790d1a0b17 virgl: Add tests for virgl_staging_mgr
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
55a58dfcfb virgl: Introduce virgl_staging_mgr
Add a manager for the staging buffer used in virgl. The staging manager
is heavily inspired by u_upload_mgr, but is simpler and is a better fit
for virgl's purposes. In particular, the staging manager:

* Allows concurrent staging allocations.
* Calls the virgl winsys directly to create and map resources, avoiding
  unnecessarily going through gallium resources and transfers.

olv: make virgl_staging_alloc_buffer return a bool

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
6a03f25522 virgl: Store the virgl_hw_res for copy transfers
Store the virgl_hw_res instead of the pipe_resource for copy transfer
sources. This prepares the codebase for a change to provide only the
virgl_hw_res for the staging buffers in upcoming commits.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Kenneth Graunke
bed305fb7a iris: Fix major resource leak in iris_set_shader_images
We were failing to unreference the old image resource.  Instead of open
coding this and doing it badly, just use the copier function which does
the right thing.
2019-06-27 19:08:46 -07:00
Kenneth Graunke
255c71ec07 gallium: Make util_copy_image_view handle shader_access
A while back, we added a new field, but failed to update the copier.
I believe iris is the only current user of the new field, and it hasn't
used the copier, so no one noticed.

Fixes: 8b626a22b2 st/mesa: Record shader access qualifiers for images
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-27 19:06:19 -07:00
Kenneth Graunke
0d6fc6f07e gallium: Teach GALLIUM_REFCNT_LOG about array textures
Otherwise they are classified as pipe_martian_resource, and don't
contain any helpful information about the texture.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-27 16:56:15 -07:00
Nanley Chery
02f6995d76 isl: Don't align phys_level0_sa by block dimension
Aligning phys_level0_sa by the compression block dimension prior to
mipmap layout causes the layout of compressed surfaces to differ from
the sampler's expectations in certain cases. The hardware docs agree:

From the BDW PRM, Vol. 5, Compressed Mipmap Layout,

   The compressed mipmaps are stored in a similar fashion to
   uncompressed mipmaps [...]

   The following exceptions apply to the layout of compressed (vs.
   uncompressed) mipmaps:
      * [...]
      * The dimensions of the mip maps are first determined by applying
        the sizing algorithm presented in Non-Power-of-Two Mipmaps
        above. Then, if necessary, they are padded out to compression
        block boundaries.

The last bullet indicates that alignment should not be done for
calculating a miplevel's dimensions, but rather for determining miplevel
placement/padding. Comply with this text by removing the extra
alignment.
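
The difference in ordering can be sketched as follows. This is only an
illustration, not the isl code: the helper names are hypothetical, and the
sizing rule (shift right by the level, clamp to 1) is an assumption about
what the PRM's Non-Power-of-Two Mipmaps algorithm reduces to for one
dimension.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not isl functions. */

/* Assumed sizing rule: halve per level, never below 1. */
static uint32_t sketch_minify(uint32_t size, uint32_t level)
{
   uint32_t v = size >> level;
   return v > 0 ? v : 1;
}

/* Round v up to a multiple of a (a > 0). */
static uint32_t sketch_align_u32(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a * a;
}

/* Order described by the PRM text: size the miplevel, then pad it. */
static uint32_t sketch_level_width_padded(uint32_t w0, uint32_t level,
                                          uint32_t block_w)
{
   return sketch_align_u32(sketch_minify(w0, level), block_w);
}

/* Order with the extra alignment: align level 0, then size, then pad. */
static uint32_t sketch_level_width_prealigned(uint32_t w0, uint32_t level,
                                              uint32_t block_w)
{
   uint32_t aligned_w0 = sketch_align_u32(w0, block_w);
   return sketch_align_u32(sketch_minify(aligned_w0, level), block_w);
}
```

For a 17-element-wide surface with a block width of 4, level 1 comes out as
8 with the first order and 12 with the second, so the two layouts diverge.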

Fixes some fbo-generatemipmap-formats piglit failures on all tested
platforms (SNB-KBL).

v2:
- Note fixed platforms.
- Update some consumers via a helper function.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-27 23:38:38 +00:00
Nanley Chery
fb1350c76f intel: Add and use helpers for level0 extent
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.

v2:
- Update iris (Ken).
- Update anv.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-27 23:38:37 +00:00
Dylan Baker
0ba0c0c15c meson: try to use cmake as a finder for clang
Clang (like LLVM) very annoyingly refuses to provide pkg-config, and
only provides cmake (unlike LLVM, which at least provides llvm-config,
even if llvm-config is terrible). Meson has gained the ability to use
cmake to find dependencies, and can successfully find Clang. This change
attempts to use cmake to find clang instead of a bunch of library
searches. When paired with -Dcmake_prefix_path, we can much more reliably
use cmake to control which clang we're getting. This is only enabled for
meson >= 0.51, which adds the required options.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-27 22:12:02 +00:00
Dylan Baker
5157a42765 meson: Add support for using cmake for finding LLVM
Meson has support for using cmake as a finder for some dependencies,
including LLVM. Using cmake has a lot of advantages: it needs less meson
maintenance to keep working (even for llvm updates); it works more
sanely for cross compiles (as llvm-config is a compiled binary not a
shell script). Meson 0.51.0 also has a new generic variable getter that
can be used to get information from either cmake, pkg-config, or
config-tools dependencies, which is needed for cmake. We continue to
support using llvm-config if you don't have cmake installed, or if cmake
cannot find a suitable version.

Fixes: 0d59459432
       ("meson: Force the use of config-tool for llvm")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-27 22:12:02 +00:00
Kenneth Graunke
3d3685d354 iris: Fix memory leak of SO targets
We need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00
Kenneth Graunke
d65819f054 iris: Fix memory leak for draw parameter resources
Need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00
Kenneth Graunke
50eb1c1396 iris: Drop u_upload_unmap
We use persistent maps so this does nothing.
2019-06-27 14:59:39 -07:00
Lionel Landwerlin
836225840c intel/compiler: fix derivative on y axis implementation
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on, and also sets the group that each group of
4 threads is supposed to execute.

Fixes the following CTS tests :

   dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
2019-06-27 18:14:58 +00:00
Eric Engestrom
53f17c4efd meson: set up a proper internal dependency for xmlconfig
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Eric Engestrom
ad0ee5bfa5 xmlconfig: add missing #include
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Eric Engestrom
069e6d587e xmlpool: fix typo in comment
s/otions/options/, and while here let's give the full path to xmlpool.h
since `../` won't be true in the generated file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Kenneth Graunke
d6683e118f iris: Also properly restore INTERFACE_DESCRIPTOR_DATA buffer object
We were at least cleaning up this reference, but we were failing to
pin it in iris_restore_compute_saved_bos.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
340df53d6a iris: Fix resource tracking for CS thread ID buffer
Today, we stream the compute shader thread IDs simply because they're
(annoyingly) relative to dynamic state base address.  We could upload
them once at compile time, but we'd need a separate non-streaming
uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it.

stream_state pins the buffer for use in the current batch, but also
returns a reference to the pipe_resource.  We dropped this reference
on the floor, leaking a reference basically every time we dispatched
a compute shader after switching to a new one.

The reason it returns a reference is so that we can hold on to it and
re-pin it in iris_restore_compute_saved_bos, which we were also failing
to do.  So if we actually filled up a batch with repeated dispatches to
the same compute shader, and flushed, then continued dispatching, we
would fail to pin it and likely GPU hang.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
16d334951e iris: Only bother with thread ID upload if doing MEDIA_CURBE_LOAD
We were unconditionally uploading the new data, but then conditionally
using it with MEDIA_CURBE_LOAD.  If we're not going to emit the command,
there's no point in uploading the data.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
8f51f1ba6e iris: Do MEDIA_CURBE_LOAD when IRIS_DIRTY_CS is set, not constants
We only push the compute shader thread IDs, not any actual constant
buffer data.  So we should track the compute shader variant changing,
not constbuf changes.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
85c72da1b1 iris: Drop UBO range stuff from iris_restore_compute_saved_bos
Compute doesn't use UBO ranges (annoyingly), so this is dead code.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
f94ebf0c9d iris: Properly align interface descriptor data addresses
MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address
field's docs say: "This bit specifies the 64-byte aligned address..."

And we were doing 32.  Superfluous thread ID uploading was apparently
saving us from GPU hangs in most cases.
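
As a rough illustration of the alignment requirement (not the iris code;
the helper names are hypothetical), a power-of-two alignment check and
round-up look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; alignment must be a power of two. */

/* True if addr is a multiple of alignment. */
static bool sketch_is_aligned(uint64_t addr, uint64_t alignment)
{
   return (addr & (alignment - 1)) == 0;
}

/* Round addr up to the next multiple of alignment. */
static uint64_t sketch_align_pow2(uint64_t addr, uint64_t alignment)
{
   return (addr + alignment - 1) & ~(alignment - 1);
}
```

An offset of 32 satisfies a 32-byte alignment but not the 64-byte one the
docs ask for; rounding it up with the second helper gives 64.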
2019-06-27 08:12:22 -07:00
Andrii Simiklit
62c6059584 mesa: use a correct function return type
v2: standard 'bool' can be used
     ( Eric Engestrom <eric.engestrom@intel.com> )

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-06-27 07:53:41 +00:00
Tomeu Vizoso
9bef1f1ff1 panfrost/decode: Mention the address of a few descriptors
When the fault_pointer field in the header is set, we can get some idea
of which descriptor the HW isn't happy with if we know their addresses.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:48 +02:00
Tomeu Vizoso
de02fb19ed panfrost/decode: Wait for a job to finish before dumping
Then we can get some information back about any exception that might
have happened.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:42 +02:00
Tomeu Vizoso
fa36c194fd panfrost/decode: Decode exception status
Arm's kernel driver mentions how to decode this field, which makes it a
bit clearer what happened.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:35 +02:00
Tomeu Vizoso
b26c2b4840 panfrost/decode: Print AFBC struct when appropriate
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:12:56 +02:00
Samuel Pitoiset
d5004f60be radv: only export clip/cull distances if PS reads them
The only exception is the GS copy shader which emits them
unconditionally.

Totals from affected shaders:
SGPRS: 71320 -> 71008 (-0.44 %)
VGPRS: 54372 -> 54240 (-0.24 %)
Code Size: 2952628 -> 2941368 (-0.38 %) bytes
Max Waves: 9689 -> 9723 (0.35 %)

This helps Dota2, Doom, GTAV and Hitman 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-27 08:56:37 +02:00
Samuel Pitoiset
1e9ccc5429 radv: fix FMASK expand if layerCount is VK_REMAINING_ARRAY_LAYERS
This doesn't fix anything known, but it's likely going to
break if layerCount is ~0U.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-27 08:56:34 +02:00
Kenneth Graunke
8551dc17a7 iris: Disable loop unrolling in GLSL IR.
Leave it to NIR instead, like i965 does.  Thanks to Tim Arceri for
noticing that I'd left this enabled by accident.

shader-db results on Skylake:

total instructions in shared programs: 15522628 -> 15521642 (<.01%)
instructions in affected programs: 94008 -> 93022 (-1.05%)
helped: 34
HURT: 33
helped stats (abs) min: 12 max: 48 x̄: 33.82 x̃: 42
helped stats (rel) min: 0.06% max: 22.14% x̄: 9.86% x̃: 10.89%
HURT stats (abs)   min: 1 max: 16 x̄: 4.97 x̃: 3
HURT stats (rel)   min: 0.82% max: 3.77% x̄: 1.73% x̃: 1.53%
95% mean confidence interval for instructions value: -20.08 -9.35
95% mean confidence interval for instructions %-change: -5.95% -2.36%
Instructions are helped.

total cycles in shared programs: 367105221 -> 367074230 (<.01%)
cycles in affected programs: 10017660 -> 9986669 (-0.31%)
helped: 266
HURT: 184
helped stats (abs) min: 1 max: 9556 x̄: 151.35 x̃: 12
helped stats (rel) min: 0.08% max: 59.91% x̄: 4.66% x̃: 1.67%
HURT stats (abs)   min: 1 max: 1716 x̄: 50.37 x̃: 6
HURT stats (rel)   min: <.01% max: 24.40% x̄: 2.42% x̃: 0.85%
95% mean confidence interval for cycles value: -133.90 -3.84
95% mean confidence interval for cycles %-change: -2.44% -1.10%
Cycles are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-26 22:55:03 -07:00
Kenneth Graunke
acadeaff6a st/mesa: Set EmitNoIndirectSampler if GLSLVersion < 400.
This patch changes the code which sets EmitNoIndirectSampler to check
the core profile GLSL version, rather than the ARB_gpu_shader5 extension
enable.  st/mesa exposes ARB_gpu_shader5 if GLSLVersion (in core
profiles) or GLSLVersionCompat (in compat profiles) >= 400.

The Intel drivers do not currently expose ARB_gpu_shader5 in compat
profiles.  But the backend can absolutely handle indirect samplers.
Looking at the core profile version number should be a good indication
of what the driver supports.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-26 22:54:52 -07:00
Kenneth Graunke
116144d65e iris: Delete dead ice->state.streamout_strides field.
Nothing uses this, it must be a remnant from an earlier approach.
2019-06-26 20:17:22 -07:00
Caio Marcelo de Oliveira Filho
085c0f1f13 nir/algebraic: Add helpers and a rule involving wrapping
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules.  Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.
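
A small sketch of why the no-wrap guarantee matters for this kind of rule.
It does not use the nir_algebraic syntax; the function names are
hypothetical and it only models 32-bit unsigned arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* (a * b) / b computed with wrapping 32-bit arithmetic; b must be non-zero. */
static uint32_t sketch_mul_then_div(uint32_t a, uint32_t b)
{
   return (a * b) / b;
}

/* True if a * b does not fit in 32 bits, i.e. the multiply wraps. */
static bool sketch_mul_wraps(uint32_t a, uint32_t b)
{
   return b != 0 && a > UINT32_MAX / b;
}
```

When the multiply does not wrap, the result equals a and the mul-div pair
can be dropped. With a = 0x80000000 and b = 2 the product wraps to 0, so
the result is 0 rather than a, which is why the rule needs the guarantee.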

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Caio Marcelo de Oliveira Filho
5a143965b8 spirv: Implement NoSignedWrap and NoUnsignedWrap decorations
When handling the specified ALU operations, check for the decorations
and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly.

v2: Add a glsl_base_type_is_unsigned_integer() helper.  (Karol)

v3: Rename helper to glsl_base_type_is_uint().

v4: Use two flags, so we don't need the helper anymore.  (Connor)

v5: Pass alu directly to handle function.  (Jason)

Reviewed-by: Karol Herbst <kherbst@redhat.com> [v3]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Caio Marcelo de Oliveira Filho
ae37237713 nir: Add no wrapping bits to nir_alu_instr
They indicate the operation does not cause overflow or underflow.
This is motivated by SPIR-V decorations NoSignedWrap and
NoUnsignedWrap.

Change the storage of `exact` to be a single bit, so they pack
together.

v2: Handle no_wrap in nir_instr_set.  (Karol)

v3: Use two separate flags, since the NIR SSA values and certain
    instructions are typeless, so just no_wrap would be insufficient
    to know which one was referred to.  (Connor)

v4: Don't use nir_instr_set to propagate the flags, unlike `exact`,
    consider the instructions different if the flags have different
    values.  Fix hashing/comparing.  (Jason)
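
A minimal sketch of the idea, assuming a simplified stand-in for the
instruction: the struct, its field layout, and both functions are
hypothetical and are not the nir_alu_instr definition or the nir_instr_set
code. It shows single-bit flags packed together and the two wrap flags
taking part in comparison and hashing; `exact` is deliberately left out of
both here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ALU instruction. */
struct sketch_alu {
   uint32_t op;
   bool exact : 1;            /* single bit, so the flags pack together */
   bool no_signed_wrap : 1;
   bool no_unsigned_wrap : 1;
};

/* Instructions with different wrap flags are treated as different. */
static bool sketch_alu_equal(const struct sketch_alu *a,
                             const struct sketch_alu *b)
{
   return a->op == b->op &&
          a->no_signed_wrap == b->no_signed_wrap &&
          a->no_unsigned_wrap == b->no_unsigned_wrap;
}

/* The hash must include everything the comparison looks at. */
static uint32_t sketch_alu_hash(const struct sketch_alu *alu)
{
   uint32_t h = alu->op;
   h = h * 31u + (uint32_t)alu->no_signed_wrap;
   h = h * 31u + (uint32_t)alu->no_unsigned_wrap;
   return h;
}
```

Keeping two separate flags means a typeless value never has to say which
kind of wrapping a single no_wrap bit would have referred to.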

Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Dylan Baker
f97dcb7a55 docs: add news item and link release notes for 19.0.8
This is an emergency release due to a critical bug.
2019-06-26 13:48:06 -07:00
Dylan Baker
290495a431 docs: Add mesa 19.0.8 sha256 sums
2019-06-26 13:46:30 -07:00
Dylan Baker
10a24925a0 docs: Add docs for 19.0.8
2019-06-26 13:46:29 -07:00
Jonathan Marek
a70ff70158 nir: remove fnot/fxor/fand/for opcodes
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-26 15:26:10 -04:00
Jonathan Marek
0b5a483baa nir: opt_vectorize: combine different constant sources
We can vectorize instructions with different constant sources by creating
a new load_const and using that.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 14:56:28 -04:00
Alyssa Rosenzweig
10688257bd panfrost/midgard: Merge embedded constants
In Midgard, a bundle consists of a few ALU instructions. Within the
bundle, there is room for an optional 128-bit constant; this constant is
shared across all instructions in the bundle.

Unfortunately, many instructions want a 128-bit constant all to
themselves (how selfish!). If we run out of space for constants in a
bundle, the bundle has to be broken up, incurring a performance and
space penalty.

As an optimization, the scheduler now analyzes the constants coming in
per-instruction and attempts to merge shared components, adjusting the
swizzle accessing the bundle's constants appropriately. Concretely,
given the GLSL:

   (a * vec4(1.5, 0.5, 0.5, 1.0)) + vec4(1.0, 2.3, 2.3, 0.5)

instead of compiling to the naive two bundles:

   vmul.fmul [temp], [a], r26
   fconstants 1.5, 0.5, 0.5, 1.0

   vadd.fadd [out], [temp], r26
   fconstants 1.0, 2.3, 2.3, 0.5

The scheduler can now fuse into a single (pipelined!) bundle:

   vmul.fmul [temp], [a], r26.xyyz
   vadd.fadd [out], [temp], r26.zwwy
   fconstants 1.5, 0.5, 1.0, 2.3
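
The merging step can be sketched roughly as below. This is not the Midgard
scheduler code: the struct and function names are hypothetical, constants
are compared by float value (an assumption; a real implementation may
compare raw bits), and the greedy first-fit strategy is just one simple way
to do it.

```c
#include <stdbool.h>

#define SKETCH_BUNDLE_COMPONENTS 4

/* Hypothetical model of the shared 128-bit constant slot (4 x 32-bit). */
struct sketch_bundle_constants {
   float values[SKETCH_BUNDLE_COMPONENTS];
   unsigned count; /* components in use so far */
};

/*
 * Try to fit one instruction's vec4 constant into the bundle, reusing
 * components that already hold the same value. On success, swizzle[i] is
 * the bundle component that supplies component i of the instruction.
 * On failure the bundle is left unchanged (swizzle may be partly written).
 */
static bool sketch_merge_constants(struct sketch_bundle_constants *bundle,
                                   const float in[4], unsigned swizzle[4])
{
   struct sketch_bundle_constants tmp = *bundle;

   for (unsigned i = 0; i < 4; i++) {
      unsigned j;

      /* Look for an existing component with the same value. */
      for (j = 0; j < tmp.count; j++) {
         if (tmp.values[j] == in[i])
            break;
      }

      /* Not found: take a free component, or give up if none is left. */
      if (j == tmp.count) {
         if (tmp.count == SKETCH_BUNDLE_COMPONENTS)
            return false;
         tmp.values[tmp.count++] = in[i];
      }

      swizzle[i] = j;
   }

   *bundle = tmp;
   return true;
}
```

Feeding in the two constants from the example above, in order, yields the
swizzles (0, 1, 1, 2) and (2, 3, 3, 1), i.e. xyyz and zwwy, with the
bundle holding 1.5, 0.5, 1.0, 2.3.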

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
a0a34946d8 panfrost/midgard: Share swizzle compose
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
f6fde45d5c panfrost/midgard: Share swizzle/mask code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
0979ea9de8 panfrost: Fix checksumming typo
Fixes: 3e6c6bb0 ("panfrost: Merge checksum buffer with main BO")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:58:30 -07:00
Kenneth Graunke
ab009b7d6e iris: Fix overzealous query object batch flushing.
In the past, each query object had its own BO.  Checking if the batch
referenced that BO was an easy way to check if commands were still
queued to compute the query value.  If so, we needed to flush.

More recently (c24a574e6c), we started using a u_upload_mgr for query
objects, placing multiple queries in the same BO.  One side-effect is
that iris_batch_references is no longer a reasonable way to check if
commands are still queued for our query.  Ours might be done, but a
later query that happens to be in the same BO might be queued.  We don't
want to flush in that case.

Instead, check if the current batch's signalling syncpt is the one we
referenced when ending the query.  We know the syncpt can't have been
reused because our query is holding a reference, so a simple pointer
comparison should suffice.
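
The check can be sketched as follows, assuming simplified stand-in types;
the struct and function names are hypothetical and not the iris
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver objects. */
struct sketch_syncpt {
   int refcount;
};

struct sketch_batch {
   struct sketch_syncpt *signal_syncpt; /* signalled when this batch completes */
};

struct sketch_query {
   struct sketch_syncpt *end_syncpt; /* reference taken when the query ended */
};

/*
 * The query's commands are still queued only if the batch that will
 * signal its syncpt is the one currently being built. Because the query
 * holds a reference, the pointer cannot have been recycled, so comparing
 * pointers is enough.
 */
static bool sketch_query_needs_flush(const struct sketch_batch *batch,
                                     const struct sketch_query *q)
{
   return q->end_syncpt != NULL && q->end_syncpt == batch->signal_syncpt;
}
```

Once the batch has been flushed and has a new signalling syncpt, the
comparison fails and no further flush is requested for that query, even if
a later query shares its BO.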

Removes all batch flushing caused by query objects in Shadow of Mordor.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-06-26 09:49:01 -07:00