Commit graph

109801 commits

Author SHA1 Message Date
Jason Ekstrand
b28bad89b9 nir: Get rid of nir_register::is_packed
All we ever do is initialize it to zero, clone it, print it, and
validate it.  No one ever sets or uses it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Dave Airlie
ff852fdc05 virgl: add support for ARB_indirect_parameters
The protocol changes are already in place for it.

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-04-09 14:25:01 +10:00
Dave Airlie
05ff2dbf13 virgl: add support for ARB_multi_draw_indirect
This will pass the multi draw through to the host if it has
support for it instead of using the st to emulate it

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-04-09 14:15:24 +10:00
Dave Airlie
316b785c59 virgl: add support for missing command buffer binding.
When I added indirect support I forgot this, however to use it
now we need to check for a new enough capability on the host side.

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-04-09 14:15:12 +10:00
Caio Marcelo de Oliveira Filho
899fd66b44 docs: Add NV_compute_shader_derivatives to 19.1.0 relnotes 2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
45a4129392 anv: Implement VK_NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
bd73531677 spirv: Add support for DerivativeGroup capabilities
As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).

Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
956226c8ba iris: Enable NV_compute_shader_derivatives
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
f9b29c4a58 gallium: Add PIPE_CAP_COMPUTE_SHADER_DERIVATIVES
To enable NV_compute_shader_derivatives, which allows derivatives (and
texture lookups with implicit derivatives) in compute shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
c9d1569689 i965: Advertise NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
94abc53030 intel/fs: Use NIR_PASS_V when lowering CS intrinsics
This will make that step visible in NIR_PRINT=1.

v2: Also use the macro for the cleanup passes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
0425b34b79 intel/fs: Don't loop when lowering CS intrinsics
This was needed when certain intrinsics were lowered to other ones
that were defined by the same pass.  After 060817b2 "intel,nir: Move
gl_LocalInvocationID lowering to nir_lower_system_values" we don't
need the loop anymore.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
3ee3024804 intel/fs: Add support for CS to group invocations in quads
When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the two next in the "current" row and two
next in the "next row".  A side effect is that a thread will execute
the indices in a different order.

We now perform the lowering of both local invocation ID and index
together -- and don't rely anymore on lowering done by
nir_lower_system_values.  That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.

When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.

This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.

v2: Take subgroup_id into account, otherwise only values in the first
    subgroup would be used. (Jason)

v3: Calculate invocation index and ID together, to avoid duplicating
    some math in the quads case when both index and ID are used. (Jason)

v4: Don't call cleanup passes as part of the lowering, let that to the
    call site. (Jason)
    Change calculation to use less instructions. (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
ef0339d5ea intel/fs: Use TEX_LOGICAL whenever implicit lod is supported
Make sure we include compute shaders that have a derivative group
defined.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
fcbc5ccaae nir: Don't set LOD=0 for compute shader that has derivative group
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.

Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
3c5ddaeacd glsl: Parse and propagate derivative_group to shader_info
NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan.  So parse and propagate
those up to shader_info.h.

v2: Do not fail when ARB_compute_variable_group_size is being used,
    since we are still clarifying what is the right thing to do here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
ca60f0b7ba glsl: Enable texture builtins for NV_compute_shader_derivatives
Renamed a few predicates from "fs_only" to be "derivative_only" (or
similar pairs).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
09a3273fe7 glsl: Enable derivative builtins for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
289478ea89 glsl: Remove redundant conditions when asserting in_qualifier
As the code evolved, we ended up with a redundant conditions.  Clean
this up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
163655b33e mesa: Extension boilerplate for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Timothy Arceri
e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Dave Airlie
c6cf602121 softpipe: add support for vertex streams (v2)
This enables the ARB_gpu_shader5 vertex streams on softpipe.

v2: only enable when not using llvm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-09 11:20:39 +10:00
Dave Airlie
7720ce32aa draw: add support to tgsi paths for geometry streams. (v2)
This hooks up the geometry shader processing to the TGSI
support added in the previous commits.

It doesn't change the llvm interface other than to
keep things building.

v2: fix some regressions caused by primitiveoffsets

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-09 11:19:38 +10:00
Dave Airlie
ddb9ad363d softpipe: add support for indexed queries.
We need indexed queries to retrieve the geom shader info.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-09 11:19:38 +10:00
Dave Airlie
00fe67c015 tgsi: add support for geometry shader streams.
This adds support to retrieve the primitive counts
for each stream, along with the offset for each
primitive into the output array.

It also adds support for parsing the stream argument
to the emit and end instructions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-09 11:19:38 +10:00
Dave Airlie
333746011d draw: add stream member to stats callback
This just adds space for the member to the callback, doesn't
change anything else.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-09 11:19:38 +10:00
Chia-I Wu
63b823130d vulkan/wsi: make wl_drm optional
When wl_drm is missing and the driver supports modifiers, use
zwp_linux_dmabuf_v1 for the list of supported formats and for buffer
creation.

Limit the supported formats to those with modifiers, which are
WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Chia-I Wu
5318858f35 vulkan/wsi: add wsi_wl_display_dmabuf
Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Chia-I Wu
fd7fecf59a vulkan/wsi: add wsi_wl_display_drm
Add wsi_wl_display_drm for wl_drm-related states.  We will move
formats into the struct in a later commit.

Remove the unnecessary check for wl_registry_bind failures.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Chia-I Wu
22dcb080d9 vulkan/wsi: refactor drm_handle_format
Refactor the swtich statement in drm_handle_format out to
wsi_wl_display_add_wl_format.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Chia-I Wu
2d214d9405 vulkan/wsi: create wl_drm wrapper as needed
When modifiers are specified, we have to use dmabuf rather than
wl_drm.  We don't need the wrapper in that case.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Chia-I Wu
ab74937b2c vulkan/wsi: move modifier array into wsi_wl_swapchain
This avoids repeated checks for each wsi_wl_image.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-04-09 00:42:30 +00:00
Adam Jackson
52426ce4a9 drisw: Try harder to probe whether MIT-SHM works
XQueryExtension merely tells you whether the extension exists, it
doesn't tell you whether you're local enough for it to work.
XShmQueryVersion is not enough to discover this either, you need to
provoke the server to do actual work, and if it thinks you're remote it
will throw BadRequest at you. So send an invalid ShmDetach and use the
error code to distinguish local from remote.

[airlied: fixed bug not resetting xshm_error to 0 on success,
which made later stuff fail completely.]

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2019-04-09 09:50:24 +10:00
Jason Ekstrand
50f3535d1f nir/search: Search for all combinations of commutative ops
Consider the following search expression and NIR sequence:

    ('iadd', ('imul', a, b), b)

    ssa_2 = imul ssa_0, ssa_1
    ssa_3 = iadd ssa_2, ssa_0

The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success.  In the above
example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.

This commit instead counts the number of commutative ops in each
expression and assigns an index to each.  It then does a loop and loops
over the full combinatorial matrix of commutative operations.  In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations).  There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
    instructions in affected programs: 1797123 -> 1789467 (-0.43%)
    helped: 6751
    HURT: 2264

    total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
    cycles in affected programs: 15931005 -> 15786914 (-0.90%)
    helped: 6024
    HURT: 3436

    total loops in shared programs: 4360 -> 4360 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 23675 -> 23666 (-0.04%)
    spills in affected programs: 235 -> 226 (-3.83%)
    helped: 5
    HURT: 1

    total fills in shared programs: 32040 -> 32032 (-0.02%)
    fills in affected programs: 190 -> 182 (-4.21%)
    helped: 6
    HURT: 2

    LOST:   18
    GAINED: 5

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-04-08 21:38:48 +00:00
Lionel Landwerlin
48e48b8560 intel: add dependency on genxml generated files
Drivers using genxml will start compilation before generated files are
created, so add a dependency to it.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Cc: mesa-stable@lists.freedesktop.org
2019-04-08 20:52:47 +00:00
Marek Olšák
4b63f57cbc radeonsi: fix a crash when unbinding sampler states
Acked-by: James Zhu <James.Zhu@amd.com>
2019-04-08 15:23:32 -04:00
Samuel Pitoiset
775191cd99 radv: fix getting the vertex strides if the bindings aren't contiguous
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349
Fixes: a66b186beb ("radv: use typed buffer loads for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-08 21:17:15 +02:00
Lionel Landwerlin
ce790c96a9 anv: implement VK_KHR_swapchain revision 70
This revision allows for images to be :

   - created by reusing image parameters from swapchain

   - bound to memory from a swapchain

v2: Add color attachment flag
    Use same implicit WSI parameters (tiling, samples, usage)

v3: Fix missing break in vk_foreach_struct_const() switch (Lionel)

v4: Fix accessing image aspects before android resolve (Tapani)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-04-08 18:27:02 +01:00
Eric Engestrom
ed91ca0629 vk/util: remove unneeded array index
This is an array of 1, so [0] is the only content, and meson already
flattens the list so this is unnecessary.
Also, all the other uses of vk_api_xml don't do that.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-04-08 17:03:00 +00:00
Samuel Pitoiset
27b8f3ecc3 ac/nir: fix intrinsic names for atomic operations with LLVM 9+
This fixes the following LLVM error when using RADV_DEBUG=checkir:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32
i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add

The cmpswap operation still uses the old intrinsic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-08 13:16:50 +02:00
Alyssa Rosenzweig
4209a27c61 panfrost: Remove "mali_unknown6" nonsense
This structure was used maaaany moons ago as a placeholder for the
varying meta (now unified with mali_attr_meta and essentially fully
decoded). I don't know why it's still in the file. Let's wack it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 16:05:42 +00:00
Alyssa Rosenzweig
b19d1a1e63 panfrost/midgard: Enable lower_find_lsb
This is exactly what the blob does.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 16:01:49 +00:00
Alyssa Rosenzweig
65816ad6e8 panfrost/midgard: Add ibitcount8 op
The mechanics of this opcode are a little opaque, but essentially, it's
used in 8-bit mode to do a bit count in parallel of a uint and then
doing a ton of clever iadd/imov ops to recombine.

v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward
typo!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 16:01:12 +00:00
Alyssa Rosenzweig
6cba9acb75 panfrost/midgard: Add ilzcnt op
Used for implementing findLSB/MSB

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 16:00:39 +00:00
Alyssa Rosenzweig
2e7555b14b panfrost/midgard: Add umin/umax opcodes
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 15:59:05 +00:00
Alyssa Rosenzweig
d84ee49027 panfrost: Add tilebuffer load? branch
Also document branches better.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 15:58:44 +00:00
Alyssa Rosenzweig
7cccc89f80 panfrost/decode: Add flags for tilebuffer readback
These flags are set when reading back the tilebuffer from a fragment
shader via various mechanisms (including ARM_shader_framebuffer_fetch
and EXT_pixel_local_storage).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 15:58:19 +00:00
Karol Herbst
1aabb79bdc panfrost/midgard: use nir_src_is_const and nir_src_as_uint
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-07 15:56:10 +00:00
Jason Ekstrand
10a2fdacfa vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_value
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-07 15:13:36 +02:00