Commit graph

103048 commits

Author SHA1 Message Date
Kevin Strasser
ec0a68e50d gallium/winsys/kms: Fix dumb buffer bpp
The bpp in the dumb buffer creation request is hardcoded to 32, which is an
incorrect assumption as the caller is free to pick any pipe format. Use the
bpp supplied to us through util_format_get_blocksizebits().

Fixes: 3b176c441b "gallium: Add a dumb drm/kms winsys backed swrast provider"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 11:44:10 -07:00
Eric Engestrom
9996ddbb27 util/futex: fix dangling pointer use
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901
Fixes: 7dc2f47882 "util: emulate futex on FreeBSD using umtx"
Cc: Greg V <greg@unrelenting.technology>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 17:27:44 +01:00
Samuel Pitoiset
d378151246 radv: fix VK_EXT_memory_budget if one heap isn't available
When the visible VRAM size is equal to the VRAM size only two
heaps are exposed.

This fixes dEQP-VK.api.info.device.memory_budget.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:52:48 +02:00
Samuel Pitoiset
2ef9d2738c radv: fix occlusion queries on VegaM
The number of render backends is 16 but the enabled mask is 0xaaaa.

As noticed by Bas, allowing disabled render backends might break
the OCCLUSION_QUERY packet. We don't use it yet but keep this in
mind.

This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:51:12 +02:00
Lionel Landwerlin
93b93e5a9d anv: do not parse genxml data without INTEL_DEBUG=bat
This significantly slows down the CTS runs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-06-12 12:53:35 +03:00
Lionel Landwerlin
f80679c8e8 intel/dump: fix segfault when the app hasn't accessed the device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 09:49:55 +03:00
Caio Marcelo de Oliveira Filho
48d7e7a9b8 iris: Only upload surface state for grid info when needed
Special care is needed to ensure that when we have two consecutive
calls with the same grid size, we only bail in the second one if it
either don't need the surface state or the surface state was already
uploaded.

v2: Instead of having a new bool in ice->state to know whether we had
    a surface, check whether we have state->ref.  (Ken)
    Clean up the logic a little bit by adding 'grid_updated' local. (Ken)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho
f346b277d1 iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Rui Salvaterra
7b43362f29 r300g: implement GLSL disk shader caching
This implements GLSL disk shader caching for the R300-R500 series of AMD GPUs.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:49:34 -04:00
Richard Thier
ffd2f948fe r300g: restore performance after RADEON_FLAG_NO_INTERPROCESS_SHARING was added
v1: Fix skipped slab allocators and the buffer cache.

v2: Use only 1 domain for texture allocation

v3: Added flag for the create_fence call too

Based on Marek v1 and v2 proposed fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=1107812.patch

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:45:27 -04:00
Marek Olšák
ec0956a194 radeonsi: don't test SDMA perf if SDMA is disabled/unsupported 2019-06-11 20:05:21 -04:00
Marek Olšák
993bf52977 radeonsi: always interpolate PrimID as flat 2019-06-11 20:05:21 -04:00
Marek Olšák
7f7ffa0883 radeonsi: move color clamping to si_llvm_export_vs to unify the code 2019-06-11 20:05:21 -04:00
Marek Olšák
4773f5a293 radeonsi: use the ac helper for index buffer stores in the culling shader 2019-06-11 20:05:21 -04:00
Marek Olšák
579003e7bd radeonsi: use the ac helper for image stores 2019-06-11 20:05:21 -04:00
Marek Olšák
deef3833f8 radeonsi: use the ac helper for SSBO stores 2019-06-11 20:05:21 -04:00
Marek Olšák
e5fe38484a radeonsi: fixes for vec3 buffer stores in LLVM 9 2019-06-11 20:05:21 -04:00
Caio Marcelo de Oliveira Filho
9c81db8adb iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
This avoids lowering of CS system values by GLSL (configured by state
tracker).  In i965 we don't use that lowering, and we also shouldn't
need that in Iris.

Using it cause some unnecessary round trip between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again.  Copy propagation can make some of
those go away, but not all as seen below.

Intel SKL shader-db results:

    total instructions in shared programs: 15595189 -> 15594556 (<.01%)
    instructions in affected programs: 74880 -> 74247 (-0.85%)
    helped: 81
    HURT: 4
    helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
    helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
    HURT stats (abs)   min: 1 max: 2 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
    95% mean confidence interval for instructions value: -11.56 -3.34
    95% mean confidence interval for instructions %-change: -1.91% -1.28%
    Instructions are helped.

    total loops in shared programs: 4831 -> 4831 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 372136618 -> 372145628 (<.01%)
    cycles in affected programs: 9218230 -> 9227240 (0.10%)
    helped: 131
    HURT: 86
    helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
    helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
    HURT stats (abs)   min: 2 max: 2442 x̄: 165.38 x̃: 6
    HURT stats (rel)   min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
    95% mean confidence interval for cycles value: -2.07 85.11
    95% mean confidence interval for cycles %-change: -0.22% 0.30%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 11956 -> 11950 (-0.05%)
    spills in affected programs: 77 -> 71 (-7.79%)
    helped: 3
    HURT: 0

    total fills in shared programs: 25619 -> 25549 (-0.27%)
    fills in affected programs: 593 -> 523 (-11.80%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 0

    Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho
46de3beab1 gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Tells whether or not the driver can handle gl_LocalInvocationIndex and
gl_GlobalInvocationID.  If not supported (the default), state tracker
will lower those on behalf of the driver.

v2: Add case to u_screen.c.  (Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho
f03b21ae69 st/glsl: Perform some var optimizations
Perform those before some derefs are gone when we lower the buffers
after the st_nir_opts() call.

Intel SKL shader-db results:

    total instructions in shared programs: 15593685 -> 15590708 (-0.02%)
    instructions in affected programs: 378078 -> 375101 (-0.79%)
    helped: 777
    HURT: 44
    helped stats (abs) min: 1 max: 68 x̄: 4.07 x̃: 4
    helped stats (rel) min: 0.04% max: 31.58% x̄: 2.88% x̃: 1.37%
    HURT stats (abs)   min: 1 max: 24 x̄: 4.20 x̃: 2
    HURT stats (rel)   min: 0.17% max: 8.00% x̄: 1.60% x̃: 1.27%
    95% mean confidence interval for instructions value: -4.02 -3.23
    95% mean confidence interval for instructions %-change: -2.93% -2.35%
    Instructions are helped.

    total loops in shared programs: 4815 -> 4815 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 371965528 -> 371788566 (-0.05%)
    cycles in affected programs: 184190307 -> 184013345 (-0.10%)
    helped: 3650
    HURT: 2855
    helped stats (abs) min: 1 max: 59400 x̄: 99.45 x̃: 15
    helped stats (rel) min: <.01% max: 43.18% x̄: 2.60% x̃: 1.02%
    HURT stats (abs)   min: 1 max: 16362 x̄: 65.16 x̃: 10
    HURT stats (rel)   min: <.01% max: 66.22% x̄: 2.78% x̃: 0.81%
    95% mean confidence interval for cycles value: -53.73 -0.68
    95% mean confidence interval for cycles %-change: -0.39% -0.08%
    Cycles are helped.

    total spills in shared programs: 11936 -> 11956 (0.17%)
    spills in affected programs: 443 -> 463 (4.51%)
    helped: 0
    HURT: 8

    total fills in shared programs: 25644 -> 25619 (-0.10%)
    fills in affected programs: 2306 -> 2281 (-1.08%)
    helped: 24
    HURT: 2

    LOST:   7
    GAINED: 16

    Total CPU time (seconds): 1679.04 -> 1695.69 (0.99%)

shader-db results radeonsi (VEGA64):

    Totals from affected shaders:
    SGPRS: 180160 -> 179552 (-0.34 %)
    VGPRS: 115368 -> 114544 (-0.71 %)
    Spilled SGPRs: 5627 -> 5603 (-0.43 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 7808364 -> 7803268 (-0.07 %) bytes
    LDS: 192 -> 192 (0.00 %) blocks
    Max Waves: 19202 -> 19340 (0.72 %)
    Wait states: 0 -> 0 (0.00 %)

Radeonsi results provided by Timothy.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 14:53:54 -07:00
Ville Syrjälä
6230bfeb65 anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7
Modern DXVK requires event support [1], but looks like it only
uses vkCmdSetEvent() + vkGetEventStatus(). So we can just
borrow the relevant code from gen8, leaving CmdWaitEvents still
unimplemented.

[1] 8c3900c533

v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-11 16:25:07 -05:00
Ian Romanick
39f4dc23a5 intel/fs: Mark source 0 of bcsel as needing Boolean resolve
The other sources of the bcsel behave like the sources of an and or
other logical operation.  However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.

No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-11 12:12:07 -07:00
Rob Clark
f9f89df8bc freedreno/a5xx: enable a540
Tested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-11 12:03:10 -07:00
Rob Clark
832010f6ac freedreno/a6xx: enable UBWC by default
Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the
obsolete comment (and bonus, drop unused softpin debug flag)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
81cc555e9a freedreno/a6xx: disallow UBWC for z24s8
This is slightly annoying because it *mostly* works.. but we have some
issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before
we can enable UBWC by default.  For now it is a step forward to at least
enable it for non-z/s while we figure out how to blit z24s8+UBWC.

(The basic issue is that pretending z24s8 is an equivalently sized rgba
format for the purpose of blitting falls apart when UBWC is in the
picture.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
4f1319a17d freedreno/a6xx: use correct UBWC reg builders
No functional change, the registers have the same layout as MRT flags
pitch reg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
d42ce659ed freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
490baa6974 freedreno/a6xx: disable UBWC for some formats
An older blob claims to support UBWC w/ r32ui an r32i, but not r32f.
Results from deqp indicate that it doesn't work with r32ui and r32i.

This *could* also just mean that use as "IBO" (image) is more limited
than as texture, although blob also doesn't seem to bother to try to use
UBWC with images at all, so hard to know for sure.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
8ddffa75c0 freedreno/a6xx: handle non-UWC-compatible image views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
dac3bc9862 freedreno/a6xx: handle non-UBWC-compatible texture views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
fe5c7b2b75 freedreno: add helper to uncompress UBWC resource
We'll need this for a few edge cases, like image/sampler view that uses
a format that UBWC does not support with a resource originally created
in a format that UBWC does support.

NOTE we *could* in some cases do an in-place uncompress.  But that has
a couple potential sharp edges:

 1) the uncompressed buffer could have different layout, ie. a5xx
    with meta and pixel data of layers/levels interleaved.

 2) if it comes mid-batch, it would force flush, or somehow fixing
    up cmdstream for draws already emitted.  But with the resource
    shadowing approach we can rely on batch re-ordering to avoid
    splitting things.. older draws see the older compressed version,
    newer draws see the new uncompressed version of the rsc.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
846b8a76bd freedreno: handle images in rebind_resource()
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
c6ae354299 freedreno: allow null discard box in shadow path
When uncompressing a UBWC buffer, we don't want to discard anything.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
12201d7a8b freedreno: swap UBWC state in shadow path
It doesn't come up yet, as so far we only hit this path with linear
buffers.  But it will when we start re-using the shadow path for
uncompressing UBWC buffers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
3c9a31eb50 freedreno: add modifier param to fd_try_shadow_resource()
To uncompress UBWC, I want to re-use the shadow path, but we'll need a
way to request that the new buffer is not compressed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
3b05a120a3 freedreno: correct modifier for UBWC buffers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Chia-I Wu
15323c14fd virgl: consider newly created resources idle
A newly created resource can be regarded as idle.  We don't care if
the RESOURCE_CREATE command has been retired, unless it is used for
fencing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
9e4452cfd9 virgl: make resource_wait/resource_is_busy cheaper
The round trip to the kernel is expensive.  Add a local cache to
avoid it when possible.

There is a race condition when two contexts access the same resource
at the same time (e.g., ctx1 submits a cmdbuf that accesses a
resource while ctx2 maps the resource).  But that is probably an app
bug in the first place.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
ddc90be907 virgl: add virgl_drm_{alloc,free,clear}_res_list
Helpers to work with resource list.  virgl_drm_release_all_res is
removed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
71465fe569 virgl: do not cache external resources
We should not reuse a resource for other purposes when it can still
be accessed by another process or device.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Alyssa Rosenzweig
7d43999e63 panfrost: Enable AFBC on depth/stencil
This seems to be a performance win, but more rigorous testing is
necessary to figure out the exact circumstances when this is good/bad.
Incidentally, this fixes non-aligned ZS.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig
15f62b8e7c panfrost: Linear depth/stencil should be aligned
We might render to it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig
d7ad29ce25 panfrost/midgard: Decode LOD/bias registers
For constant LODs/biases, we can use an immediate embedded in the
texture (already decoded); for non-constant, we have to use a register
squeezed into the usual immediate field, which is decoded here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
b4a3296e77 panfrost/midgard: Decode texture offset register swizzle
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
4e9e42cc56 panfrost/midgard/disasm: include textureGather()
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
6c18ae33bc panfrost/midgard: Support negative immediate offsets
It's not at all clear why this work for texelFetch but not texture.
Maybe the top bits are dual-purpose on other texturing ops...?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
4d8157f12d panfrost/midgard: Fix redunant mask redundancy
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
3dee556c4e panfrost/midgard/disasm: Print LOD for texelFetch
Its encoding differs slightly from the LOD used in normal texture calls.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
cda9f32909 panfrost/midgard: Identify the in_reg_full field
This is clear for texelFetch, hence the confusion with Bifrost's filter
field, but it's much more general in reality.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
445a7b523f panfrost/midgard/disasm: Correctly dump bias/LOD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00