Commit graph

118352 commits

Author SHA1 Message Date
Kenneth Graunke
0f2f561a10 anv: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:46 -08:00
Kenneth Graunke
5cc7636993 iris: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood.  I didn't see any improvements
at out-of-the-box power management settings, however.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:43 -08:00
Kenneth Graunke
0b74f85870 intel/genxml: Add a partial TCCNTLREG definition
TCCNTLREG contains additional cache programming settings.  In
particular, there are several write combining controls we'd like to use.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:33 -08:00
Kenneth Graunke
74665eaf3a util: Detect use-after-destroy in simple_mtx
This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.

That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.

This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-12-10 23:48:40 +00:00
Rob Clark
fc97643c57 freedreno/a6xx: enable LRZ by default
Now that dEQP should be happy, lets flip the switch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark
1b4c12d3ee freedreno/a6xx: fix LRZ logic
In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:

1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
   test in the binning pass (whereas LRZ normally just needs to
   determine the min and max z value in an 8x8 quad)

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark
3c479849c5 freedreno/a6xx: fix LRZ layout
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark
6cf101402d freedreno/a5xx+a6xx: split LRZ layout to per-gen
Seems to be a bit different for a6xx, so let's split this out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark
3b074a2e53 freedreno/a6xx: disable LRZ when blending
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Marek Olšák
a305543c8d radeonsi: don't rely on CLEAR_STATE to set PA_SC_GENERIC_SCISSOR_*
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:37 -05:00
Marek Olšák
aced18aa61 radeonsi/gfx10: simplify the tess_turns_off_ngg condition
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:36 -05:00
Marek Olšák
42f921387b radeonsi/gfx10: disable vertex grouping
based on PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:34 -05:00
Marek Olšák
75ce078a0a radeonsi: enable NIR by default and document GL 4.6 support
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 15:48:58 -05:00
Marek Olšák
42b28e7ac3 st/dri: assume external consumers of back buffers can write to the buffers
This was reverted needlessly because if was part of another series.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
2019-12-10 15:37:37 -05:00
Jason Ekstrand
41691ac016 ANV: Stop advertising smoothLines support on gen10+
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
2019-12-10 20:13:56 +00:00
Dylan Baker
85a9698ac3 meson/broadcom: libbroadcom_cle needs expat headers
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 10:48:38 -08:00
Lionel Landwerlin
5fdea9f401 anv: fix incorrect VMA alignment for CCS main surfaces
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Lionel Landwerlin
dcfe1903c3 anv: fix missing gen12 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d43 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Eric Engestrom
865f4b193f docs: reword a bit and list HTTPS before FTP
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-12-10 15:21:23 +00:00
Eric Engestrom
d90e656fa7 meson: drop intel_ prefix on imgui_core
Again, no real effect, just the name of a temporary build file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-12-10 15:16:02 +00:00
Eric Engestrom
2b0e3e9fd1 meson: drop duplicate lib prefix on libiris_gen*
This has no real effect other than the names of the temporary files in
the build folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-12-10 15:16:02 +00:00
Samuel Pitoiset
e4c8491bdf radv: implement VK_KHR_separate_depth_stencil_layouts
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:16:17 +01:00
Samuel Pitoiset
48ee62178f radv: initialize HTILE for separate depth/stencil aspects
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:29 +01:00
Samuel Pitoiset
41cebfc9c1 radv: do not init HTILE as compressed state when dst layout allows it
I don't think this makes much differences and a potential clear
following the initialization will overwrite HTILE anyways.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:26 +01:00
Samuel Pitoiset
b603cc8c84 radv: synchronize after performing a separate depth/stencil fast clears
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:22 +01:00
Michel Dänzer
dadd609664 gitlab-ci: Don't exclude any piglit quick_shader tests
Now that we're running these with process isolation enabled, their
results will hopefully be stable.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 11:19:11 +00:00
Krzysztof Raszkowski
cfe00a52f0 gallivm: add TGSI bit arithmetic opcodes support
Add TGSI_OPCODE_BFI, TGSI_OPCODE_POPC, TGSI_OPCODE_LSB,
TGSI_OPCODE_IMSB, TGSI_OPCODE_UMSB, TGSI_OPCODE_IBFE,
TGSI_OPCODE_UBFE, TGSI_OPCODE_BREV support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
2019-12-10 10:34:18 +00:00
Samuel Pitoiset
008fe909ca radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast
PA_SC_AA_CONFIG might be updated when conversative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.

Found by inspection, doesn't fix anything known.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:43 +01:00
Samuel Pitoiset
4f659224c8 radv: move emission of two PA_SC_* registers to the pipeline CS
They don't have to be updated dynamically.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:40 +01:00
Pierre-Eric Pelloux-Prayer
87f7ec8a2c st/dri: use st->flush callback to flush the backbuffer
Previously the flush was done before the call to st->flush but
could lead to problems as FLUSH_VERTICES could push some work
that would change the backbuffer (or modify it).

With this commit, all the backbuffer flushing code is executed
right before the call to st_flush.

Closes: https://gitlab.freedesktop.org/drm/amd/issues/842
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205049

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer
cc0d0afe3b st/mesa: add a notify_before_flush callback param to flush
The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.

This will be used by dri_flush in the next commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer
f5c1cb2383 radeonsi: dcc dirty flag
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer
e3e91cebcd radeonsi: fix multi plane buffers creation
When using 3 planes, the sequence produces this chain:
  plane0 -> plane2
This commit fixes this to produce:
  plane0 -> plane1 -> plane2

Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 08:52:16 +01:00
Pierre-Eric Pelloux-Prayer
ff0f108666 radeonsi: use gfx9.surf_offset to compute texture offset
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2177
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 08:52:07 +01:00
Sonny Jiang
6c901f0675 radeonsi: use compute shader for clear 12-byte buffer
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-09 23:25:57 -05:00
Marek Olšák
38e9eb9561 st/mesa: release the draw shader properly to fix driver crashes (iris)
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 22:41:41 -05:00
Marek Olšák
41118246c6 draw, st/mesa: generate TGSI for ffvp/ARB_vp if draw lacks LLVM
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
a3de63fbb3 st/mesa: don't generate VS TGSI if NIR is enabled
it's no longer needed

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
a90f4453fe st/mesa: remove struct st_vp_variant in favor of st_common_variant
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
6299b90fd4 st/mesa: remove st_vp_variant::num_inputs
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
bc99b22a30 st/mesa: use a separate VS variant for the draw module
instead of keeping the IR indefinitely in st_vp_variant.

This trivially fixes Selection/Feedback/RasterPos for NIR.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
17e8839a2f st/mesa: support shader images for Selection/Feedback/RasterPos
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
b7393f1115 st/mesa: support SSBOs for Selection/Feedback/RasterPos
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
e91b044bd8 st/mesa: support samplers for Selection/Feedback/RasterPos
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
2891c4b2e2 st/mesa: save currently bound vertex samplers and sampler views in st_context
for st_draw_feedback.c

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
226e7aee70 st/mesa: support UBOs for Selection/Feedback/RasterPos
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
60db75cb77 gallivm: implement LOAD with CONSTBUF but don't enable it for llvmpipe
This is already used in st_draw_feedback.c, because it uses shaders
generated for drivers.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-09 21:09:28 -05:00
Marek Olšák
525c8b90c7 llvmpipe: implement TEX_LZ and TXF_LZ opcodes
gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-09 21:09:28 -05:00
Gurchetan Singh
3c8ddc8f4b drirc: set allow_higher_compat_version for Faster Than Light
With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+), it's possible to have DSA with GL3.1+.

FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.

Bump the compat version to pass the check.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-09 15:27:02 -08:00
Roland Scheidegger
23f1b78e8f util/atomic: Fix p_atomic_add for unlocked and msvc paths
Braces mismatch (flagged by CI, untested).

Fixes: 385d13f26d "util/atomic: Add a _return variant of p_atomic_add"

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-09 15:02:58 -08:00