Commit graph

118001 commits

Author SHA1 Message Date
Dylan Baker
a24d6fbae6 meson: Add -Werror=gnu-empty-initializer to MSVC compat args
Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:

```C
struct foo f = {};
```

GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-11-26 12:48:11 -08:00
Dylan Baker
25e58e3718 gallium/auxiliary: Fix uses of gnu struct = {} extension
Most of these will never actually be compiled by windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking windows removing a handful of safe uses seems like a
good trade off.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-11-26 12:48:11 -08:00
Marek Olšák
ed1ff99da7 st/mesa: add st_variant base class to simplify code for shader variants
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
b8772a559a st/mesa: don't use ** in the st_nir_link_shaders signature
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
adbba2142d st/mesa: simplify looping over linked shaders when linking NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
8567e06046 st/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
e8f0a39d45 st/mesa: don't call ProgramStringNotify in glsl_to_nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
5a714531f7 st/mesa: don't use redundant stp->state.ir.nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
6cf011fcc8 st/mesa: don't serialize all streamout state if there are no SO outputs
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Kenneth Graunke
3fdf2bb313 iris: Disable VF cache partial address workaround on Gen11+
The vertex cache uses the full 48-bit address on Gen11+.  See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.

Interestingly, the docs don't mention index buffers as needing a
workaround at all.  So either we've been overzealous, or the docs
never got updated to record that.  Which begs the question of whether
the issue there was fixed, if there was one...

Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
2019-11-26 12:13:34 -08:00
Rob Clark
8d9f5a28e3 freedreno: switch to layout helper
The slices table and most of the other layout fields in the
freedreno_resource moves into fdl_layout.

v2: Changes by anholt to not have duplicate fields, which was introducing
    a surprising behavior change in resource layout (using the
    level_linear helper before the setup of the shadowed fields)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:08 +00:00
Eric Anholt
997b8d4749 freedreno/a6xx: Log the tiling mode in resource layout debug.
This was important for figuring out what went wrong with the layout
refactor.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
2e62a622e7 freedreno: Convert the slice struct to the new resource header.
This gets the worst of the sed required for shared resource layout out of
the way.  The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
930432577f freedreno: Introduce a resource layout header.
This will be used for sharing resource layout code between freedreno and
tu.  Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
2ec420b264 freedreno: Introduce a fd_resource_tile_mode() helper.
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out.  This will make the shared resource helper transition
cleaner.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
6b09227ede freedreno: Introduce a fd_resource_layer_stride() helper.
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Rob Clark
9e9a26c768 freedreno: use rsc->slice accessor everywhere
This will make it easier to extract the slice table out into a layout
helper.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
d845dca0f5 nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:

result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);

->

temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.

Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first.  This helped his cases, but I
believe introduced this failure mode.  Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things).  There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.

Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):

Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
  outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
  -65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
  -68.8066% +/- 6.49523% (was 4.8s)

v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
    tricky code.  Runtime of uniform_api.random.3 down -0.790299% +/-
    0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
    fixing infinite loops.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-26 10:13:46 -08:00
Eric Anholt
90ad6304bf nir: Refactor algebraic's block walk
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-26 10:13:40 -08:00
Connor Abbott
305d1300f9 nir: Maintain the algebraic automaton's state as we work.
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-26 10:13:19 -08:00
Jonathan Marek
2da4a58ed9 etnaviv: support 3d/array/integer formats in texture descriptors
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-26 19:07:04 +01:00
Jonathan Marek
7806e058c9 etnaviv: blt: fix partial ZS clears with TS
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-26 19:04:51 +01:00
Daniel Schürmann
7cd548d352 aco: don't value-number instructions from within a loop with ones after the loop.
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
46420dd294 aco: set dlc/glc correctly for image loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
37843e454e aco: allow constant offsets for global/scratch instructions on GFX10
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.

Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Bas Nieuwenhuizen
02375b8436 radv: Enable VK_KHR_buffer_device_address.
Still no capture/replay or multi device support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-26 11:59:52 +00:00
Samuel Pitoiset
34dd4251e2 radv: fix reporting subgroup size with VK_KHR_pipeline_executable_properties
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-26 10:48:48 +01:00
Bas Nieuwenhuizen
25bc9102d8 radv: Allocate cmdbuffer space for buffer marker write.
Fixes: 946193ae00 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by:  Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-26 09:35:02 +00:00
Gert Wollny
e41958e344 r600: Disable eight bit three channel formats
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
propely handle this. Disabling this format also for buffers fixes the
issue.

In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:

  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d

Fixes: 0899bf55
  st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM

Closes #2118

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-26 09:28:52 +01:00
Samuel Pitoiset
f6770b9726 ac/llvm: fix warning in ac_build_canonicalize()
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 4567 |  return ac_build_intrinsic(ctx, intr, type, params, 1,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4568 |       AC_FUNC_ATTR_READNONE);
      |       ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-26 08:35:10 +01:00
Tapani Pälli
5d58fea660 mapi: add GetInteger64vEXT with EXT_disjoint_timer_query
From EXT_disjoint_timer_query spec:

   "Interaction: This extension adds GetInteger64vEXT if
    OpenGL ES 3.0 is not supported"

See https://github.com/KhronosGroup/OpenGL-Registry/issues/326.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-26 07:41:24 +02:00
Jason Ekstrand
200a3301e2 vulkan: Update the XML and headers to 1.1.129
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 02:48:42 +00:00
Jason Ekstrand
854859fefa anv/entrypoints: Better handle promoted extensions
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support.  For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones.  We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 02:48:42 +00:00
Jason Ekstrand
121551bfdb vulkan/enum_to_str: Handle out-of-order aliases
The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion.  This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 02:48:42 +00:00
Kenneth Graunke
f6aa51103b iris: Update SURFACE_STATE addresses when setting sampler views
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it.  We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at.  If not, update addresses
and re-upload the SURFACE_STATE.

Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE.  Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call).  Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.

Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find!  Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
2019-11-25 15:54:54 -08:00
Kenneth Graunke
060a2c52fa iris: Maintain CPU-side SURFACE_STATE copies for views and surfaces.
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE.  This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.

This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.

This will be necessary to fix iris_invalidate_resource bugs shortly.

Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there.  However,
it's nice to have surfaces, sampler views, and image views handled
similarly.  Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.

v2: Properly free memory (caught by Andrii Simiklit)
2019-11-25 15:54:54 -08:00
Kenneth Graunke
2b09e818dc iris: Create an "iris_surface_state" wrapper struct
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs.  However, we're going to want a CPU-side copy
soon.  Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
2019-11-25 15:54:54 -08:00
Kenneth Graunke
4c1f81ad62 iris: Drop 'old_address' parameter from iris_rebind_buffer
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address.  When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
2019-11-25 15:54:54 -08:00
Kenneth Graunke
518be59c1a iris: Stop mutating the resource in get_rt_read_isl_surf().
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
2019-11-25 15:54:54 -08:00
Marek Olšák
b02e0d2604 radeonsi/nir: don't run si_nir_opts again if there is no change
0.3% less overhead

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-25 16:48:27 -05:00
Marek Olšák
4675cb2019 radeonsi: initialize the per-context compiler on demand
This takes a noticable amount of time in piglit and some tests don't
need it.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-25 16:48:27 -05:00
Marek Olšák
f671cc4d95 ac: set swizzled bit in cache policy as a hint not to merge loads/stores
LLVM now merges loads and stores for all opcodes, so this must be set.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 16:48:27 -05:00
Eric Anholt
8afab607ac nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable.  A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.

Results for v3d:

total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)

v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
    patch, include SSA defs in live values so we can switch to bottom-up
    if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
    successfully register allocate on GLES3.1 dEQP.  Make sure that
    discards don't move after store_output.  Comment spelling fix.
2019-11-25 21:12:21 +00:00
Jonathan Marek
5159db60fc etnaviv: implement 64bpp clear
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-25 20:23:22 +01:00
Jonathan Marek
2214f99c07 etnaviv: avoid using RS for 64bpp formats
At the same time, this change allows using BLT for 8bpp formats

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-25 20:22:43 +01:00
Christian Gmeiner
92d5e3c692 etnaviv: add support for extended pe formats
Use the extended format if an such a format was passed.

v1 -> v2:
 - set FORMAT_MASK bit when using ext PE format as suggested
   by Wladimir J. van der Laan

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-11-25 20:12:52 +01:00
Christian Gmeiner
396818fd9d etnaviv: handle 8 byte block in tiling
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-11-25 20:11:30 +01:00
Samuel Pitoiset
2af39c719e radv: select the depth decompress path based on the aspect mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:23 +01:00
Samuel Pitoiset
905c005561 radv: create decompress pipelines for separate depth/stencil layouts
No functional changes as the driver still uses the depth+stencil
pipeline.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:21 +01:00
Samuel Pitoiset
faa58201f3 radv: rework creation of decompress/resummarize meta pipelines
This refactoring will help for creating more decompress pipelines.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-25 16:29:18 +01:00