Commit graph

118029 commits

Author SHA1 Message Date
Dave Airlie
c879efec09 gallivm: split out the flow control ir to a common file.
We can share a bunch of flow control handling between NIR and TGSI.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-11-28 14:47:54 +10:00
Marek Olšák
754c7b8939 radeonsi: enable SPIR-V and GL 4.6 for NIR
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:35 -05:00
Marek Olšák
cf240ea6a5 radeonsi/nir: support interface output types to fix SPIR-V xfb piglits
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:34 -05:00
Marek Olšák
1b45da15a9 radeonsi/nir: fix location_frac handling for TCS outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:32 -05:00
Marek Olšák
268e42e4f8 radeonsi/nir: don't rely on data.patch for tess factors
GLCTS SPIR-V tests have this issue.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:30 -05:00
Marek Olšák
59daac686d radeonsi/nir: validate is_patch because SPIR-V doesn't set it for tess factors
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:29 -05:00
Marek Olšák
272f1369ec radeonsi: simplify get_tcs_tes_buffer_address_from_generic_indices
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:28 -05:00
Marek Olšák
1e3aab4cd0 radeonsi: simplify the interface of get_dw_address_from_generic_indices
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:26 -05:00
Marek Olšák
756fc9f1bb radeonsi/nir: implement subgroup system values for SPIR-V
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:23 -05:00
Marek Olšák
42318f9197 ac/nir: don't rely on data.patch for tess factors
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-27 19:28:10 -05:00
Kenneth Graunke
51cc380894 drirc: Set vs_position_always_invariant for Shadow of Mordor on Intel
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.

brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting.  This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.

Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-11-27 18:48:04 +00:00
Kenneth Graunke
9b577f2a88 driconf, glsl: Add a vs_position_always_invariant option
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time.  Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader.  But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.

The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner.  However, most applications fail to do so.

So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.

Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-11-27 18:48:04 +00:00
Eric Anholt
424d5e4e11 turnip: Disable timestamp queries for now.
They're not implemented, and not critical to bring up immediately.  Avoids
failures in the CTS when nothing gets written to the query.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 10:05:59 -08:00
Jonathan Marek
080c92e7d4 freedreno/perfcntrs/fdperf: add missing a2xx case in select_counter
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-11-27 12:11:57 -05:00
Jonathan Marek
98d7125b36 freedreno/perfcntrs/fdperf: add missing a20x compatible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-11-27 12:11:57 -05:00
Jonathan Marek
24cde37e8d freedreno/perfcntrs/fdperf: fix u64 print on 32-bit builds
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-11-27 12:11:57 -05:00
Jonathan Marek
baab4017b9 freedreno/perfcntrs: add a2xx MH counters
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-11-27 12:11:57 -05:00
Jonathan Marek
0d0c8a9e82 freedreno/registers: add missing MH perfcounter enum for a2xx
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-11-27 12:11:57 -05:00
Michel Dänzer
a3b3d3bfcc gitlab-ci: Put HTML summary in artifacts for failed piglit jobs
This will make it easier to look at details of failed / skipped tests.

Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-27 10:20:31 +01:00
Michel Dänzer
07c1346113 gitlab-ci: Stop storing piglit test results as JUnit
Since we're not reporting test results as JUnit anymore, we can use the
default JSON format.

This affects how test results are summarized, update the reference files
accordingly.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-27 10:19:22 +01:00
Michel Dänzer
c9cdb7cef0 gitlab-ci: Stop reporting piglit test results via JUnit
It was basically useless in this form, and processing the JUnit data in
the GitLab backend was pretty expensive.

Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-27 10:18:33 +01:00
Iago Toral Quiroga
18a09e788d v3d: fix indirect BO allocation for uniforms
We were always ensuring a minimum size of 4 bytes for uniforms
for the case where we don't have any, to account for hardware pre-fetching
of the uniform stream, however, pre-fetching could also lead to to out
of bounds reads when have read the last uniform in the stream, so we
probably want to have the extra 4 bytes to prevent the kernel from
observing invalid memory accesses when the uniform stream sits right at
the end of a page.

This seems to fix MMU exceptions reported with a Linux 5.4 kernel.

Credit goes to Phil Elwell for identifying the problem and narrowing
it down to memory accesses in the uniform stream.

Reported-by: Phil Elwell <phil@raspberrypi.org>
Tested-by: Phil Elwell <phil@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-27 08:43:13 +01:00
Samuel Pitoiset
a24f1c8f7f radv: enable VK_KHR_shader_subgroup_extended_types on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:42:44 +01:00
Samuel Pitoiset
0812dbd403 ac: add 8-bit and 16-bit supports to ac_build_permlane16()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:42:42 +01:00
Samuel Pitoiset
c9aa843961 radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:39:26 +01:00
Samuel Pitoiset
86a5fbfd4a radv: fix enabling sample shading with SampleID/SamplePosition
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.

Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:22:54 +01:00
Jonathan Marek
62ff90cc5e turnip: fix integer render targets
Add missing required bits.  Fixes at least:

dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.renderpass.dedicated_allocation.attachment.4.401
dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw
dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-26 16:01:19 -08:00
Jason Ekstrand
a8965c076b anv: Push constants are relative to dynamic state on IVB
Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines"
Closes: #2136
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 22:15:54 +00:00
Dylan Baker
a24d6fbae6 meson: Add -Werror=gnu-empty-initializer to MSVC compat args
Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:

```C
struct foo f = {};
```

GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-11-26 12:48:11 -08:00
Dylan Baker
25e58e3718 gallium/auxiliary: Fix uses of gnu struct = {} extension
Most of these will never actually be compiled by windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking windows removing a handful of safe uses seems like a
good trade off.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-11-26 12:48:11 -08:00
Marek Olšák
ed1ff99da7 st/mesa: add st_variant base class to simplify code for shader variants
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
b8772a559a st/mesa: don't use ** in the st_nir_link_shaders signature
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
adbba2142d st/mesa: simplify looping over linked shaders when linking NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
8567e06046 st/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
e8f0a39d45 st/mesa: don't call ProgramStringNotify in glsl_to_nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
5a714531f7 st/mesa: don't use redundant stp->state.ir.nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Marek Olšák
6cf011fcc8 st/mesa: don't serialize all streamout state if there are no SO outputs
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-26 15:14:10 -05:00
Kenneth Graunke
3fdf2bb313 iris: Disable VF cache partial address workaround on Gen11+
The vertex cache uses the full 48-bit address on Gen11+.  See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.

Interestingly, the docs don't mention index buffers as needing a
workaround at all.  So either we've been overzealous, or the docs
never got updated to record that.  Which begs the question of whether
the issue there was fixed, if there was one...

Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
2019-11-26 12:13:34 -08:00
Rob Clark
8d9f5a28e3 freedreno: switch to layout helper
The slices table and most of the other layout fields in the
freedreno_resource moves into fdl_layout.

v2: Changes by anholt to not have duplicate fields, which was introducing
    a surprising behavior change in resource layout (using the
    level_linear helper before the setup of the shadowed fields)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:08 +00:00
Eric Anholt
997b8d4749 freedreno/a6xx: Log the tiling mode in resource layout debug.
This was important for figuring out what went wrong with the layout
refactor.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
2e62a622e7 freedreno: Convert the slice struct to the new resource header.
This gets the worst of the sed required for shared resource layout out of
the way.  The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
930432577f freedreno: Introduce a resource layout header.
This will be used for sharing resource layout code between freedreno and
tu.  Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
2ec420b264 freedreno: Introduce a fd_resource_tile_mode() helper.
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out.  This will make the shared resource helper transition
cleaner.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
6b09227ede freedreno: Introduce a fd_resource_layer_stride() helper.
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Rob Clark
9e9a26c768 freedreno: use rsc->slice accessor everywhere
This will make it easier to extract the slice table out into a layout
helper.

Acked-by: Rob Clark <robdclark@chromium.org>
2019-11-26 18:46:07 +00:00
Eric Anholt
d845dca0f5 nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:

result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);

->

temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.

Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first.  This helped his cases, but I
believe introduced this failure mode.  Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things).  There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.

Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):

Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
  outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
  -65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
  -68.8066% +/- 6.49523% (was 4.8s)

v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
    tricky code.  Runtime of uniform_api.random.3 down -0.790299% +/-
    0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
    fixing infinite loops.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-26 10:13:46 -08:00
Eric Anholt
90ad6304bf nir: Refactor algebraic's block walk
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-26 10:13:40 -08:00
Connor Abbott
305d1300f9 nir: Maintain the algebraic automaton's state as we work.
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-26 10:13:19 -08:00
Jonathan Marek
2da4a58ed9 etnaviv: support 3d/array/integer formats in texture descriptors
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-26 19:07:04 +01:00
Jonathan Marek
7806e058c9 etnaviv: blt: fix partial ZS clears with TS
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-26 19:04:51 +01:00