Commit graph

39232 commits

Author SHA1 Message Date
Marek Olšák
33a8eab7a9 radeonsi: don't use lp_build_if for the prim discard compute shader 2019-07-30 22:06:23 -04:00
Marek Olšák
5562b6b067 radeonsi: don't use lp_build_if for the wrapping if block in the VS prolog 2019-07-30 22:06:23 -04:00
Marek Olšák
0ef4c1c04d radeonsi: don't use lp_build_if for the wrapping if block in merged shaders 2019-07-30 22:06:23 -04:00
Marek Olšák
6ec7d603f5 radeonsi: don't use lp_build_if (in most common places) 2019-07-30 22:06:23 -04:00
Marek Olšák
3406a57ff3 radeonsi: don't use lp_build_alloca 2019-07-30 22:06:23 -04:00
Marek Olšák
9234275320 radeonsi/nir: implement FBFETCH for KHR_blend_equation_advanced 2019-07-30 22:06:23 -04:00
Marek Olšák
925161c84c radeonsi/nir: set input_interpolate_loc for color inputs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
5787bbf90d radeonsi/nir: set tgsi_shader_info::num_memory_instructions 2019-07-30 22:06:23 -04:00
Marek Olšák
0993dbcbef radeonsi/nir: accurately set input_usage_mask for doubles (v2)
v2: fix doubles

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
56e3c70b56 radeonsi/nir: accurately set output_usagemask (v2)
v2: fix doubles
2019-07-30 22:06:23 -04:00
Marek Olšák
37527f8a11 radeonsi/nir: accurately set reads_*_outputs for TCS 2019-07-30 22:06:23 -04:00
Marek Olšák
6697e42c3c radeonsi/nir: clean up gather_intrinsic_load_deref_input_info 2019-07-30 22:06:23 -04:00
Marek Olšák
5f16fdefdf radeonsi/nir: add an option to convert TGSI to NIR
Use at your own risk.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
eb43559bb8 radeonsi/nir: clean up some nir_scan_shader code
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
34dc6ed2a5 radeonsi/gfx10: disable DCC image stores
Uncompressed image stores are usually faster.

Also, the driver didn't set WRITE_COMPRESS_ENABLE, so I don't know
what the hw did for image stores.
2019-07-30 22:06:23 -04:00
Marek Olšák
17021efc74 radeonsi: adjust RB+ blend optimization settings
based on PAL
2019-07-30 22:06:23 -04:00
Connor Abbott
11a49f289d lima/gp: Support exp2 and log2
log2 is tricky because there cannot be a move between complex1 and
postlog2. We can't guarantee that scheduling complex1 will succeed when
we schedule postlog2, so we try to schedule complex1 and if it fails we
back out by rewriting the postlog2 as a move and introducing a new
postlog2 so that we can try again later.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-30 23:01:15 +02:00
Connor Abbott
c2f48d8f32 lima/gpir: Always schedule complex2 and *_impl right after complex1
See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory
details of why this is needed. For *_impl this is easy, since it never
increases register pressure and it goes in the complex slot hence it
never counts against max nodes. It's a bit more challenging for
complex2, since it does count against max nodes, so we need to change
the reservation logic to reserve an extra slot for complex2 when
scheduling complex1. This second part isn't strictly necessary yet, but
it will be for exp2.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-30 23:00:41 +02:00
Matt Turner
c9b86cf526 meson: Test for program_invocation_name
program_invocation_name and program_invocation_short_name are both GNU
extensions. I don't believe one can exist without the other, so only
check for program_invocation_name.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
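The sketch below shows one way the result of such a check might be consumed. It assumes the build system defines a macro named HAVE_PROGRAM_INVOCATION_NAME when the meson test passes (the macro name is an assumption here) and otherwise falls back to a placeholder. Because the two GNU variables come as a pair, testing for one is enough to use either.

```c
#define _GNU_SOURCE /* program_invocation_*name are GNU extensions */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper. HAVE_PROGRAM_INVOCATION_NAME is assumed to be
 * defined by the build system when the meson check succeeds. */
static const char *
example_get_process_name(void)
{
   const char *name = "unknown";
#ifdef HAVE_PROGRAM_INVOCATION_NAME
   /* Only program_invocation_name was tested for, but since the two
    * extensions always come together, the short name is usable too. */
   name = program_invocation_short_name;
#endif
   assert(name != NULL);
   return name;
}
```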
Matt Turner
9cc4311d86 st/nine: Drop preprocessor guards for glibc-2.12
Same rationale as the previous patch, but additionally these checks just
seem entirely unnecessary. pthread_self() has been used in Mesa since at
least 1999.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Sagar Ghuge
587a497529 iris: Enable EXT_texture_shadow_lod
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Sagar Ghuge
adb9e18348 gallium: Add PIPE_CAP_TEXTURE_SHADOW_LOD
v2: Line wrap to 80 char (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Jan Zielinski
4d2890e8f7 swr/rasterizer: Add memory tracking support
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 15:58:36 +02:00
Jan Zielinski
5dd9ad1570 swr/rasterizer: Better implementation of scatter
Added support for avx512 scatter instruction. Non-avx512 will
now call into a C function to do the scatter emulation.

This has better jit compile performance than
the previous approach of jitting scalar loops.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:19 +00:00
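A scalar C fallback for a masked scatter could look roughly like the sketch below. The function name, the fixed SIMD width of 8 and the byte-offset-times-scale addressing are illustrative assumptions, not the swr code; lanes are processed from lowest to highest, so when two active lanes target the same address the higher lane's value remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_SIMD_WIDTH 8

/* Hypothetical scalar scatter emulation: for every lane whose mask bit is
 * set, store src[lane] at base + offsets[lane] * scale bytes. */
static void
example_scatter_ps(void *base, const int32_t offsets[EXAMPLE_SIMD_WIDTH],
                   uint32_t mask, const float src[EXAMPLE_SIMD_WIDTH],
                   uint32_t scale)
{
   assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
   for (int lane = 0; lane < EXAMPLE_SIMD_WIDTH; lane++) {
      if (!(mask & (1u << lane)))
         continue; /* inactive lane: leave memory untouched */
      char *addr = (char *)base + (ptrdiff_t)offsets[lane] * scale;
      memcpy(addr, &src[lane], sizeof(float));
   }
}
```

Calling a plain C function like this avoids generating a scalar loop in the JIT for every scatter, which is the compile-time saving the commit describes.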
Jan Zielinski
ad9aff5528 swr/rasterizer: cleanups for tessellation
This commit introduces small fixes in preparation for tessellation
support.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:18 +00:00
Jan Zielinski
c5c05979f7 rasterizer/swr: move BucketMgr to SwrContext
This move gets us back to parity with the global manager
in that we can dump render context buckets now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:18 +00:00
Alejandro Piñeiro
cda4c62893 v3d: take into account separate_stencil when checking if stencil should be cleared
In most cases this is not needed, because usually when a separate
stencil is written, the parent resource is also written.

This is needed if we have a separate stencil, no depth buffer, and the
source and destination are the same, as in that case the stencil can be
updated but not the parent resource (for example, when blitting only the
stencil buffer). In that situation, the next access to the stencil
buffer would clear it (overwriting the result of the previous blit),
because the parent resource has v3d_resource.writes set to 0.

As far as I can see, this situation only happens with the
GL_DEPTH32F_STENCIL8 format.

Note that an alternative would be to consider the parent written
whenever the separate_stencil has been written (and update its "writes"
field accordingly). But I found this patch more natural.

Fixes the following piglit tests:
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels

The latter regressed when the internal glCopyPixels implementation
started to use blitting. So:

Fixes: 131d40cfc9 ("st/mesa: accelerate glCopyPixels(STENCIL)")

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-30 12:05:23 +02:00
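The condition described above can be sketched as follows. The struct is a hypothetical, heavily simplified stand-in that models only the two fields the message mentions (writes and separate_stencil); it is not the real v3d_resource or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a v3d resource. */
struct example_resource {
   unsigned writes;                           /* times the resource was written */
   struct example_resource *separate_stencil; /* NULL if stencil is packed */
};

/* Stencil contents count as valid if either the parent resource or its
 * separate stencil has been written; only otherwise may it be cleared. */
static bool
example_stencil_needs_clear(const struct example_resource *rsc)
{
   assert(rsc != NULL);
   if (rsc->writes > 0)
      return false;
   if (rsc->separate_stencil && rsc->separate_stencil->writes > 0)
      return false;
   return true;
}
```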
Kenneth Graunke
44e713eddb iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32 bits.  This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.

Thanks to Clayton Craft for catching the regressions!

Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")
2019-07-29 16:38:19 -07:00
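The actual fix lives in GPU command-stream code, but the arithmetic can be illustrated on the CPU. In this hypothetical sketch, a 32-bit streamout offset is stored next to unrelated data; a 64-bit load pulls that neighbor into the upper half, while a 32-bit load with zero extension does not. The vertex count is simplified to offset / stride, ignoring any starting buffer offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Buggy pattern: reads 8 bytes, so whatever follows the 32-bit offset in
 * memory ends up in the upper 32 bits of the result. */
static uint64_t
example_load_offset_64(const unsigned char *mem)
{
   uint64_t v;
   memcpy(&v, mem, sizeof(v));
   return v;
}

/* Fixed pattern: read only the 32-bit offset and zero-extend it. */
static uint64_t
example_load_offset_32(const unsigned char *mem)
{
   uint32_t v;
   memcpy(&v, mem, sizeof(v));
   return (uint64_t)v;
}

/* Simplified vertex count derived from the streamout offset in bytes. */
static uint64_t
example_vertex_count(uint64_t offset_bytes, uint32_t stride)
{
   assert(stride != 0);
   return offset_bytes / stride;
}
```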
Jason Ekstrand
4bb6e6817e intel: Use a system value for gl_FragCoord
It's kind of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input.  It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Rob Clark
010d255656 freedreno/a6xx: fix MSAA resolve hangs
Seems like RB_BLIT_SCISSOR needs to be aligned to (minimum?) tile size.

Fixes intermittent GPU hangs triggered by some of the three.js samples
on https://threejs.org/

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-29 15:15:31 -07:00
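Aligning a scissor rectangle outward to tile boundaries can be sketched as below. The rectangle layout (exclusive max edges) and the tile dimensions passed in are assumptions for illustration; the message itself is unsure whether the minimum tile size is the right granularity, and this is not the freedreno code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rectangle: min edges inclusive, max edges exclusive. */
struct example_rect {
   uint32_t x0, y0;
   uint32_t x1, y1;
};

static uint32_t
example_align_down(uint32_t v, uint32_t a)
{
   return v / a * a;
}

static uint32_t
example_align_up(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a * a;
}

/* Grow the rectangle outward so every edge falls on a tile boundary. */
static struct example_rect
example_align_scissor(struct example_rect r, uint32_t tile_w, uint32_t tile_h)
{
   assert(tile_w > 0 && tile_h > 0);
   struct example_rect out = {
      example_align_down(r.x0, tile_w), example_align_down(r.y0, tile_h),
      example_align_up(r.x1, tile_w), example_align_up(r.y1, tile_h),
   };
   return out;
}
```

Clamping the grown rectangle to the surface dimensions is omitted here for brevity.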
Leo Liu
8d7f2e2221 radeon/vcn/vp9: add Arcturus VP9 support
Arcturus CHIP enum is less than Navi10, since it's still gfx9,
but its VCN version belongs to VCN2.x

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:58 -04:00
Leo Liu
a439863918 radeon/vcn: add Arcturus decode support
Arcturus uses different internal register offsets from previous HW.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:56 -04:00
Marek Olšák
417ab8ef6b radeonsi: add AMD_DEBUG=nogfx for testing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:53 -04:00
Marek Olšák
19d04191c4 radeonsi: add support for compute-only chips
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:51 -04:00
Sonny Jiang
c82f338855 gallium/auxiliary/vl: add compute shaders for deint yuv
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:49 -04:00
Sonny Jiang
ef77a92bca gallium/auxiliary/vl: don't call gfx functions on compute-only chips
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:46 -04:00
James Zhu
b618b65c98 gallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor
Initialize the graphics shaders only when PIPE_CAP_GRAPHICS is true.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:42 -04:00
Marek Olšák
187cc07d05 gallium: create multimedia contexts as compute-only if graphics is unsupported
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:41 -04:00
Marek Olšák
ea7646dc13 gallium: add PIPE_CAP_GRAPHICS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:39 -04:00
Eric Anholt
65aeeae670 freedreno: Fix helgrind complaint on shader-db key setup.
If the variable's going to be static, we shouldn't be memsetting it
from every thread and instead just have it in the data section.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
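The difference between the two patterns can be sketched as follows. The key struct and its fields are hypothetical, not freedreno's. A static object that every thread clears with memset at run time is a shared write that helgrind flags; a static const object with an initializer is laid out at compile time in the binary's (read-only) data section, so no thread writes to it.

```c
#include <assert.h>

/* Hypothetical shader key; the real freedreno key has different fields. */
struct example_key {
   unsigned half_precision;
   unsigned nr_samples;
};

/* Racy pattern, shown only for contrast (every thread writes the object):
 *
 *   static struct example_key key;
 *   memset(&key, 0, sizeof(key));
 */

/* Preferred pattern: initialized at compile time, never written at run time. */
static const struct example_key example_default_key = {
   .half_precision = 0,
   .nr_samples = 1,
};

static const struct example_key *
example_get_default_key(void)
{
   return &example_default_key;
}
```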
Gert Wollny
4ee638cd78 softpipe: Don't draw when rasterizer_discard is set
Fixes:
  dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-29 15:47:34 +02:00
Gert Wollny
45ac0dfad4 softpipe: Fix cube arrays layer selection
To select the correct layer the z-coordinate must be rounded before it
is multiplied by six.

Fixes a number of tests out of
   dEQP-GLES31.functional.texture.filtering.cube_array.formats.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-29 15:47:34 +02:00
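A minimal sketch of the corrected layer selection, assuming the usual array-texture rule (array index = floor(z + 0.5), clamped to [0, num_cubes - 1]) and six consecutive layers per cube. The function name is hypothetical, not softpipe's.

```c
#include <assert.h>

/* Round the array coordinate first, then multiply by six. */
static int
example_cube_array_layer(float z, int face, int num_cubes)
{
   assert(face >= 0 && face < 6 && num_cubes > 0);
   float r = z + 0.5f;
   int cube;
   if (!(r > 0.0f))
      cube = 0;                 /* negative or NaN: clamp to first cube */
   else if (r >= (float)num_cubes)
      cube = num_cubes - 1;     /* clamp before converting to int */
   else
      cube = (int)r;            /* truncation equals floor for r > 0 */
   return cube * 6 + face;
}
```

For z = 1.4 and face 0 this returns layer 6 (cube 1), whereas rounding z * 6 = 8.4 first would give layer 8, which belongs to a different face.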
Connor Abbott
6fc7384fd4 lima/gpir/sched: Handle more special ops in can_use_complex()
We were missing handling for a few other ops that rearrange their
sources somehow in codegen, namely complex2 and select.

This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3
and possibly other random regressions from the new scheduler which were
supposed to be fixed in the commit right after.

Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-28 23:38:31 +02:00
Connor Abbott
af95f80a24 lima/gp: Clean up lima_program_optimize_vs_nir() a little
Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-28 23:38:31 +02:00
Connor Abbott
d26d8c5617 lima/gpir/sched: Don't try to spill when something else has succeeded
In try_node(), we assume that the node we pick can still be scheduled
successfully after speculatively trying all the other nodes. Normally we
always undo every node after speculating it, so that when we finally
schedule best_node the scheduler state is exactly the same and it
succeeds. However, we also try to spill nodes, which can change the
state and in a corner case that can make scheduling best_node fail. In
particular, the following sequence of events happened with piglit
shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a
register store node S, which is a use of N, was created and then later
the other uses of N were scheduled, so that S is now ready and N is
partially ready. First we try to schedule S and succeed, then we try to
schedule another node M, which fails, so we try to spill the remaining
uses of N. This succeeds, but scheduling M still fails so that best_node
is still S. However since one of the uses of N is one cycle ago, and
therefore we inserted a read dependent on S one cycle ago when spilling
N, S can no longer be scheduled as read-after-write latency is three
cycles.

While we could try to catch cases like this ad hoc, or (the best option
but very complicated) treat the spill as speculative and roll it back if
we decide not to schedule the node, a simpler solution is to just
give up on spilling if we've already successfully speculatively
scheduled another node. We'd give up a few cases where we discover that
by spilling even harder we could schedule a more desirable node, but
that seems like it would be pretty rare in practice. With this we
guarantee that nothing has been touched after best_node was successfully
scheduled. We also cut down on pointless spilling, since if we already
scheduled a node it's unlikely that spilling harder will let us schedule
an even better node, and hence any spilling at this point is probably
useless.

While we're here, clean up the code around spilling by flattening the
two ifs and getting rid of the second, unnecessary check for INT_MIN.

Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Acked-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-28 23:38:31 +02:00
Ilia Mirkin
de17922b8a nv50/ir: don't consider the main compute function as taking arguments
With OpenCL, kernels can take arguments and return values (?). However
in practice, there is no more TGSI compute implementation, and even if
there were, it would probably have named functions and no explicit main.

This improves RA considerably for compute shaders, since temps are not
kept around as return values.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-07-27 18:24:11 -04:00
Ilia Mirkin
3e468ff2fe nv50/ir: handle insn not being there for definition of CVT arg
This can happen if it's e.g. a uniform or a function argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
Ilia Mirkin
23dfff0669 nouveau: flip DEBUG -> !NDEBUG
The meson conversion chose to change the meaning of DEBUG from "used for
debugging" to "used for expensive things for debugging", primarily
for nir_validate. Flip things over so that we get nice things with
optimizations enabled.

While we're at it, also kill off nouveau_statebuf.h which is unused (and
has a mention of DEBUG which is how I found it).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-07-27 18:24:11 -04:00
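Gating debug code on !NDEBUG instead of DEBUG can be sketched as below; the macro and function names are hypothetical, not nouveau's. The debug path stays active in any build that keeps assertions, including optimized builds that leave NDEBUG undefined, and compiles away when NDEBUG is set.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debug-print macro keyed on NDEBUG rather than DEBUG. */
#ifndef NDEBUG
#define EXAMPLE_DBG(...) fprintf(stderr, __VA_ARGS__)
#else
#define EXAMPLE_DBG(...) ((void)0)
#endif

static int
example_debug_enabled(void)
{
#ifndef NDEBUG
   return 1;
#else
   return 0;
#endif
}
```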
Ilia Mirkin
9f8ed5aa67 nvc0: allow a non-user buffer to be bound at position 0
Previously the code only handled it for positions 1 and up (as would be
the case for UBOs in GL). It's not a lot of trouble to handle this, and
vl or vdpau want this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
Ilia Mirkin
c52b057e00 nv50,nvc0: update sampler/view bind functions to accept NULL array
Apparently vl (or vdpau) wants to pass that in now. Handle it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
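Accepting a NULL array in a bind function can be sketched as follows. The context struct and slot count are hypothetical; the function is only loosely modeled on the shape of gallium's bind_sampler_states(ctx, shader, start, num, samplers) hook and omits the shader stage. A NULL array is treated as "unbind every slot in the range" instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_SAMPLERS 16

/* Hypothetical context holding bound sampler state objects. */
struct example_context {
   void *samplers[EXAMPLE_MAX_SAMPLERS];
};

static void
example_bind_sampler_states(struct example_context *ctx, unsigned start,
                            unsigned num, void **samplers)
{
   assert(start + num <= EXAMPLE_MAX_SAMPLERS);
   for (unsigned i = 0; i < num; i++)
      ctx->samplers[start + i] = samplers ? samplers[i] : NULL;
}
```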