Commit graph

39979 commits

Author SHA1 Message Date
Boyuan Zhang
26099bc35d radeon/vcn: adding engine type for new fw interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:33 -04:00
Marek Olšák
936e9fa951 radeonsi: use the correct buffer size in si_vid_clear_buffer
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Jeremy Newton
666ea30017 pipe-loader: use radeonsi for MM if amdgpu dri is used
The amdgpu dri is used for the closed source AMD driver. Since this driver
does not implement multimedia, we fall back to radeonsi in mesa to do
multimedia. This corrects the dri driver name for when it is set to amdgpu.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 19:59:02 -04:00
Eric Engestrom
085c3abf27 util: use standard name for vsnprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
dffeaa55dd util: use standard name for snprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
00e23cd969 util: use standard name for vasprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
59c2dd1b8c util: use standard name for sprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
88ddb2e186 util: use standard name for strncmp()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
27b9eea557 util: use standard name for strncat()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
3ba199abd1 util: use standard name for strdup()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
09a8a39940 util: use standard name for strchrnul()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Erico Nunes
32ced14bad lima/ppir: handle all node types in ppir_node_replace_child
ppir_node_replace_child is used by the const lowering routine in ppir.
All types need to be handled here, otherwise the src node is not updated
properly when one of the lowered nodes is a const, which results in, for
example, regalloc not assigning registers correctly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-19 16:01:45 +00:00
Erico Nunes
2292f0c4b5 lima/ppir: branch regalloc fixes
The branch instruction has sources which must be handled in src handling
paths so that regalloc assigns registers to them properly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-19 16:01:45 +00:00
Timothy Arceri
80c2c17e1e iris: change last_vue_stage() to look at uncompiled shaders
This allows us to find the last vue stage before we have compiled
the shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Alyssa Rosenzweig
0395b58c92 panfrost: Set rt_count
This doesn't quite work yet, but it illustrates how MRT is implemented
in the MFBD: rt_count is set appropriately based on the number of render
targets, while additional render target descriptors are appended on with
an index variable in them (not quite decoded since there are some aspects
we don't understand there, but conceptually this should be right).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
871ad7789f panfrost: Trace invisible BOs
Helps make the decode a little more readable (names instead of
addresses).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
e797caa0dd panfrost: Zero polygon list body size for clears
There are no polygons, so the polygon list body cannot have any size,
although there is a minimal header.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
f475b79980 panfrost/mfbd: Unify depth-only with masked FBO path
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
629c7366a7 panfrost: Simplify set_framebuffer_state
Most of the ad hoc logic is already in Gallium.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
227c395c00 panfrost: Check for NULL surface in places
Fixes a bunch of NULL dereferences, although it does cause GPU faults of
course.

This is caused by color buffers masked out in MRT, which we'll
eventually have to solve the right way... one thing at a time.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
79b13b4376 panfrost: Expose 4 render targets
Hidden behind deqp flag as usual.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
d56f92502e panfrost: Shrink tiler heap
128MB is excessive and 16MB is still plenty. Saves 112MB/context on
kernels without growable/heap support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:16 -07:00
Eric Anholt
c0640035fb vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions().
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Alyssa Rosenzweig
1bced0fad2 panfrost: Handle Z24 textures
Just use the Z32 code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
f29c084960 panfrost/ci: Update expectations
We just fixed some stencil tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
fad76470d5 panfrost: Make scissor test more robust
See v3d implementation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
5c554e235d panfrost: Use correct NO_DITHER field on MFBD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
676b9339dd panfrost: Implement Z32F(_S8) support
Z32F uses a dedicated float path. Z32F_S8 uses separate stencil planes
in the hardware, lowered via u_transfer_helper.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
65d89097b8 panfrost: Copy stencil front to back if back disabled
When backside stenciling is disabled, backfacing primitives just do the
same thing as frontfacing primitives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
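The fallback described in 65d89097b8 can be sketched in a few lines of C. This is only an illustration under assumed names: struct stencil_face and effective_back_stencil() are hypothetical and do not reflect the panfrost or Gallium data layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-face stencil state; not the panfrost or Gallium layout. */
struct stencil_face {
    bool enabled;
    uint8_t func;
    uint8_t ref;
    uint8_t valuemask;
    uint8_t writemask;
};

/* Returns the state to apply to back-facing primitives: the back state when
 * back-side stenciling is enabled, otherwise a copy of the front state. */
static struct stencil_face
effective_back_stencil(const struct stencil_face *front,
                       const struct stencil_face *back)
{
    return back->enabled ? *back : *front;
}
```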
Jan Zielinski
6f7306c029 swr/rast: Refactor memory API between rasterizer core and swr
This commit cleans up API between the core of the rasterizer and swr.
Some formatting changes are also done.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-18 16:17:00 +02:00
Andreas Baierl
4627a0c4eb lima/ppir: Add gl_PointCoord handling
Treat gl_PointCoord as a system value and
add the necessary bits for correct codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3523233027 gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL
This adds an option to treat gl_PointCoord as a system value.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3349a60f6f nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Connor Abbott
b178fdf486 lima/gp: Fix problem with complex moves
When writing the scheduler, we forgot that you can't read the complex
unit in certain sources because it gets overwritten to 0 or 1. Fixing
this turned out to be possible without giving up and reducing
GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't
expect. There can be at most 4 next-max nodes that can't have moves
scheduled in the complex slot, so it actually isn't a problem for
getting the number of next-max nodes at 5 or lower. However, it is a
problem for stores. If a given node is a next-max node whose move cannot
go in the complex slot *and* is used by a store that we decide to
schedule, we have to reserve one of the non-complex slots for a move
instead of all the slots, or we can wind up in a situation where only
the complex slot is free and we fail the move. This means that we have
to add another term to the reservation logic, for stores whose children
cannot be in the complex slot.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
54434fe670 lima/gpir: Rework the scheduler
Now, we do scheduling at the same time as value register allocation. The
ready list now acts similarly to the array of registers in
value_regalloc, keeping us from running out of slots. Before this, the
value register allocator wasn't aware of the scheduling constraints of
the actual machine, which meant that it sometimes chose the wrong false
dependencies to insert. Now, we assign value registers at the same time
as we actually schedule instructions, making its choices reflect reality
much better. It was also conservative in some cases where the new scheme
doesn't have to be. For example, in something like:

1 = ld_att
2 = ld_uni
3 = add 1, 2

It's possible that one of 1 and 2 can't be scheduled in the same
instruction as 3, meaning that a move needs to be inserted, so the value
register allocator needs to assume that this sequence requires two
registers. But when actually scheduling, we could discover that 1, 2,
and 3 can all be scheduled together, so that they only require one
register. The new scheduler speculatively inserts the instruction under
consideration, as well as all of its child load instructions, and then
counts the number of live value registers after all is said and done.
This lets us be more aggressive with scheduling when we're close to the
limit.

With the new scheduler, the kmscube vertex shader is now scheduled in 40
instructions, versus 66 before.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
12645e8714 lima/gp: Mark more add-only nodes as maybe-two-slot
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
16de3dd7a6 lima/gpir: Fix some bugs in instruction handling
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
cc78a42577 lima: Reintroduce the standalone compiler
I used this to test things without needing to have a device handy.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Gert Wollny
45951452aa softpipe: Clamp border colors when needed
unorm and snorm formats require that the border color values be clamped, so
when picking the sampler view, copy and clamp the border color from the
sampler and use these adjusted values.

Fixes:

  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:49:00 +02:00
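A minimal sketch of the clamping rule from 45951452aa, assuming unorm values are limited to [0, 1] and snorm values to [-1, 1]. The enum norm_kind and both helpers are hypothetical names used for illustration; softpipe derives the format class from the sampler view's format in its own way.

```c
/* Hypothetical format class of the sampler view. */
enum norm_kind { NORM_NONE, NORM_UNORM, NORM_SNORM };

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the sampler's border color into out[], clamping each channel to the
 * range representable by the view's format class. */
static void clamp_border_color(enum norm_kind kind,
                               const float in[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        switch (kind) {
        case NORM_UNORM:
            out[i] = clampf(in[i], 0.0f, 1.0f);
            break;
        case NORM_SNORM:
            out[i] = clampf(in[i], -1.0f, 1.0f);
            break;
        default:
            out[i] = in[i]; /* other formats: leave the color untouched */
            break;
        }
    }
}
```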
Gert Wollny
230b99ce2f softpipe: set a lower minimum clamp value for texture coordinate border clamp
The value of -0.5f is not small enough to produce negative coordinates,
so lower the minimum clamp value to -1.0f. This fixes a number of tests
from
   dEQP-GLES31.functional.texture.border_clamp.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
Gert Wollny
eae4c6df8d softpipe: Correct repeat-mirror evaluation
When mirroring the texture coordinates, the indices must be mirrored as
well and the half pixel shift must be applied in reverse.

Fixes a number of tests from:
  dEQP-GLES31.functional.texture.gather.offset.*
  dEQP-GLES31.functional.texture.gather.offsets.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
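For reference, the index mirroring mentioned in eae4c6df8d can be illustrated with the MIRRORED_REPEAT wrap rule from the OpenGL specification. This is a sketch of that rule for integer texel coordinates only, not softpipe's code; it does not model the half pixel shift used for linear filtering, and the function names are assumptions.

```c
/* mirror(a) as defined by the OpenGL wrap-mode rules. */
static int mirror(int a)
{
    return a >= 0 ? a : -(1 + a);
}

/* Map an integer texel coordinate to [0, size) under MIRRORED_REPEAT:
 * (size - 1) - mirror((coord mod 2*size) - size). */
static int wrap_mirrored_repeat(int coord, int size)
{
    int period = 2 * size;
    int m = coord % period;
    if (m < 0)
        m += period; /* C's % may yield a negative remainder */
    return (size - 1) - mirror(m - size);
}
```

With size 4, coordinates 0..3 map to themselves, while 4..7 map to 3..0, so the texel order is reversed in every other tile.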
Gert Wollny
fff624fca4 softpipe: Also mark textures as dirty when updating the framebuffer state
At this point all the draw caches are flushed to the old attached textures,
so the read caches of these textures will need to be updated too.

Fixes:
   dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:33:59 +02:00
Jonathan Marek
08514a9721 etnaviv: set DITHER_MODE
This fixes a rendering glitch observed in SDL testscale test, where alpha
blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead
of having no effect.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
aaf0c47c76 etnaviv: update headers from rnndb
Update to etna_viv commit a16a418.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
76adf041f2 etnaviv: fix blend color on newer GPUs
Newer GPUs use the half float ALPHA_COLOR_EXT register.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
5f73726013 etnaviv: fix alpha blending cases
We need to check rgb_func/alpha_func when determining if blend or separate
alpha is required.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:35 -04:00
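One way to read 5f73726013: source and destination factors alone do not tell whether blending is a no-op, because functions such as MIN and MAX ignore the factors. The sketch below shows a check that also compares the functions. All types and names are hypothetical stand-ins, not the etnaviv or Gallium definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for blend functions and factors. */
enum blend_func { FUNC_ADD, FUNC_SUBTRACT, FUNC_REVERSE_SUBTRACT,
                  FUNC_MIN, FUNC_MAX };
enum blend_factor { FACTOR_ZERO, FACTOR_ONE, FACTOR_SRC_ALPHA,
                    FACTOR_INV_SRC_ALPHA };

struct rt_blend {
    bool blend_enable;
    enum blend_func rgb_func, alpha_func;
    enum blend_factor rgb_src, rgb_dst, alpha_src, alpha_dst;
};

/* Blending is a pass-through only when both functions are ADD and the
 * factors select src * 1 + dst * 0 for color and alpha alike. */
static bool blend_needed(const struct rt_blend *rt)
{
    bool passthrough =
        rt->rgb_func == FUNC_ADD && rt->alpha_func == FUNC_ADD &&
        rt->rgb_src == FACTOR_ONE && rt->rgb_dst == FACTOR_ZERO &&
        rt->alpha_src == FACTOR_ONE && rt->alpha_dst == FACTOR_ZERO;
    return rt->blend_enable && !passthrough;
}

/* Separate alpha is needed when the alpha equation differs from the color
 * equation in its function or in either factor. */
static bool separate_alpha_needed(const struct rt_blend *rt)
{
    bool same = rt->rgb_func == rt->alpha_func &&
                rt->rgb_src == rt->alpha_src &&
                rt->rgb_dst == rt->alpha_dst;
    return blend_needed(rt) && !same;
}
```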
Jonathan Marek
6c3c05dc38 etnaviv: fix polygon offset
Dividing the fui result by 65535 is obviously wrong, and from testing, on
GC7000L at least there is no division by 65535.

Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:07 -04:00
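To see why dividing the fui result is suspect, recall that fui yields the raw bit pattern of a float, not a scaled number. The sketch below uses a hypothetical float_bits() helper with the same intent and assumes a 32-bit IEEE-754 float; it is not the etnaviv code.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer.
 * Assumes float is a 32-bit IEEE-754 value. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under that assumption float_bits(1.0f) is 0x3f800000 and float_bits(2.0f) is 0x40000000, so integer arithmetic on the result does not scale the underlying float value.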
Eric Anholt
9689407c54 freedreno/a6xx: Drop the WFI in the program update stateobj.
Rob Clark thinks this was likely a workaround for our const buffer
update bugs, and now that it's passing tests, we should be able to
drop it.

renderdoc-traces results:

traces/android/clashofclans.rdc:  +6.1% +/-   1.1%
traces/android/candycrush.rdc:    +5.2% +/-   1.6%

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
2170822603 freedreno/a6xx: Drop the WFI in constant uploads.
Now that the bin vs render constlen is fixed, we can skip these waits.

Improves WebGL aquarium performance at 10k fish from 27 fps to 33 fps.
Some highlights from renderdoc-traces:

traces/android/minecraft.rdc:             +17.1% +/-   3.4%
traces/glmark2/ideas-speed=duration.rdc:  +11.6% +/-   2.4%
traces/android/candycrush.rdc:            +5.4%  +/-   1.1%
traces/android/clashofclans.rdc:          +4.4%  +/-   1.3%

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
85bbdaff6c freedreno: Assert that we don't exceed constlen.
We actually could go up to vs->constlen in the binning shader on a6xx,
but for sanity let's make sure that we're always under constlen.  This
would have caught the bug fixed in 572c76fd88 ("freedreno: Clamp UBO
uploads to the constlen decided by the shader.")

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00