Commit graph

105005 commits

Author SHA1 Message Date
Alyssa Rosenzweig
d56f92502e panfrost: Shrink tiler heap
128MB is excessive and 16MB is still plenty. Saves 112MB/context on
kernels without growable/heap support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:16 -07:00
Caio Marcelo de Oliveira Filho
b6d4753568 nir/large_constants: De-duplicate constants
If a function has a constant and is called more than once, after
inlining we may end up with different variables representing the same
constant.  This commit looks into the data and de-duplicates them.

The first pass now will collect the constant data in a per variable
buffer, then de-duplication happens (by sorting then linear walk), and
the second pass will use the data in var->data.location.

One side-effect of the current implementation is that constants will
be reordered.  If this turns out to be a problem, it can
be fixed.
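
A minimal C sketch of the sort-then-linear-walk de-duplication described
above. The const_info struct, cmp_const() and dedup_constants() are
hypothetical stand-ins, not the names used in the actual NIR pass, and
alignment of the shared buffer is ignored for brevity. Note that the sort
itself is what reorders the constants.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-variable record (not Mesa's actual var_info). */
struct const_info {
   const unsigned char *data; /* constant bytes collected by the first pass */
   size_t size;               /* number of bytes in data */
   size_t location;           /* offset assigned in the shared constant buffer */
};

/* qsort comparator over an array of struct const_info pointers: order by
 * size first, then by raw bytes, so identical constants end up adjacent. */
static int cmp_const(const void *a, const void *b)
{
   const struct const_info *x = *(const struct const_info *const *)a;
   const struct const_info *y = *(const struct const_info *const *)b;
   if (x->size != y->size)
      return x->size < y->size ? -1 : 1;
   return memcmp(x->data, y->data, x->size);
}

/* Sorts the pointers, then walks them once: an entry equal to its
 * predecessor reuses that location, anything else gets a fresh offset.
 * Returns the size of the de-duplicated buffer. */
static size_t dedup_constants(struct const_info **infos, size_t count)
{
   size_t total = 0;
   qsort(infos, count, sizeof(*infos), cmp_const);
   for (size_t i = 0; i < count; i++) {
      if (i > 0 && cmp_const(&infos[i - 1], &infos[i]) == 0) {
         infos[i]->location = infos[i - 1]->location;
      } else {
         infos[i]->location = total;
         total += infos[i]->size;
      }
   }
   return total;
}
```

With two identical 4-byte constants and one distinct one, the buffer shrinks
from 12 to 8 bytes and both duplicates share offset 0.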

An alternative strategy considered was to perform this on a
per-function basis and then merge the results; the problem is that we
would have to fix up the offsets during the merge.  Given the data we
have, the current patch is good enough.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 12:24:24 -07:00
Caio Marcelo de Oliveira Filho
d9b67ad079 nir/large_constants: Use ralloc for var_infos
This will be used later on to allocate constant data for each
variable (and then deduplicate).  Also drop initializing found_read,
as it is already implicitly false in the literal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 12:24:24 -07:00
Eric Anholt
0d8a4c67cf freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.
Cuts a bunch of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
56f4ede73d freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
61098baf42 freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
cdc359c58e v3d: Use nir_shader_lower_instructions() for txf_ms lowering.
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
251c64a53d nir: Allow internal changes to the instr in nir_shader_lower_instructions().
v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instruction as a replacement.  It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.
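
To illustrate the idea, here is a hedged C sketch of a callback-driven
lowering loop whose callback can report three outcomes: untouched, modified
in place, or replaced by a new value. The IR types, the LOWER_PROGRESS
sentinel and lower_instructions() are hypothetical and deliberately tiny;
this is not the real nir_shader_lower_instructions() signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified IR: each instruction produces one value. */
struct value {
   int payload;
};

struct instr {
   int op;
   struct value *def; /* value that later instructions read */
};

/* Sentinel meaning "the callback changed the instruction in place". Like
 * other pointer-sized sentinels, it assumes no real value lives at address 1. */
#define LOWER_PROGRESS ((struct value *)(uintptr_t)1)

/* Callback contract: NULL = not lowered, LOWER_PROGRESS = modified in place,
 * anything else = a new value that replaces instr->def. */
typedef struct value *(*lower_cb)(struct instr *instr, void *data);

static bool lower_instructions(struct instr *instrs, size_t n, lower_cb cb,
                               void *data)
{
   bool progress = false;
   for (size_t i = 0; i < n; i++) {
      struct value *res = cb(&instrs[i], data);
      if (res == NULL)
         continue;
      progress = true;
      if (res != LOWER_PROGRESS)
         instrs[i].def = res; /* a real helper would rewrite all uses here */
   }
   return progress;
}

/* Example callback: rewrites op 1 into op 2 without creating a new value. */
static struct value *rewrite_op1_in_place(struct instr *instr, void *data)
{
   (void)data;
   if (instr->op != 1)
      return NULL;
   instr->op = 2;
   return LOWER_PROGRESS;
}
```

The sentinel keeps the common "return a replacement" case unchanged while
still letting in-place edits count as progress.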

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 11:28:56 -07:00
Eric Anholt
c0640035fb vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions().
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
40e7609603 v3d: Fix assertion failures in debug builds.
nir_lower_io leaves around deref_var instructions after lowering away
deref intrinsics.  This ends up breaking validation after v3d_nir_lower_io
removes variables not actually being stored by the shader's
store_output()s.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Alyssa Rosenzweig
1bced0fad2 panfrost: Handle Z24 textures
Just use the Z32 code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
f29c084960 panfrost/ci: Update expectations
We just fixed some stencil tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
fad76470d5 panfrost: Make scissor test more robust
See v3d implementation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
5c554e235d panfrost: Use correct NO_DITHER field on MFBD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
676b9339dd panfrost: Implement Z32F(_S8) support
Z32F uses a dedicated float path. Z32F_S8 uses separate stencil planes
in the hardware, lowered via u_transfer_helper.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
479185a1cd panfrost/decode: Don't disassemble NULL shaders
It is legal to load a shader from a NULL address, particularly when the
TILER job is used strictly for effects on the Z/S buffer with 0x0 color
mask. Don't crash the decoder in this case.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
65d89097b8 panfrost: Copy stencil front to back if back disabled
When backside stenciling is disabled, backfacing primitives just do the
same thing as frontfacing primitives.
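
A minimal C sketch of the rule, assuming a simplified per-face stencil
struct loosely modeled on Gallium's pipe_stencil_state; the struct and
effective_back_stencil() are hypothetical, not panfrost's descriptor code.

```c
#include <stdbool.h>

/* Hypothetical per-face stencil state (not panfrost's hardware layout). */
struct stencil_face {
   bool enabled;
   unsigned func;
   unsigned fail_op, zfail_op, zpass_op;
   unsigned valuemask, writemask;
};

/* Returns the state to program for back-facing primitives: the back face's
 * own state when two-sided stencil is enabled, otherwise a copy of the
 * front face so both facings behave identically. */
static struct stencil_face
effective_back_stencil(const struct stencil_face *front,
                       const struct stencil_face *back)
{
   return back->enabled ? *back : *front;
}
```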

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Jan Zielinski
6f7306c029 swr/rast: Refactor memory API between rasterizer core and swr
This commit cleans up API between the core of the rasterizer and swr.
Some formatting changes are also done.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-18 16:17:00 +02:00
Andreas Baierl
4627a0c4eb lima/ppir: Add gl_PointCoord handling
Treat gl_PointCoord as a system value and
add the necessary bits for correct codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3523233027 gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL
This adds an option to treat gl_PointCoord as a system value.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3349a60f6f nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
f5804f1768 nir: Add gl_PointCoord system value
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
24af57407c glsl: Optionally declare gl_PointCoord as a system value
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Connor Abbott
b178fdf486 lima/gp: Fix problem with complex moves
When writing the scheduler, we forgot that you can't read the complex
unit in certain sources because it gets overwritten to 0 or 1. Fixing
this turned out to be possible without giving up and reducing
GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't
expect. There can be at most 4 next-max nodes that can't have moves
scheduled in the complex slot, so it actually isn't a problem for
getting the number of next-max nodes at 5 or lower. However, it is a
problem for stores. If a given node is a next-max node whose move cannot
go in the complex slot *and* is used by a store that we decide to
schedule, we have to reserve one of the non-complex slots for a move
instead of all the slots, or we can wind up in a situation where only
the complex slot is free and we fail the move. This means that we have
to add another term to the reservation logic, for stores whose children
cannot be in the complex slot.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
54434fe670 lima/gpir: Rework the scheduler
Now, we do scheduling at the same time as value register allocation. The
ready list now acts similarly to the array of registers in
value_regalloc, keeping us from running out of slots. Before this, the
value register allocator wasn't aware of the scheduling constraints of
the actual machine, which meant that it sometimes chose the wrong false
dependencies to insert. Now, we assign value registers at the same time
as we actually schedule instructions, making its choices reflect reality
much better. It was also conservative in some cases where the new scheme
doesn't have to be. For example, in something like:

1 = ld_att
2 = ld_uni
3 = add 1, 2

It's possible that one of 1 and 2 can't be scheduled in the same
instruction as 3, meaning that a move needs to be inserted, so the value
register allocator needs to assume that this sequence requires two
registers. But when actually scheduling, we could discover that 1, 2,
and 3 can all be scheduled together, so that they only require one
register. The new scheduler speculatively inserts the instruction under
consideration, as well as all of its child load instructions, and then
counts the number of live value registers after all is said and done.
This lets us be more aggressive with scheduling when we're close to the
limit.

With the new scheduler, the kmscube vertex shader is now scheduled in 40
instructions, versus 66 before.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
12645e8714 lima/gp: Mark more add-only nodes as maybe-two-slot
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
16de3dd7a6 lima/gpir: Fix some bugs in instruction handling
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
cc78a42577 lima: Reintroduce the standalone compiler
I used this to test things without needing to have a device handy.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
4423552ff0 nir/lower_viewport: Check variable mode first
The location is unused for shader_temp and function_temp variables, and
due to the way nir_lower_io_to_temporaries demotes shader_out
variables to shader_temp variables, it happened to equal
VARYING_SLOT_POS for the gl_Position temporary, which made this pass
fail with the offline compiler because it runs before vars_to_ssa.
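
The fix amounts to ordering two checks. A hedged C sketch, where the enum
values, SLOT_POS and struct variable are hypothetical stand-ins for
nir_variable_mode, VARYING_SLOT_POS and nir_variable:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NIR variable modes and the position slot. */
enum var_mode { MODE_SHADER_OUT, MODE_SHADER_TEMP, MODE_FUNCTION_TEMP };
enum { SLOT_POS = 0 };

struct variable {
   enum var_mode mode;
   int location; /* only meaningful for I/O modes */
};

/* Check the mode first: a shader_out demoted to a temporary can still carry
 * a stale location that happens to equal the position slot. */
static bool is_position_output(const struct variable *var)
{
   return var->mode == MODE_SHADER_OUT && var->location == SLOT_POS;
}
```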

Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:21:41 +02:00
Samuel Pitoiset
6e5e4bf050 radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 10:37:10 +02:00
Samuel Pitoiset
8c692ff512 radv/gfx10: move emitting VGT_PRIMITIVEID_EN into the NGG path
And do not emit VGT_GS_MODE which is unnecessary on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 10:36:38 +02:00
Samuel Pitoiset
8315dbe419 radv/gfx10: do not always execute a barrier before the second shader
With NGG, empty waves may still be required to export data.

This fixes dEQP-VK.ycbcr.format.*_unorm.geometry_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-18 10:06:34 +02:00
Samuel Pitoiset
63d670e350 radv: fix VGT_GS_MODE if VS uses the primitive ID
Found by inspection.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-18 10:03:12 +02:00
Iago Toral Quiroga
c23fa1ca07 v3d: emit correct lowering for logic operations with MSAA render targets
v2:
 - Drop the writemask from the per-sample color intrinsic (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
93d05c1c1f v3d: handle nir_intrinsic_store_tlb_sample_color_v3d
v2:
 - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
50016d7718 nir: add a V3D-specific intrinsic for per-sample color writes
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figure out a way to encode it into the base or the offset.

v2:
 - Drop the writemask (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
ba520b00c4 v3d: implement per-sample tlb color writes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
b96c2219ca v3d: refactor the tlb color write code
We want to split the tlb specifier setup from the color writes, because when
we implement per-sample color writes we want to do the latter for all the
samples, but the former only once.
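
The resulting control flow can be sketched in C as below. The emitter only
counts what it would emit; tlb_emit_log and both emit_* helpers are
hypothetical placeholders for the real v3d code generation.

```c
#include <stdbool.h>

/* Hypothetical emitter that only counts what would be emitted. */
struct tlb_emit_log {
   int specifier_setups;
   int color_writes;
};

static void emit_tlb_specifier_setup(struct tlb_emit_log *log, int rt)
{
   (void)rt; /* a real backend would encode the render target here */
   log->specifier_setups++;
}

static void emit_tlb_color_write(struct tlb_emit_log *log, int rt, int sample)
{
   (void)rt;
   (void)sample; /* a real backend would write this sample's color */
   log->color_writes++;
}

/* Set up the TLB specifier once per render target, then write either a
 * single color or one color per sample. */
static void emit_rt_color_writes(struct tlb_emit_log *log, int rt,
                                 int num_samples, bool per_sample)
{
   emit_tlb_specifier_setup(log, rt);
   int writes = per_sample ? num_samples : 1;
   for (int s = 0; s < writes; s++)
      emit_tlb_color_write(log, rt, s);
}
```

For a 4x MSAA target with per-sample writes this emits one setup and four
color writes.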

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
fd3ec6f55d v3d: move tlb color write emission to a helper function
We will soon be adding per-sample color writes which means additional
complexity and more indentation (we will need another loop to emit
the writes for each individual sample), so this will help keep
things simple and a bit more readable.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
0c9919710e v3d: implement per-sample tlb color reads
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Lionel Landwerlin
3adc32df92 anv: fix format mapping for depth/stencil formats
anv_format is supposed to have a pointer back to the associated
VkFormat; we missed this for depth/stencil formats.

This doesn't fix anything afaict, but will be needed for future
changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 465de47bad ("anv: associate vulkan formats with aspects")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 09:40:01 +03:00
Dave Airlie
a68f593a0e radv: put back VGT_FLUSH at ring init on gfx10
I can find no evidence that removing this is a good idea.

Fixes: 9b116173b6 ("radv: do not emit VGT_FLUSH on GFX10")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 16:24:44 +10:00
Gert Wollny
45951452aa softpipe: Clamp border colors when needed
unorm and snorm require that the border color values are clamped, so when
picking the sampler view copy/clamp the border color from the sampler and
use these adjusted values.
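
A unorm channel can only represent values in [0, 1] and an snorm channel
values in [-1, 1], so the border color has to be clamped to that range
before use. A minimal C sketch; the fmt_kind enum and helper names are
hypothetical, since softpipe works from util_format descriptions.

```c
/* Hypothetical format classification for this sketch. */
enum fmt_kind { FMT_FLOAT, FMT_UNORM, FMT_SNORM };

static float clamp_float(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the sampler's border color, clamping it to the range the view's
 * format can actually represent. */
static void clamp_border_color(enum fmt_kind kind, const float in[4],
                               float out[4])
{
   for (int i = 0; i < 4; i++) {
      switch (kind) {
      case FMT_UNORM:
         out[i] = clamp_float(in[i], 0.0f, 1.0f);
         break;
      case FMT_SNORM:
         out[i] = clamp_float(in[i], -1.0f, 1.0f);
         break;
      default:
         out[i] = in[i]; /* float formats keep the value as-is */
         break;
      }
   }
}
```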

Fixes:

  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:49:00 +02:00
Gert Wollny
230b99ce2f softpipe: set a lower minimum clamp value for texture coordinate border clamp
The value of -0.5f is not small enough to produce negative coordinates,
so lower the minimum clamp value to -1.0f. This fixes a number of tests
from
   dEQP-GLES31.functional.texture.border_clamp.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
Gert Wollny
eae4c6df8d softpipe: Correct repeat-mirror evaluation
When mirroring the texture coordinates, the indices must be mirrored as
well and the half-pixel shift must be applied in reverse.
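
For reference, a C sketch of the OpenGL specification's MIRRORED_REPEAT
texel-index rule, (size - 1) - mirror((i mod 2*size) - size), with
mirror(a) = a for a >= 0 and -(1 + a) otherwise. This follows the spec
formula rather than softpipe's sampler code, and the helper names are
hypothetical. In the mirrored half of each period the second linear tap
lands on a lower texel index than the first, consistent with the reversed
half-pixel shift described above.

```c
/* mirror(a) as defined for MIRRORED_REPEAT in the OpenGL spec. */
static int mirror(int a)
{
   return a >= 0 ? a : -(1 + a);
}

/* floor() for floats without pulling in libm. */
static int ifloor(float x)
{
   int i = (int)x; /* truncates toward zero */
   return (float)i > x ? i - 1 : i;
}

/* Wrap an integer texel index i for a level that is size texels wide. */
static int wrap_mirrored_repeat(int i, int size)
{
   int period = 2 * size;
   int m = ((i % period) + period) % period; /* non-negative modulo */
   return (size - 1) - mirror(m - size);
}

/* Linear filtering: both taps come from floor(u - 0.5) before wrapping. */
static void linear_taps_mirrored(float s, int size, int *i0, int *i1)
{
   int base = ifloor(s * (float)size - 0.5f);
   *i0 = wrap_mirrored_repeat(base, size);
   *i1 = wrap_mirrored_repeat(base + 1, size);
}
```

For a 4-texel level, indices 0..7 wrap to 0, 1, 2, 3, 3, 2, 1, 0, and at
s = 1.25 the taps come out as 3 and then 2.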

Fixes a number of tests from:
  dEQP-GLES31.functional.texture.gather.offset.*
  dEQP-GLES31.functional.texture.gather.offsets.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
Gert Wollny
fff624fca4 softpipe: Also mark textures as dirty when updating the framebuffer state
At this point all the draw caches are flushed to the old attached textures,
so the read caches of these textures will need to be updated too.

Fixes:
   dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:33:59 +02:00
Jonathan Marek
08514a9721 etnaviv: set DITHER_MODE
This fixes a rendering glitch observed in the SDL testscale test, where alpha
blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead
of having no effect.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
aaf0c47c76 etnaviv: update headers from rnndb
Update to etna_viv commit a16a418.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
76adf041f2 etnaviv: fix blend color on newer GPUs
Newer GPUs use the half float ALPHA_COLOR_EXT register.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
5f73726013 etnaviv: fix alpha blending cases
We need to check rgb_func/alpha_func when determining if blend or separate
alpha is required.
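
A hedged C sketch of why the blend functions matter: ONE/ZERO factors are
only a pass-through under ADD (or SUBTRACT), while REVERSE_SUBTRACT, MIN and
MAX still change the result. The enums and struct below only echo Gallium's
PIPE_BLEND_* naming; their values and the helper names are hypothetical,
not etnaviv's code.

```c
#include <stdbool.h>

/* Hypothetical subset of blend enums; values are made up for this sketch. */
enum blend_func { BLEND_ADD, BLEND_SUBTRACT, BLEND_REVERSE_SUBTRACT,
                  BLEND_MIN, BLEND_MAX };
enum blend_factor { FACTOR_ONE, FACTOR_ZERO, FACTOR_SRC_ALPHA,
                    FACTOR_INV_SRC_ALPHA };

struct rt_blend {
   bool blend_enable;
   enum blend_func rgb_func, alpha_func;
   enum blend_factor rgb_src_factor, rgb_dst_factor;
   enum blend_factor alpha_src_factor, alpha_dst_factor;
};

/* Conservatively treat only ADD with ONE/ZERO as a pass-through. */
static bool channel_is_passthrough(enum blend_func func,
                                   enum blend_factor src,
                                   enum blend_factor dst)
{
   return func == BLEND_ADD && src == FACTOR_ONE && dst == FACTOR_ZERO;
}

static bool blend_needed(const struct rt_blend *b)
{
   return b->blend_enable &&
          !(channel_is_passthrough(b->rgb_func, b->rgb_src_factor,
                                   b->rgb_dst_factor) &&
            channel_is_passthrough(b->alpha_func, b->alpha_src_factor,
                                   b->alpha_dst_factor));
}

/* Separate alpha is only required when the alpha equation differs from RGB. */
static bool separate_alpha_needed(const struct rt_blend *b)
{
   return blend_needed(b) &&
          (b->rgb_func != b->alpha_func ||
           b->rgb_src_factor != b->alpha_src_factor ||
           b->rgb_dst_factor != b->alpha_dst_factor);
}
```

Looking only at the factors would wrongly skip blending for, say, an alpha
channel using MAX with ONE/ZERO.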

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:35 -04:00