Commit graph

113039 commits

Author SHA1 Message Date
Alejandro Piñeiro
691cee751a nir/linker: add ubo/ssbo to the program resource list
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
     squashed on this one (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
a638971929 nir/linker: Fill the uniform's BLOCK_INDEX
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.

v2: add missing break (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Samuel Pitoiset
f239e22813 radv/gfx10: enable 1D textures
Mirror RadeonSI. This also fixes crashes in addrlib.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 18:25:45 +02:00
Andres Gomez
f4d2be03b1 intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Andres Gomez
9aadd5d688 nir/compiler: keep same bit size when lowering with flrp
This was probably not caught before because no supported test was
exercising the flrp lowering with other bit size different than 32.

With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end with something
like:

../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.

Fixes: 158370ed2a ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8f ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Jason Ekstrand
16842b2391 anv: Properly compute image usage in CreateImageView
With separate stencil usage, we can't just grab the usage from the image
directly and have to consider the per-aspect usage instead.

Fixes: 1be38f9178 "anv:Use VK_EXT_separate_stencil_usage to avoid..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-12 16:13:48 +00:00
Samuel Pitoiset
b393b2ce95 radv/gfx10: emit DISABLE_CONSERVATIVE_ZPASS_COUNTS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
8cc4e4a81e radv/gfx10: init more registers in the graphics preamble
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
e68b55f5e3 radv/gfx10: set HS/GS/CS.WGP_MODE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
5d5e26230a radv/gfx10: emit GE_PC_ALLOC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
df062afa03 radv/gfx10: enable vertex shaders without export parameters
GFX10 allows this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
3f76c0f47c radv/gfx10: launch 2 compute waves per CU before going onto the next CU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
e631d65fc6 radv: use ac_get_compute_resource_limits()
No behaviour change.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
e510c5ee3b ac: import ac_get_compute_resource_limits() from RadeonSI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Alyssa Rosenzweig
5f4f8aec74 panfrost: Initialize shift/extra_flags
Don't rely on them being preinitialized to zero; this can cause junk to
appear on the wire.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 07:38:37 -07:00
Alyssa Rosenzweig
6d8490f900 panfrost: Fix build warnings
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 07:38:37 -07:00
Samuel Pitoiset
37aefb2be1 radv/gfx10: invalidate everything in L2 when shaders read data
This includes metadata as well. On GFX10, we have to invalidate
the L2 metadata cache when shaders read DCC.

Note that we still have to implement GFX10 coherency by
introducing INV_L2_METATADA but for now just flush L2.

This fixes a corruption with DCC and Talos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 14:08:12 +02:00
Samuel Pitoiset
4e38322dd8 radv/gfx10: fix wrong emission of GE_CNTL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 12:15:08 +02:00
Samuel Pitoiset
219d6939df radv: add more assertions to make sure packets are correctly emitted
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 12:15:06 +02:00
Alejandro Piñeiro
85b78f96a6 v3d: use inc/dec tmu operation with image atomic sub/add of 1
This allows to remove a mov of 1/-1, as it is implicit with the
operation.

As with atomic inc/dec/add, usual shader-db set doesn't include any
GLES shader using it. So using as workaround vk-gl-cts shaders, we get
this:

total instructions in shared programs: 1217013 -> 1217006 (<.01%)
instructions in affected programs: 53 -> 46 (-13.21%)
helped: 2
HURT: 0

One of the helped shader went from 40 to 34 instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:51:22 +02:00
Alejandro Piñeiro
2e22879115 v3d: refactor some code from v3d40_vir_emit_image_load_store
And moved to new auxiliar method v3d40_image_load_store_tmu_op,
equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:49:29 +02:00
Alejandro Piñeiro
934ce48db8 v3d: use inc/dec tmu operation with atomic sub/add of 1
Among other things, this avoid the need of loading 1/-1 constants (so
one less operation).

The removed comment suggest the option of adding support on NIR for
inc/dec. Intel just uses an auxiliar method to get which hw operation
is needed, so no lowering is needed. And at the same time, being so
small, seems unreasonable to try to add a general one on NIR
itself. It is more easy to just adapt the method here (that is what
the patch does right now).

It is worth to note that we are not getting any change on shader-db
stats because all those methods are used on the usual shader-db set
with shaders needing GLSL > 4.2. In general there aren't too many GLSL
ES 3.1 tests.

As an alternative, we captured the GLES3/GLSL31/GLS32 used on
vk-gl-cts, even if that is not a real life usage of shaders. With
those we get the following:

total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.

Note that the shaders helped are really low because most of the
vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute
shaders. Although right now there is a branch around with CS support,
the usual is doing the stats against master.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:40 +02:00
Alejandro Piñeiro
3912a32a79 v3d: remove redefinition of tmu operations on nir_to_vir
They are already defined, although is a slightly different format on
the generated packet headers, so it was needed to change how it is
used on nir_to_vir.

In addition to allow to remove some duplicated headers, it will allow
to define just one get_op_for_atomic_add aux method later to support
using inc/dec instead of add of 1/-1.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:17 +02:00
Alejandro Piñeiro
c2ff38d2df v3d: tweak initial comment on pack generator script
As the files it mentions to use as reference has slightly different
names.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:09 +02:00
Yevhenii Kolesnikov
8c5692b696 glsl/link_varyings: Fix hash table leak
Hash tables were not destroyed at return.

v2: Use ralloc_context (Eric Anholt)

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-12 11:07:08 +03:00
Kenneth Graunke
712ac83033 iris: Simplify devinfo access in calculate_result_on_gpu()
We have devinfo, no need for screen->devinfo.
2019-07-12 00:33:19 -07:00
Iago Toral Quiroga
10d50f2904 v3d: remove unused definitions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
8e50a9f6cf v3d: move implementation of some intrinsics to separate helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
d69184204e v3d: emit correct lowering for logic ops with RGB10A2 render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
7bf3676845 v3d: emit correct lowering for logic ops with integer render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
e540775f0c v3d: add lowering for OpenGL logic operations
This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.

The pass doesn't handle MSAA targets yet.

Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels

It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.

v2: fix checks for allowed variable location and maximum render target (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
7c1d708911 v3d: acquire scoreboard lock before first tlb read
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.

Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.

To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.

v2: change the solution so it is expected to work in more scenarios (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
47d7c80dc7 v3d: implement tile buffer color read intrinsic
We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.

Per-sample TLB reads are not supported yet.

v2: fix the offset into the color_reads array (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
b0eec9e27d nir: add a new v3d-specific intrinsic for tile buffer color reads
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.

v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
6af1bdefa9 v3d: fix size of color_reads and sample_colors arrays
We need to scale the size of these arrays to consider up to
V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.

v2: we want to store each color component separately, so scale by 4 too.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
0279ac6e51 v3d: add color formats and swizzles to the fragment shader key
We are going to need these very soon to emit correct reads from the tlb
to implement logic operations.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
d26b35ba44 v3d: add helpers to emit ldtlb and ldtlbu signals
The ldtlbu version will read an implicit uniform with the TLB read
specifier and should be used for the first read in a sequence
of TLB reads (unless the default configuration is valid, in which
case we can use ldtlb). The ldtlb version is used for any subsequent
TLB read in the sequence.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
aff8885cf9 v3d: handle tlb read dependency tracking as if they were writes
Tile buffer reads are emitted as ordered sequences and cannot be reordered.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
4793e2c888 v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
83a66e10de v3d: tlb loads cannot be removed
Loads from the tile buffer are emitted in ordered sequences so
we cannot eliminate or reorder any of them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
08f4dc3adc v3d: the ldtlbu signal reads an implicit uniform
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
271bc8acfb v3d: handle ldtlb and ldtlbu signals during disassembly
We already have code to print these signals but the early return in the code
that checks if any signals are present present was missing the checks for them,
so it would skip printing them unless they were paired with other signals.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Samuel Pitoiset
958ee4c21a radv: report shader stage name when dumping LLVM IR
For debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
2b6a089813 radv: tidy up radv_get_shader_name() and add NGG stages
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
ffd6a979bf radv/gfx10: update OVERWRITE_COMBINER_{MRT_SHARING,WATERMARK}
DCC related, mirror RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
c6fa4de15d radv/gfx10: do not set alignment on the ngg_emit pointer
This is invalid and this fixes a crash in LLVM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
df0a23ad1e radv/gfx10: fix exporting clip/cull distances for GS
This fixes dEQP-VK.clipping.user_defined.clip_distance.*geom*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
edcd2bc833 radv/gfx10: fix exporting the subpass view index for GS
This fixes dEQP-VK.multiview.*geometry*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:20 +02:00
Timothy Arceri
3043908ccb mesa: save/restore SSO flag when using ARB_get_program_binary
Without this the restored program will fail the pipeline validation
checks when we attempt to use an SSO program.

Fixes: c20fd744fe ("mesa: Add Mesa ARB_get_program_binary helper functions")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111010
2019-07-12 09:26:53 +10:00
Alyssa Rosenzweig
fe783c5b0c pan/midgard: Correct component count clamping PSIZ
Kind of a funky corner case that does not (as far as I know) apply to
organic shaders from GLES but does pop up in generated shaders from the
fixed-function desktop pipeline.

Fixes: bb483a9166 ("panfrost: Clamp point size")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 13:30:55 -07:00