Commit graph

33673 commits

Author SHA1 Message Date
Jan Vesely
78673b614b clover: Fix build after llvm r325155 and r325160
r325155 ("Pass a reference to a module to the bitcode writer.")
and
r325160 ("Pass module reference to CloneModule")

change function interface from pointer to reference.

v2: Fix indentation (tab instead of spaces)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-02-15 18:18:53 -05:00
Dylan Baker
2ab1ce30c4 meson: fix xvmc target linkage
This needs to link the state tracker with --whole-archive to expose the
right symbols.

v4: - Always add libswdri and libswkmsdri to the link_with list

Fixes: 22a817af8a ("meson: build gallium xvmc state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:38:43 -08:00
Dylan Baker
0b73c329bc meson: Fix xa target linkage
This needs to use --whole-archive (link_whole in meson) to properly
expose symbols.

v4: - Always add libswdri and libswkmsdri to link_with list

Fixes: 0ba909f0f1 ("meson: build gallium xa state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:36:31 -08:00
Dylan Baker
91a59b6287 meson: Fix omx-bellagio target linkage
This needs to use --whole-archive (link_whole in meson) to properly
expose symbols.

v4: - Always add libswdri and libswkmsdri to link_with

Fixes: 1d36dc674d ("meson: build gallium omx state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:36:26 -08:00
Dylan Baker
2e4be28fb2 meson: fix va target linkage
The state tracker needs to be linked with whole-archive (like
autotools). As a result there are symbols from libswdri and libswkmsdri
that are needed, so link those as well.

v4: - Always add libswdri and libswkmsdri to link_with list

Fixes: 5a785d51a6 ("meson: build gallium va state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:36:16 -08:00
Dylan Baker
90d361753c meson: fix vdpau target linkage
The VDPAU state tracker needs to be linked with whole-archive (autotools
does this). Because we are linking the whole archive we also need to
link with libswdri and libswkmsdri if those have been enabled.

v4: - Always add libswdri and libswkmsdri to link_with list

Fixes: 68076b8747 ("meson: build gallium vdpau state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:36:09 -08:00
Dylan Baker
7023b373ec meson: link dri3 xcb libs into vlwinsys instead of into each target
This makes the dependencies easier to manage, since each media target
doesn't need to worry about linking to half a dozen libraries.

Fixes: b1b65397d0 ("meson: Build gallium auxiliary")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:35:51 -08:00
Dylan Baker
424e654cb0 meson: use va-api version reported by pkg-config
Fixes: 5a785d51a6 ("meson: build gallium va state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:35:47 -08:00
Dylan Baker
8eb608df61 meson: add libswdri and libswkmsdri to dri link_with
Fixes: b154b44ae3 ("meson: build radeonsi gallium driver")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:35:42 -08:00
Dylan Baker
be879f9f29 meson: add libswdri and libswkmsdri to d3dadaptor link_with
v5: - Fix libswdi -> libswdri typo

Fixes: 6b4c7047d5 ("meson: build gallium nine state_tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:35:36 -08:00
Dylan Baker
d672084ba2 meson: define empty variables for libswdri and libswkmsdri
This allows these variables to be unconditionally included in `link_with`
lists, even if they're not used. This allows deleting duplicated logic
in nearly every gallium target implemented in meson today. This also
removes the now useless `build_by_default` flag from swdri and swkmsdri.

v4: - add this patch

Fixes: 66c94b9313
       ("meson: build gallium winsys for dri, null, and wrapper")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-15 10:35:23 -08:00
Brian Paul
64a1223a80 svga: replace gotos with else clauses
Simple clean-up.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-02-15 09:49:06 -07:00
Brian Paul
fa901768a4 svga: s/unsigned/enum pipe_shader_type/
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-02-15 09:05:09 -07:00
Brian Paul
8b54299c34 svga: move duplicated code for setting fillmode/flatshade state
Move the calls to svga_hwtnl_set_fillmode() and svga_hwtnl_set_flatshade()
out of the two retry_draw_*() functions to the svga_draw_vbo() function.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-02-15 09:05:09 -07:00
Brian Paul
072df89a79 svga: move svga_update_state() call in draw code
This fixes a few Piglit transform feedback regressions caused by
commit 7a1401938b.

In that change I moved svga_update_state() into the loops,
after the calls to svga_hwtnl_set_flatshade().  But
svga_hwtnl_set_flatshade() actually depends on some derived shader
state.  This patch moves the svga_update_state() call into
svga_draw_vbo() so it's not duplicated in two places.

Fixes: 7a1401938b ("svga: clean up retry_draw_range_elements(),
retry_draw_arrays()")

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-02-15 09:05:08 -07:00
Brian Paul
6f0aec5671 svga: call tgsi_scan_shader() for dummy shaders
If we fail to compile the normal VS or FS we fall back to a simple/
dummy shader.  We need to rescan the shader to update the shader
info.  Otherwise, this can lead to further translation failures
because the shader info doesn't match the actual shader.

Found by adding some extra debug assertions in the state-update code
while debugging something else.

v2: also update shader generic_inputs/outputs, etc. per Charmaine

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-02-15 09:05:01 -07:00
Karol Herbst
7bc15090fc nvc0: disable MS Images for sample_count == 1 on Maxwell
fixes KHR-GL45.multi_bind.dispatch_bind_textures on Maxwell

Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-15 11:14:46 +01:00
Timothy Arceri
7be5f30bb1 radeonsi/nir: fix si_nir_load_tcs_varyings() for outputs
We were incorrectly using the input info for outputs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-15 09:02:41 +11:00
Timothy Arceri
6acab18828 radeonsi/nir: fix shader ballot return value bitsize
Fixes cts test:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-15 09:02:41 +11:00
Samuel Pitoiset
141db61509 ac: remove nir_to_llvm_context from ac_nir_translate()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-14 11:53:14 +01:00
Dave Airlie
b9d2ff05a6 r600: fix regression in gl_FragColor drawing
This fixes a regression in the broadcast color to all color bufs case.

Fixes: 6c691081a (r600: fixup sparse color exports.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-14 14:02:41 +10:00
Dave Airlie
9c9a9bee44 r600: fix array spill if temp[0] is before all arrays
I found a shader with
DCL TEMP[0], LOCAL
DCL TEMP[1..256], ARRAY(1), LOCAL
DCL TEMP[257..512], ARRAY(2), LOCAL
DCL TEMP[513..768], ARRAY(3), LOCAL
DCL TEMP[769], LOCAL

This would remap badly, as it would add up all the spilled sizes
and subtract the total from the index of temp 0. If the current temp
is less than the array start, break out.

Fixes: 1d871aa6 (r600g: Implement spilling of temp arrays (v2))
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-14 13:37:59 +10:00
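
The remapping problem described in 9c9a9bee44 can be illustrated with a small sketch. This is not the r600 code; `remap_temp` and `SPILLED` are hypothetical names, and the sketch assumes that all three arrays from the example shader are spilled and that only non-array temps are passed in.

```python
def remap_temp(index, spilled_arrays):
    """Map an original TEMP index to its slot once spilled arrays are removed.

    spilled_arrays: list of (first_index, size) tuples sorted by first_index.
    Hypothetical helper for illustration only.
    """
    offset = 0
    for first, size in spilled_arrays:
        # A temp declared before this array is not shifted by it (or by any
        # later array), so stop accumulating sizes here.
        if index < first:
            break
        offset += size
    return index - offset


# Arrays from the shader in the commit message, assumed to be spilled:
# TEMP[1..256], TEMP[257..512], TEMP[513..768]
SPILLED = [(1, 256), (257, 256), (513, 256)]

temp0 = remap_temp(0, SPILLED)      # stays at 0 because of the early break
temp769 = remap_temp(769, SPILLED)  # shifted down by all three arrays
```

Without the early break, TEMP[0] would have the sizes of all three arrays subtracted from it and end up with a negative index.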
Dave Airlie
8f2656c75b virgl: add ARB_sample_shading support.
This enables ARB_sample_shading if the renderer supports it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-14 13:06:07 +10:00
Dave Airlie
9b95b70719 virgl: add ARB_draw_indirect support.
This relies on the renderer code landing first.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-14 13:06:07 +10:00
Roland Scheidegger
f6718baabc tgsi: Recognize RET in main for tgsi_transform
Shaders coming from dx10 state trackers have a RET before the END.
And the epilog needs to be placed before the RET (otherwise it will
get ignored).
Hence figure out if a RET is in main, in this case we'll place
the epilog there rather than before the END.
(At a closer look, there actually seem to be problems with control
flow in general with output redirection, that would need another
look. It's enough however to fix draw's aa line emulation in some
internal bug - lines tend to be drawn with trivial shaders, moving
either a constant color or a vertex color directly to the output).

v2: add assert so buggy handling of RET in main is detected

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-02-14 02:06:54 +01:00
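
The epilog placement described in f6718baabc can be sketched roughly as follows. This is an illustration under assumptions, not the tgsi_transform code: shaders are modelled as plain lists of opcode names, `insert_epilog` is a hypothetical helper, and, like the commit itself notes, control flow around the RET is not handled.

```python
def insert_epilog(instructions, epilog):
    """Insert epilog before the first RET in main, or before END otherwise.

    instructions: list of opcode names, e.g. ["MOV", "RET", "END"].
    Code between BGNSUB and ENDSUB is treated as a subroutine, so a RET
    found there is not considered to be in main.
    """
    depth = 0
    for i, op in enumerate(instructions):
        if op == "BGNSUB":
            depth += 1
        elif op == "ENDSUB":
            depth -= 1
        elif depth == 0 and op in ("RET", "END"):
            return instructions[:i] + epilog + instructions[i:]
    raise ValueError("no RET or END found in main")


with_ret = insert_epilog(["MOV", "RET", "END"], ["EPILOG"])
without_ret = insert_epilog(["MOV", "END"], ["EPILOG"])
```

In the first case the epilog lands before the RET, so it is executed; in the second it lands before the END as before.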
Dave Airlie
9ddacd9af4 gallium: drop all the guard band float caps.
Nobody queries these and nobody sets them to anything useful,
the docs say TODO.

Drop them until a use appears.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-14 08:50:08 +10:00
Stéphane Marchesin
5e4a2b394e virgl: Support v2 caps struct (v2)
This struct allows us to report:
- accurate max point size/line width.
- accurate texel and texture gather offsets
- vertex/geometry limits.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-13 14:23:54 +10:00
Timothy Arceri
b6cf898ec2 radeonsi: make si_declare_compute_memory() more generic and call for nir
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-13 14:43:05 +11:00
Eric Anholt
7a83be4b28 gallium/llvmpipe: Fix compiler warnings about ddx/ddy/ddmax.
My gcc doesn't figure out that dims >= 1 (seems reasonable), and doesn't
notice that ddmax is used from the same no_rho_opt as its initialization.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-02-12 20:48:18 +00:00
Samuel Pitoiset
e32f374944 ac: remove unused parameters in abi::load_tess_coord()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-12 11:54:17 +01:00
Samuel Pitoiset
ecf229706f ac: add load_sample_mask_in() to the ABI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-12 11:54:11 +01:00
Rob Clark
831fb29252 freedreno: small fix for flushing dependent batches
Flush a resource's previous write_batch synchronously.  Because a
resource's associated batches are not updated until after the flush
thread submits rendering to the kernel, this was causing a bit of
confusion in the following loop.  This fixes a bug that appeared with
recent stk.

Perhaps we need to re-work things a bit to clear out dependent batches
in the ctx's thread and use a fence to deal with the period between
when a flush is queued and when it is submitted to the kernel.  But
this will do until time permits a larger refactor.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
c57ed8e01c freedreno/ir3: intra-block scheduling
Because of loops, we can't schedule all of a block's predecessors first.
Instead just assume that the result consumed in a block was written far
enough away in all paths into a block.  And do an intra-block scheduling
pass to figure out if there are any cases where we need to insert extra
nop's.  This works out better than always assuming the worst case (ie.
that a value live into a block was written in the last instruction in
the predecessor block).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
2a2099a875 freedreno/ir3: "boost" the depth of if/else condition
Account for the move to predicate register, to try to avoid needing to
insert extra NOPs later.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
ffb00f6841 freedreno/ir3: account for arrays in delayslot calc
Normally false-deps are not something to consider, since they mostly
exist for non-delay-slot related reasons:

 * barriers
 * ordering writes after read
 * SSBO/image access ordering

The exception is a false-dependency on an array store.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
f54d2b4f10 freedreno/ir3: more clever legalize algorithm
Previously we didn't handle flow control in legalize, and instead just
set (ss)(sy) on the first instruction in every block.  Which isn't very
clever.

Instead, consider output state of all predecessor blocks, so we only
set a sync bit if needed for any possible path leading into a block.
Because of loops, we can't require that all predecessor blocks are
legalized before a given block, so instead run in a loop until results
converge.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
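
The "run in a loop until results converge" idea from f54d2b4f10 can be sketched as a small fixed-point iteration. This is a simplified model, not the ir3 legalize pass: each block is assumed to carry a set of sync kinds it leaves pending (`produces`) and a set it must wait on (`needs`), and `legalize` is a hypothetical name.

```python
def legalize(blocks, preds):
    """Compute which sync bits each block needs on entry.

    blocks: name -> {"produces": set, "needs": set}
    preds:  name -> list of predecessor block names
    Returns name -> set of sync bits to set at the start of the block.
    """
    out_state = {name: set() for name in blocks}
    sync = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, blk in blocks.items():
            # Anything pending on any path into the block counts.
            in_state = set()
            for p in preds.get(name, []):
                in_state |= out_state[p]
            need = in_state & blk["needs"]
            # Synced kinds are cleared; the block adds its own pending kinds.
            new_out = (in_state - need) | blk["produces"]
            if new_out != out_state[name] or need != sync[name]:
                out_state[name], sync[name] = new_out, need
                changed = True
    return sync


blocks = {
    "entry": {"produces": {"sy"}, "needs": set()},
    "loop": {"produces": {"ss"}, "needs": {"sy", "ss"}},
    "exit": {"produces": set(), "needs": {"ss"}},
}
preds = {"entry": [], "loop": ["entry", "loop"], "exit": ["loop"]}
result = legalize(blocks, preds)
```

On the first pass the loop block only sees the state coming from the entry block; the pending state carried around its own back edge is picked up on the second pass, which is why a single walk over the blocks is not enough.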
Rob Clark
015afb6a38 freedreno/ir3: track block predecessors
Useful in the following patches.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
76440fcca9 freedreno/ir3: clean up dangling false-dep's
Maybe there is a better way to do this.  Where it becomes useful is "array"
loads, which end up as a false-dep for a later array store.

If all the uses of an array load are CP'd into their consumer, it still
leaves the dangling array load, leading to funny things like:

  mov.u32u32 r5.y, r0.y
  mov.u32u32 r5.y, r0.z

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
aea223741f freedreno/ir3: handle IMMED for mad 2nd src special case
Consider also immediates for swapping the first two srcs, because they
can be lowered to constant.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
242a8a1957 freedreno/ir3: remove ir3 phi instruction
Now that we convert phi webs to ssa, we can drop all this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
a7b569d60c freedreno/ir3: remove lower_if_else pass
Now that it is unused.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
268ab05484 freedreno/ir3: add experimental GCM pass
Generally seems to do worse on instruction count and register usage,
according to shader-db.  But shader-db also doesn't do a very good job
of weighting loop bodies, so that might not be totally valid.

So add an env variable to enable GCM pass for easier experimentation.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
4c15c53d91 freedreno/ir3: change opt passes
There are more useful nir passes added since initial conversion to nir.
But ir3 was never updated to use them.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
ec8bc54ad2 freedreno/ir3: use peephole select pass
Aggressively lowering all if/else to selects in some extreme cases
results in much higher register pressure.  Using peephole select instead
with a modest threshold speeds up alu2 4x!

16 seems like a good limit, low enough to help alu2 but not too low that
it penalizes everything else.  With a bit better scheduling of the
instruction that moves a value into a predicate register, we might be
able to lower this limit a bit more in the future, but since we need 6
cycles from the move to predicate register to predicated branch, that
puts some sort of lower bound on how far we can lower this threshold.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
a7ea2b4eba freedreno/ir3: lower phi webs to regs
nir's from_ssa pass is much better at avoiding inserting extra moves
than our logic is.  And lowering phi webs to regs just treats anything
involved in a phi web as an array of length=1.  Which with previous
array related fixes in RA/etc ends up working out quite well.  This cuts
down on extra instructions and also helps with register pressure.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
0a6ddf964f freedreno/ir3: separate arrays from groups
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
55f14a1ac4 freedreno/ir3: make block/instruction serialno per-shader
Makes it easier to compare values seen in-game (where there are many
shaders) to cmdline standalone compiler.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
5a7de94392 freedreno/ir3: add spirv support to cmdline compiler
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark
942341bcd0 freedreno/ir3: don't lower fsat
Instead, if possible fold (sat) flag into src, otherwise use:

  (sat)max.f rD, rS, rS

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
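
Why `(sat)max.f rD, rS, rS` works as a saturate in 942341bcd0 can be shown with a tiny model. This is only a sketch: it assumes the (sat) flag clamps an instruction's result to [0.0, 1.0], and `max_f` and `fsat` are hypothetical Python stand-ins, not ir3 functions.

```python
def max_f(a, b, sat=False):
    """Model of max.f with an optional (sat) flag on the result."""
    result = max(a, b)
    if sat:
        # Assumed meaning of (sat): clamp the result to [0.0, 1.0].
        result = min(max(result, 0.0), 1.0)
    return result


def fsat(x):
    """max(x, x) is just x, so only the (sat) clamp has any effect."""
    return max_f(x, x, sat=True)


above = fsat(1.5)
below = fsat(-0.25)
inside = fsat(0.5)
```

When the value being saturated is produced by another ALU instruction, the same flag can instead be folded into that instruction, which saves the extra max.f.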
Rob Clark
b2fc94f074 freedreno/ir3: add encoding/decoding for (sat) bit
Seems to be there since a3xx, but we always lowered fsat.  But we can
shave some instructions, especially in shaders that use lots of
clamp(foo, 0.0, 1.0) by not lowering fsat.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00