Commit graph

16656 commits

Author SHA1 Message Date
Rob Clark
e1c1c40cbc gallium: make shader_buffers const
Be consistent with the rest of the "set_xyz" state interfaces.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-20 12:36:20 -04:00
Nicolai Hähnle
1167905c41 radeonsi: use trapezoid distribution for tess on Fiji and Polaris
This yields a small performance improvement in Unigine Heaven.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:29:55 +02:00
Nicolai Hähnle
650137a9c8 radeonsi/sid: add Fiji+ tessellation distribution mode
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:29:15 +02:00
Nicolai Hähnle
32fd92e028 radeonsi: emit PA_SC_RASTER_CONFIG_1 only once
It is the same for all SEs.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:34 +02:00
Nicolai Hähnle
c95175581e radeonsi: fix calculation of valid RB mask per SE
The old calculation treated too many RBs as disabled.

Cc: 11.0 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:31 +02:00
Nicolai Hähnle
6c2e636982 radeonsi: raise SI_PM4_MAX_DW
The old limit, introduced in commit afa752d3f0,
was exceeded by configurations with 4 SEs, which hit si_write_harvested_raster_configs.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:17 +02:00
Ilia Mirkin
154c0a42a2 nvc0: don't make use of push hint if there are no non-const user vbos
This makes the check match what we do on nv50 as well - there's no
point in switching over to the push path if everything's in managed
buffers. This can happen when a shader uses a vertex attribute without an
enabled array - we end up passing it a constant attribute.

This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However, by not
switching over to the push path, draw 383068 is correct.

Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-19 10:14:57 -04:00
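The condition described above can be sketched as a small predicate. The vbo_state struct and use_push_path() are hypothetical; the real nvc0 code is organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vertex-buffer state. */
struct vbo_state {
    bool is_user;     /* data lives in client memory, not a managed buffer */
    bool is_constant; /* only supplies a constant attribute (no enabled array) */
};

/* Honour the push hint only when at least one user vertex buffer holds
 * real per-vertex data. If everything is in managed buffers or is a
 * constant attribute, switching to the push path gains nothing. */
static bool use_push_path(bool push_hint, const struct vbo_state *vbos,
                          size_t n)
{
    assert(vbos != NULL || n == 0);
    if (!push_hint)
        return false;
    for (size_t i = 0; i < n; i++)
        if (vbos[i].is_user && !vbos[i].is_constant)
            return true;
    return false;
}
```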
Ilia Mirkin
1804aa0b80 gk104/ir: fix tex use generation to be more careful about eliding uses
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check whether uses dominate each other if they are dominated by the tex.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-19 10:14:46 -04:00
Ilia Mirkin
194bcb49d1 nv50: add support for GL_EXT_window_rectangles
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-18 13:38:30 -04:00
Ilia Mirkin
b21a00d129 nvc0: add support for GL_EXT_window_rectangles
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-18 13:38:30 -04:00
Ilia Mirkin
07fcb06fe0 gallium: add PIPE_CAP_MAX_WINDOW_RECTANGLES to all drivers
This says how many window rectangles are supported by the
implementation, although it may not exceed PIPE_MAX_WINDOW_RECTANGLES.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-18 13:38:29 -04:00
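Here is a sketch of the contract described above: a driver's cap query reports its hardware count, capped at the gallium-wide maximum. MAX_WINDOW_RECTANGLES_LIMIT stands in for PIPE_MAX_WINDOW_RECTANGLES. The function is hypothetical, and so is the value of 8, which is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for PIPE_MAX_WINDOW_RECTANGLES. */
#define MAX_WINDOW_RECTANGLES_LIMIT 8u

/* Value a driver might report for PIPE_CAP_MAX_WINDOW_RECTANGLES:
 * its own hardware count, never more than the gallium-wide limit. */
static unsigned max_window_rectangles_cap(unsigned hw_count)
{
    unsigned n = hw_count < MAX_WINDOW_RECTANGLES_LIMIT
                     ? hw_count
                     : MAX_WINDOW_RECTANGLES_LIMIT;
    assert(n <= MAX_WINDOW_RECTANGLES_LIMIT);
    return n;
}
```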
Samuel Pitoiset
b214e0d2fb nv50/ir: add missing strings for some recent sysvals
This is pretty useful for debugging purposes and those should
not be omitted.

Fixes: 517a93b3 ("nvc0: add ARB_shader_draw_parameters support")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-18 18:34:50 +02:00
Bruce Cherniak
6b0ac95c28 swr: Update screen->context pointer with multiple contexts.
A pipe pointer in the screen allows access to the current device context
in flush_frontbuffer and resource_destroy. This pointer wasn't tracking the
current context in multi-context situations.

v2: More caffeine. Corrected compare, removed unnecessary set of
screen->pipe in create_context, and added a few comments.
2016-06-17 13:56:03 -05:00
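The sketch below keeps a screen-level "current context" pointer correct when several contexts exist. The structs and both functions are hypothetical. The commit does not say at which call the real swr driver updates the pointer, so ctx_make_current() stands in for that point.

```c
#include <assert.h>
#include <stddef.h>

struct context;

/* Hypothetical, stripped-down screen: remembers the current context. */
struct screen {
    struct context *pipe; /* current context, or NULL */
};

/* Hypothetical, stripped-down context: knows its screen. */
struct context {
    struct screen *screen;
};

/* Record ctx as current so screen-level callbacks can reach it. */
static void ctx_make_current(struct context *ctx)
{
    assert(ctx != NULL && ctx->screen != NULL);
    ctx->screen->pipe = ctx;
}

/* Clear the pointer only if it refers to the context being destroyed,
 * so a different, still-live current context is not forgotten. */
static void ctx_destroy(struct context *ctx)
{
    assert(ctx != NULL && ctx->screen != NULL);
    if (ctx->screen->pipe == ctx)
        ctx->screen->pipe = NULL;
}
```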
Tim Rowley
5a64549f54 swr: switch from overriding -march to selecting features
Acked-by: Chuck Atkins <chuck.atkins@kitware.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-06-17 10:34:17 -05:00
Rob Herring
067c5b10b6 vc4: fix vc4_resource_from_handle() stride calculation
The expected stride calculation is completely wrong. It should
ultimately be multiplying cpp and width rather than dividing. The width
also needs to be aligned to the tiling width first before converting to
stride bytes.

The whole stride check here is possibly pointless. Any buffers which
were allocated outside of vc4 may have strides with larger alignment
requirements.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-06-15 14:54:38 -07:00
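The corrected arithmetic pads the width to the tiling width and then multiplies by cpp. It can be sketched as follows. The function names and the tile_width parameter are hypothetical; the real vc4 code derives the alignment from the resource's tiling layout.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (a power of two). */
static uint32_t align_pot(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: expected stride in bytes for a tiled buffer.
 * Pad the width to the tiling width first, then multiply (not divide)
 * by the bytes per pixel (cpp). */
static uint32_t expected_stride(uint32_t width, uint32_t cpp,
                                uint32_t tile_width)
{
    return align_pot(width, tile_width) * cpp;
}
```

For example, a 100-pixel-wide buffer at 4 bytes per pixel with 64-pixel tiles gives 128 * 4 = 512 bytes.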
Nicolai Hähnle
44e0c0e6ec radeonsi: fix undefined left-shift into sign bit
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-15 09:27:56 +02:00
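The commit message does not show the change itself, but the class of bug is a generic C pitfall. Shifting a signed 1 into bit 31, as in `1 << 31` with a 32-bit int, is undefined behavior. Here is a minimal sketch of the usual fix, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with only bit `bit` set. Writing (1 << bit) uses a
 * signed int and is undefined for bit == 31; shifting an unsigned
 * 32-bit 1 is well-defined for every bit in 0..31. */
static uint32_t bit_mask(unsigned bit)
{
    assert(bit < 32);
    return UINT32_C(1) << bit;
}
```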
Marek Olšák
6ef50efc10 gallium/radeon: num-cs-flushes query should display per-frame average
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
4140afd04b gallium/radeon: add driver queries for compute/dma call stats and spills
also print the average count per frame

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
8fc688c303 radeonsi: don't generate "ret void undef"
Use LLVMBuildRetVoid in epilogs and the GS copy shader and
si_llvm_build_ret otherwise.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-14 20:22:16 +02:00
Marek Olšák
4eea710b0d radeonsi: try to hit direct hw MSAA resolve by changing micro mode in clear
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.

v2: comment the magic numbers

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
373060652c radeonsi: clarify the MSAA resolve limitation with scanout
this is the correct hw requirement

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
789618e3b4 gallium/radeon: add micro_tile_mode to radeon_surf
for easier access

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Roland Scheidegger
f4184d5450 llvmpipe: hack-fix bugs due to bogus bind flags
In theory, the gallium contract is that bind flags must indicate all possible
ways a resource might get bound, but in fact the mesa state tracker does
not set bind flags correctly, and this is more or less unfixable due to GL.

This caused a bug with piglit arb_uniform_buffer_object-rendering-dsa
since 6e6fd911da - the commit is correct,
but it caused us to miss updates to fs UBOs completely, since the
corresponding buffer didn't have the appropriate bind flag set (thus we
wouldn't check if it is indeed currently bound).
See the discussion about this starting here:
https://lists.freedesktop.org/archives/mesa-dev/2016-June/119829.html

So, update the bind flags when we detect such usage.
Note that for now we update this value only in places which matter for us,
that is, creating sampler/surface views or binding constant buffers. There
are plenty more places (setting streamout buffers, vertex/index buffers, ...)
where things can be set with the wrong bind flags, but the bind flags there
never matter.

While here, also make sure we only set the dirty constant bit when it's a fs
constant buffer; it doesn't matter at all for vs/gs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-14 17:03:34 +02:00
Rob Clark
243417810b freedreno: support start param for sampler views/states
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Rob Clark
b8eb1493a9 freedreno: only do extra vertex-buffer state logic on a2xx
Possibly this should move into an fd2 wrapper function, similar to the
texture state tracking done for fd3/fd4 (clamp emulation, etc.).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Rob Clark
26d0efa9ce freedreno: use util_copy_constant_buffer() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Stephan Bergmann
0140938b26 nv50/ir: make Graph destructor virtual
Avoid ASan new-delete-type-mismatch when Function::domTree is created as
DominatorTree in Function::convertToSSA but destroyed only as base
Graph in ~Function.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-13 22:55:11 -04:00
Samuel Pitoiset
7f257abc1b nvc0/ir: clamp the UBO index for compute on Kepler
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 20:12:48 +02:00
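The clamp itself is simple arithmetic. It is shown here as host-side C for readability, whereas the commit applies it in the IR for Kepler compute shaders. The function name, the num_slots parameter and the choice to clamp to the last valid slot are assumptions.

```c
#include <assert.h>

/* Hypothetical helper: keep an indirect UBO index inside the valid
 * range so it cannot select the wrong entry in the driver constant
 * buffer. */
static unsigned clamp_ubo_index(unsigned index, unsigned num_slots)
{
    assert(num_slots > 0);
    return index < num_slots ? index : num_slots - 1;
}
```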
Marek Olšák
6e1b12c788 radeonsi: enable scratch coalescing
This makes one particular compute shader 8x faster.

Latest LLVM git is required.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-13 18:13:51 +02:00
Rob Herring
112e988329 Android: move libdrm settings to top-level Android.common.mk
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:

external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;

HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Emil Velikov
fcb5a75a66 swr: automake: add missing -I flag
When building out-of-tree (OOT) from a release tarball (where the
generated/built files are in srcdir), we need to have both builddir and
srcdir in the includes list.

Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:24 +01:00
Chuck Atkins
c86fcaca72 swr: Add missing headers for package inclusion
CC: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:24:44 +01:00
Ilia Mirkin
3f48548a6f nv50: reinstate dedicated constbuf push path
This was disabled due to occasionally incorrect behavior when trying to
upload data. It later became apparent that nvc0 also had a similar but
slightly different issue, which was resolved in commit e50c01d5. This
takes the same logic as nvc0 and applies it to nv50 (which has somewhat
different interfaces).

Unfortunately I did not note down precisely what was broken with UBOs
when removing the support from nv50, but I've tested a bunch of local
traces, and none of them appear to regress. This should hopefully
improve performance when UBOs are used, but this was not directly
verified.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-11 12:18:43 -04:00
Ilia Mirkin
f47845596b nv50: enable indirect addressing of fragment shader inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-11 11:50:42 -04:00
Brian Paul
e9b86bb92c llvmpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
2db747cf26 llvmpipe: don't use 3-component formats, except 32-bit x 3 formats
This basically disallows all 8-bit x 3 and 16-bit x 3 formats for
textures and render targets.  Some 3-component formats were already
disallowed before.  This avoids problems with GL_ARB_copy_image.

v2: the previous version of this patch disallowed all 3-component formats

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
672e92a146 softpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
d8fe6332d8 softpipe: don't use 3-component formats
Mesa and gallium don't have a complete set of matching 3-component
texture formats.  For example, 8-bit sRGB unorm.  To fully support
the GL_ARB_copy_image extension we need to have support for all of
these formats: RGB8_UNORM, RGB8_SNORM, RGB8_SRGB, RGB8_UINT, and
RGB8_SINT using the same component order.  Since we don't have that,
disable the 3-component formats for now.

v2: Simplify 3-component format check, per Marek.
Also check that target != PIPE_BUFFER.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Dave Airlie
f550b6d296 radeonsi: convert to 64-bitness checks instead of doubles.
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:21 +10:00
Jose Fonseca
320d1191c6 gallivm: Use llvm.fmuladd.*.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Bas Nieuwenhuizen
54f755fa0f radeonsi: Reinitialize all descriptors in CE preamble.
This fixes a problem with the CE preamble: state was restored in the
preamble only when needed.

To illustrate, suppose we have two graphics IBs, 1 and 2, which are submitted in
that order. Furthermore, suppose IB 1 does not use CE RAM, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.

The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not loaded into CE RAM.

Fix this by always restoring the entire CE RAM.

v2: - Just load all descriptor set buffers instead of load and store the entire
      CE RAM.
    - Leave the ce_ram_dirty tracking in place for the non-preamble case.

v3: - Fixed parameter alignment.
    - Rebased to master (Nicolai's descriptor series).

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 12:18:29 +02:00
Tim Rowley
2c85128e01 swr: implement clipPlanes/clipVertex/clipDistance/cullDistance
v2: only load the clip vertex once

v3: fix clip enable logic, add cullDistance

v4: remove duplicate fields in vs jit key, fix test of clip fixup needed

v5: fix clipdistance linkage for slot!=0,4

v6: support clip+cull; passes most piglit clip (failures understood)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-09 13:28:35 -05:00
Marek Olšák
26b69ad250 radeonsi: improve the computation and comment of scratch_waves
2% isn't much. If you think the number should be decreased, please speak up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
1d9c1d9386 radeonsi: print the number of spilled VGPRs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
2b18d67a1e gallium/radeon: remove dead code creating LLVMTargetMachine
This was for some old unsupported LLVM version.
Only si_create_context creates the target machine now.
r600g doesn't use this function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:42 +02:00
Marek Olšák
a343ab55f7 radeonsi: don't enable scratch just for SGPR spills
Diff from shader-db:
  Scratch: 3221504 -> 17408 (-99.46 %) bytes per wave

v2: add "break;"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:41 +02:00
Marek Olšák
95288277d5 Revert "radeonsi: allow direct hw MSAA resolve for scanout surfaces"
This reverts commit ffd54d1936.

No, it doesn't work. The test case is "glxgears -samples 2".
2016-06-08 19:21:55 +02:00
Marek Olšák
f39439d166 radeonsi: re-enable PBO ReadPixels acceleration
disabled by 4f1cccf570

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 00:22:45 +02:00
Marek Olšák
7c6e88b643 radeonsi: allow MSAA resolving into a texture that has DCC enabled
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
9a472a3e0b gallium/radeon: move DCC clearing into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00