As reported by Clang, TGSI_OPCODE_DFMA (defined magic number 118) is
currently initialized twice for Cayman and Evergreen.
When Jan Vesely added double precision FMA opcode it did make sense
to locate it immediately after TGSI_OPCODE_DMAD, although this is
out of order.
This change cleans up the prior magic number definition and ensures
any later reordering of this struct will not create problems.
Prior change was:
commit 015e2e0fce
Author: Jan Vesely <jan.vesely@rutgers.edu>
Date: Sat Jul 2 16:14:54 2016 -0400
r600g: Add double precision FMA ops
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: James Harvey <lothmordor@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Noticed this error in a debug message whilst reviewing
https://bugs.freedesktop.org/show_bug.cgi?id=97477
This patch doesn't go towards fixing that bug, but at
least may clarify future debug output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implement SCATTERPS as a dynamic loop based on mask set bits
instead of a static compile time loop.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- use per-primitive viewports throughout the pipeline.
- track whether all available scissor rects are tile aligned.
Causes failures, so not taken into account when choosing rasterizer yet.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
We were allocating global variables for the maximum LDS size
which made the compiler think we were using all of LDS, which
isn't the case.
Reviewed-By: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v1 → v2:
- Fixed indentation (noted by Brian Paul)
- Removed second assert from nouveau's switch statements (suggested by
Brian Paul)
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Use temporary buffers so that we don't read and write to the
same surface at the same time. We don't need to use linear
layout now.
v2: rebase the patch against reverted change
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This reverts commit 09dff7ae2e.
Turned out this can cause some artifacts in the output. Let's revert
it for now until we have sorted out all issues.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
This is a direct port of Marek Olšáks patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.
It uses SDMA for the detiling blit from renderoffload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.
Testing on a dual Radeon HD-5770 setup reduced the time
for the render offload gpu to get its rendering into
system RAM from approximately 16 msecs for simple rendering
at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup!
This was measured using ftrace to trace the time the radeon kms
driver waited on the dmabuf fence of the renderoffload gpu to
complete.
All in all this brought the time for a flip down from 20 msecs
to 9 msecs, so the prime setup can display at full 60 fps instead
of barely 30 fps vsync'ed.
The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.
Reviewed-by: Brian Paul <brianp@vmware.com>
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.
$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...
$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...
Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS
With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS
The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.
SI will get this once we add SDMA blit support for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>