This is a direct port of Marek Olšáks patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.
It uses SDMA for the detiling blit from renderoffload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.
Testing on a dual Radeon HD-5770 setup reduced the time
for the render offload gpu to get its rendering into
system RAM from approximately 16 msecs for simple rendering
at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup!
This was measured using ftrace to trace the time the radeon kms
driver waited on the dmabuf fence of the renderoffload gpu to
complete.
All in all this brought the time for a flip down from 20 msecs
to 9 msecs, so the prime setup can display at full 60 fps instead
of barely 30 fps vsync'ed.
The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.
Reviewed-by: Brian Paul <brianp@vmware.com>
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.
$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...
$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...
Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS
With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS
The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.
SI will get this once we add SDMA blit support for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This regression is caused because of commit 3190c7ee97
Regression caused by following OpenGL 4.4 spec rules relates to
GL_FRAMEBUFFER_SRGB in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes piglit spec/ext_transform_feedback/overflow-edge-cases segfaults
because the query's fence pointer was null.
Tested with Piglit, Sauerbraten, ETQW.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We don't want to flush the command buffer or sync on the fence when ending
a query (that kind of defeats the whole purpose of async queries). Do that
instead in get_query_result().
Tested with Piglit, arbocclude, Sauerbraten game, Nobel Clinician Viewer,
ETQW.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch avoid emitting redundant DXSetSamplers command.
Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf.
Reviewed-by: Brian Paul <brianp@vmware.com>
Put all the clearing related functions in svga_init_clear_functions()
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
define svga_init_clear_functions()
and svga_clear_texture as svga->pipe.clear_texture. This is part of
ARB_clear_texture extension
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
To clear texture this function can be used. This is part of
ARB_clear_texture extension. Basically this extension allows you to
clear texture with given color values.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Saving all blitter states will be done in begin_blit() so that
begin_blit() can be used before performing any blit operation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
With this patch, guest statistic gathering interface is added to
svga winsys interface that can be used to gather svga driver
statistic. The winsys module can then share the statistic info with
the VMX host via the mksstats interface.
The statistic enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h
Reviewed-by: Brian Paul <brianp@vmware.com>
If the shader has indirect access to non-indexable temporaries,
convert these non-indexable temporaries to indexable temporary array.
This works around a bug in the GLSL->TGSI translator.
Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Jason suggested adding an assert(function->impl) here. All callers
of this function actually want ->impl, so I decided just to change
the API.
We also change the nir_lower_io_to_temporaries API here. All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally. Folding this change in
avoids the need to change it and change it back.
v2: Fix one call I missed in ir3_compiler (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with anything other class
requirements on that temp.
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.
This fixes unflushed fences that can get stuck due to empty IBs.
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.
Reviewed-by: Brian Paul <brianp@vmware.com>