Current automake build does not try to resolve undefined
symbols thus we could end up with a broken library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With the introduction of the pipe_loader_sw_probe_dri helper we
require the sw/dri winsys during linking stage despite it being
unused by any of the targets. This will cause a minor increase
in the resulting library which will be cleaned up via linker
options with upcoming patches.
v2: Link with libswdri.la only when available.
Reported-and-tested-by: Tom Stellard <thomas.stellard@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above function implies using the the xlib winsys, which
has additional library dependencies that should not be forced.
Make the software xlib pipe loader optional thus avoid all
the dependency hell. A user that wishes to use the particular
pipe-loader would need to set the following within configure.ac.
enable_gallium_xlib_loader=yes
v2:
- Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems
lacking X11 headers. Spotted by Christian Prochaska.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When one builds without gallium_drm_loader, the above function will
not be available, thus we'll segfault in gallium_screen_create due
to memory access violation.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75335
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output. This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).
Note: for the built-in shaders, I would not expect them to be register
starved. And in fact it is the solid frag shader that seems to have the
biggest impact. So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision. So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers. So for now it is not enabled by default except for the
built-in shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Start putting in place infrastructure to deal with multiple shader
variants. Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders. Possibly need for others later (such
as YUV vs RGB eglImage?).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost. (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan(). Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Unfortunately there's only one RT_ARRAY_MODE setting for all
attachments, so clears were previously truncated to the minimum number
of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the
max number of layers) before doing the clear. This fixes
gl-3.2-layered-rendering-clear-color-mismatched-layer-count.
Also fix clears of individual layered rt/zeta, in case it ever happens.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs()
because the latter may be combined depth/stencil format. hiz_can_clear_zs()
is no-op for GEN7+, but move the GEN check so that the assertions are tested.
Finally, call the fast depth clear function from ilo_clear().
This patch fixes this SCons build error introduced with commit
4f37e52f37.
Compiling src/gallium/targets/libgl-xlib/xlib.c ...
src/gallium/targets/libgl-xlib/xlib.c:35:42: fatal error: state_tracker/xlib_sw_winsys.h: No such file or directory
#include "state_tracker/xlib_sw_winsys.h"
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75347
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
v2: Handle null_sw_create failure, add missing function return type
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Will be used in the following commits.
v2: Link gallium tests against the library.
v3: Handle dri_create_sw_winsys failure
v4: Rebase on top of the targets/xa changes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v2)
Will be used in the upcoming patches.
v2: handle xlib_create_sw_winsys failure, drop unneeded header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
v2: Rebase on top of the rendernode changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
v2: Rebase on top of vl_winsys_xsp.c removal
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Currently HAVE_PIPE_LOADER_XCB is defined, rather than being set to 1/0.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The sw pipe-loader implicitly handles winsys_create, thus we
it would make sense to implicitly destroy it upon releasing
the loader.
Currently we leak the sw_winsys when releasing the pipe-loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Define some additional convenience operators, clean up the
implementation slightly, and rename it to 'intrusive_ptr' for reasons
that will be obvious in the next commit.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Fixes a build break in state_tracker/st_program.c
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75278
Reviewed-by: Dave Airlie <airlied@redhat.com>
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.
Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes radeonsi emitting command streams to the kernel even when there
have been no draw calls before a flush, potentially powering up the GPU
needlessly.
Incidentally, this also cuts the runtime of piglit gpu.py in about half
on my Kaveri system, probably because an X11 client going away no longer
always results in a command stream being submitted to the kernel via
glamor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761
Cc: "10.1" mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Required for libdrm 2.4.37 and earlier. Both scons and automake
require version 2.4.38 now so that guard is not longer needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If built without llvm, the following error occurs with mplayer:
Failed to open VDPAU backend .../libvdpau_r600.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>