Going to need this soon (not going to bother with avx2 intrinsics at this time
but don't want to do workarounds for true vector shifts if llvm itself can use
them just fine and won't need the gazillion instruction emulation).
Not really tested other than my cpu returns 0 for these features...
(I have no idea if llvm actually would emit avx2/xop instructions neither...)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
While so far this only causes some harmless test failures, there's lots more
cpus with DAZ. All 64bit capable ones can do it (particularly relevant for
AMD cpus as they supported sse3 very very late) but if really necessary we
can check support for that for real with some more magic.
(In fact just about ANY cpu with sse2 can support DAZ, I believe the only
exception are first gen P4 (Willamette) and from those only early steppings
which can't do it it's almost like intel forgot to add it... - a real pity
though docs say you can't just try to set it as they will throw a GPF.)
While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672
it does not fix it. Most likely the tests need fixing as I don't think
there's any guarantee about denorm handling in the reference math library
functions if the flags aren't set to standard values. Nevertheless enabling
DAZ on all cpus which can do it should be the right thing to do.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
supports these extensions, but the OS doesn't, issuing these insns will
trigger an undefined opcode exception.
In addition to the AVX cpuid bit we also need to:
* test cpuid for OSXSAVE support
* XGETBV to check if the OS saves/restores AVX regs on context switches
See "Detecting Availability and Support" at
http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Not used yet but there's a couple of places in llvmpipe which should use this
(occlusion count is currently very inefficent if there's no cpu popcnt
instruction).
Without this, llvmpipe ends up giving a zero size to all uncompressed textures
on non-x86 systems, since align() cannot handle a 0 alignment.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
Reviewed-by: Brian Paul <brianp@vmware.com>
dump_cpu is used only when DEBUG is defined.
Fixes the following GCC warning on builds without DEBUG defined.
util/u_cpu_detect.c:76: warning: 'debug_get_option_dump_cpu' defined but not used
check_os_katmai_support checks that the operating system running on a
SSE-capable processor supports SSE. This is necessary for unpatched
2.2.x and earlier kernels. 2.4.x and later kernels support SSE.
check_os_katmai_support will disable SSE capabilities for 32-bit x86
operating systems for which there is no code path. Currently, this
function handles Linux, Windows, and several BSDs. Mac OS, Cygwin, and
Solaris are several operating systems with no code paths.
Rather than add code for the unhandled operating systems, remove this
function altogether. This will fix SSE detection on all recent 32-bit
x86 operating systems. This completely breaks functionality on unpatched
2.2.x and earlier kernels, although there are likely no Gallium3D users
on such operating systems.
This commit silences the printing off most of the debug information
when running debug builds. The big culprits are: the tgsi sanity checker
that gets run on all shaders on debug; all the options; and
finaly the cpu caps printer.
As we are compiling with -D_BSD_SOURCE, sigjmp_buf and siglongjmp
should be replaced by the non-sig functions (see man 3 setjmp).
Tested on linux/cell.
Fixes a segfault and better code. Unfortunately using an arbitrary
register ("=r") causes the gcc to abort when the code is optimized saying
it can't satisfy the constraint. Setting seems to do the trick.
I was waiting for the need to use this code to arise, and it finally came.
I've tested building this on Linux and Windows, both x86 and x64_64. But
it might break other platforms. Please bear with me and help me fix it.
Many thanks to Dennis Smit who submitted this, and Eric Anholt whose
work this was based on.
It still needs a slight code massasing to integrate with the rest of
gallium (namely mapping the OS_* ARCH_* defines), but I'm commiting anyway
so that it is available to be used when somebody needs it.