Commit graph

18861 commits

Author SHA1 Message Date
José Fonseca
b844c8e039 util/u_math: Define NAN/INFINITY macros for MSVC.
Untested. But should hopefully fix the build.
2013-07-20 00:31:18 +01:00
Zack Rusin
f59cb67376 llvmpipe/tests: update arith test to check for edge cases
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:18 -04:00
Zack Rusin
f7c06785d0 gallivm: add a log function that handles edge cases
Same as log2_safe, which means that it can handle infs, 0s and
nans.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:18 -04:00
Zack Rusin
018c69ac56 gallivm: export unordered/ordered cmp to a common function
Only the floating point operators change; everything else
is the same, so it makes sense to share the code.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:18 -04:00
Zack Rusin
192c68b85a gallivm: handle -inf, inf and nan's in sin/cos instructions
sin/cos for anything not finite is nan and everything else has
to be between [-1, 1].

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:17 -04:00
Zack Rusin
13e2cd2f2c gallivm: add a version of log2 which handles edge cases
That means that if input is:
 - less than zero (to and including -inf) then NaN will be returned
 - equal to zero (-denorm, -0, +0 or +denorm), then -inf will be returned
 - +infinity, then +infinity will be returned
 - NaN, then NaN will be returned
It's a separate function because the checks are a little bit costly
and in most cases are likely unnecessary.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:17 -04:00
Zack Rusin
7b672c1503 gallivm: fix edge cases in exp2
exp(0) has to be exactly 1, exp(-inf) has to be 0, exp(inf) has
to be inf and exp(nan) has to be nan; this fixes all of those
cases.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:17 -04:00
Zack Rusin
ab47bbecd6 gallivm: handle nan's in min/max
Both D3D10 and OpenCL say that if one of the inputs is nan then
the other should be returned. To preserve that behavior
the patch fixes both the sse and the non-sse paths in both
functions and adds helper code for handling nans.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:17 -04:00
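The NaN rule described in ab47bbecd6 (if one input is NaN, return the other) looks like this in scalar form. The function names are hypothetical and this is not the SSE or non-SSE path from the patch, only the behavior it aims for; C's own fminf/fmaxf follow the same rule.

```c
#include <math.h>

/* Hypothetical scalar reference: a NaN operand is ignored in favor of
 * the other operand. If both are NaN, the result is NaN. */
static float min_nan_other(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a < b ? a : b;
}

static float max_nan_other(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a > b ? a : b;
}
```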
José Fonseca
719000bd7d scons: Disallow undefined symbols in Xlib libGL.so.
It's not the first time that, due to missing build dependencies or
incomplete commits, we end up with a broken libGL.so that's missing
symbols, causing all tests to fail catastrophically.

Instead, try to catch this sort of issue earlier.
2013-07-19 13:08:07 +01:00
Roland Scheidegger
4ef19f7fec llvmpipe: clamp inputs for srgb render buffers
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:04:20 +02:00
Roland Scheidegger
e57b98bad3 llvmpipe: fix blending with SRC_ALPHA_SATURATE with some formats without alpha
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:03:35 +02:00
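The reasoning in e57b98bad3 can be checked with a few numbers. The sketch below assumes that a format without alpha reads destination alpha as 1, which is our reading of why the factor was previously fixed up to ZERO; the function name is hypothetical.

```c
/* Hypothetical helper: the SRC_ALPHA_SATURATE factor as given in the
 * commit message, min(As, 1 - Ad). */
static float src_alpha_saturate_factor(float src_a, float dst_a)
{
    float inv_dst = 1.0f - dst_a;
    return src_a < inv_dst ? src_a : inv_dst;
}
```

With dst_a = 1 and src_a clamped to [0, 1] the factor is always 0, so replacing it with ZERO is safe. With an unclamped src_a = -0.5 the factor is -0.5, so the fixup would give a wrong result on float render buffers.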
Marek Olšák
0d7f087483 r600g: use WAIT_3D_IDLE before using CP DMA
I broke this with 7948ed1250 for r700 at least.
2013-07-18 14:27:34 +02:00
Jonathan Gray
0b405f364f r300g: make use of gallium's os_get_process_name()
Lets the code compile on non-Linux systems.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
2013-07-18 14:04:48 +02:00
Ilia Mirkin
fbdae1ca41 nv50: H.264/MPEG2 decoding support via VP2, available on NV84-NV96, NVA0
Adds H.264 and MPEG2 codec support via VP2, using firmware from the
blob. Acceleration is supported at the bitstream level for H.264 and
IDCT level for MPEG2.

Known issues:
 - H.264 interlaced doesn't render properly
 - H.264 shows very occasional artifacts on a small fraction of videos
 - MPEG2 + VDPAU shows frequent but small artifacts, which aren't there
   when using XvMC on the same videos

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2013-07-18 07:52:32 +02:00
Roland Scheidegger
7fd30a8621 gallivm: (trivial) simplify lp_build_cos/lp_build_sin a tiny bit
Use "or" instead of "add" (this is a classic select sequence, which at
least newer llvm versions can actually recognize (3.2+?), and the "add"
might prevent that - and we really don't want an add instead of an or with
avx if it isn't recognized (even without avx logic ops might be cheaper)).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-17 18:16:34 +02:00
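The "classic select sequence" mentioned in 7fd30a8621 can be shown on plain integers. This is an illustrative sketch with hypothetical names, not the LLVM IR that gallivm emits: because the two masked terms never share a set bit, "or" and "add" give the same result, but only the "or" form matches the pattern a compiler can recognize as a select.

```c
#include <stdint.h>

/* Select a where mask bits are set, b elsewhere (mask is 0 or ~0 per lane). */
static uint32_t select_with_or(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) | (~mask & b);
}

/* Same value, since (mask & a) and (~mask & b) have disjoint bits,
 * but expressed with an arithmetic add instead of a logic op. */
static uint32_t select_with_add(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) + (~mask & b);
}
```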
Roland Scheidegger
f0f9fb59c3 util/u_format_s3tc: handle srgb formats correctly.
Instead of just ignoring the srgb/linear conversions, simply call the
corresponding conversion functions, for all of pack/unpack/fetch,
both for float and unorm8 versions (though some don't make a whole
lot of sense, i.e. unorm8/unorm8 srgb/linear combinations).
Refactored some functions a bit so don't have to duplicate all the code
(there's a slight change for packing dxt1_rgb, as there will now be
always 4 components initialized and sent to the external compression
function so the same code can be used for all, the quite horrid and
ad-hoc interface (by now) should always have worked with that).

Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-17 18:16:27 +02:00
Vadim Girlin
07baf9cfd1 r600g/sb: improve alu packing on cayman
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but it also tends to increase register usage, so the overall
effect on performance has yet to be proven by real benchmarks.

Some results with bfgminer kernel on cayman:
source bytecode:       60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch:  55 gprs, 3474 alu groups.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:29:56 +04:00
Vadim Girlin
ba7fa4c4c9 r600g/sb: fix handling of new multislot instructions on cayman
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:27:31 +04:00
Vadim Girlin
033eec4145 r600g/sb: fix debug dump code in scheduler
Update the stale debug code for other changes related to debug output.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:27:31 +04:00
Vadim Girlin
44ebe7291c r600g/sb: fix initial register allocation
Mark values that are members of the 'same register' constraint as
preallocated in ra_init pass, this will prevent incorrect
reallocation in scheduler in some cases.

Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:27:30 +04:00
Vadim Girlin
f0d881106a r600g/sb: move chip & class name functions to sb_context
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:27:30 +04:00
Vadim Girlin
96efa4cdf4 r600g/sb: fix handling of PS in source bytecode on cayman
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17 18:27:30 +04:00
Vinson Lee
81d3881367 r600g/sb: Initialize ra_checker member variables.
Fixes "Uninitialized scalar field" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-07-17 18:27:30 +04:00
Emil Velikov
b20e0fb520 gallium/util: use explicitly sized types for {un, }pack_rgba_{s, u}int
Every function but the above four uses explicitly sized types for their
src and dst arguments. Even fetch_rgba_{s,u}int follows the convention.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
2013-07-17 13:01:46 +02:00
Kyle McMartin
87c3440567 llvmpipe: use MCJIT on ARM and AArch64
MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular
JIT has bit-rotted badly on ARM and doesn't exist on AArch64.)

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2013-07-17 17:29:01 +10:00
Roland Scheidegger
dc1cc928ed llvmpipe: support sRGB framebuffers
Just use the new conversion functions to do the work. The way it's plugged
into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.

v2: prettify a bit, use separate function for packing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-16 01:54:51 +02:00
Marek Olšák
a882067d74 Revert "r300g: allow HiZ with a 16-bit zbuffer"
This reverts commit 631c631cbf.

https://bugs.freedesktop.org/show_bug.cgi?id=66921

Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:46:01 +02:00
Marek Olšák
7969b567bd r300g/swtcl: fix a lockup in MSAA resolve
Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:45:22 +02:00
Marek Olšák
22427640b2 r300g/swtcl: fix geometry corruption by uploading indices to a buffer
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.

This commit throws that code away and uses a real index buffer instead.

https://bugs.freedesktop.org/show_bug.cgi?id=66558

Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:45:16 +02:00
Roland Scheidegger
796b73d1fe gallivm: (trivial) use constant instead of exp2f() function
Some lame compilers can't do exp2f() and as far as I can tell they can't do
exp2() (with doubles) either so instead of providing some workaround for
that (wouldn't actually be too bad just replace with pow) and since it is
used with a constant only just use the precalculated constant.
2013-07-14 02:39:33 +02:00
Chia-I Wu
62c546bbf8 ilo: skip 3DSTATE_INDEX_BUFFER when possible
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
2013-07-14 05:59:52 +08:00
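The arithmetic behind 62c546bbf8 is small enough to sketch. The function and parameter names below are hypothetical and do not come from the ilo driver; the sketch also assumes the byte offset is a multiple of the index size, which the commit message does not state.

```c
#include <assert.h>

/* Hypothetical helper: fold a byte offset into the index buffer into the
 * draw's start location, so the buffer itself can stay bound at offset 0. */
static unsigned fold_ib_offset(unsigned start_vertex_location,
                               unsigned offset_bytes,
                               unsigned index_size)
{
    /* Assumption: the offset is aligned to the index size. */
    assert(index_size != 0 && offset_bytes % index_size == 0);
    return start_vertex_location + offset_bytes / index_size;
}
```

For example, a start of 10 with an 8-byte offset and 2-byte indices becomes a start of 14.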
Roland Scheidegger
6bcbb0dc82 gallivm: handle srgb-to-linear and linear-to-srgb conversions
srgb-to-linear is using 3rd degree polynomial for now which should be _just_
good enough. Reverse is using some rational polynomials and is quite accurate,
though not hooked into llvmpipe's blend code yet and hence unused (untested).
Using a table might also be an option (for srgb-to-linear especially).
This does not enable any new features yet because EXT_texture_srgb was already
supported via util_format fallbacks, but performance was lacking probably due
to the external function call (the table used by the util_format_srgb code may
not be all that much slower on its own).
Some performance figures (taken from modified gloss, replaced both base and
sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge,
the numbers aren't terribly accurate):

normal gloss, aos, 8-wide: 47 fps
normal gloss, aos, 4-wide: 48 fps

normal gloss, forced to soa, 8-wide: 48 fps
normal gloss, forced to soa, 4-wide: 47 fps

patched gloss, old code, soa, 8-wide: 21 fps
patched gloss, old code, soa, 4-wide: 24 fps

patched gloss, new code, soa, 8-wide: 41 fps
patched gloss, new code, soa, 4-wide: 38 fps

So there's a performance hit but it seems acceptable, certainly better
than using the fallback.
Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will
continue to use the old util_format fallback, because I can't be bothered
to write code for formats no one uses anyway (as decoding is done as part of
lp_build_unpack_rgba_soa which can only handle block type width of 32).
Compressed srgb formats should get their own path though eventually (it is
going to be expensive in any case, first decompress, then convert).
No piglit regressions.

v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also
since keeping both linear to srgb functions for now make sure both are
compiled (since they share quite some code just integrate into the same
function).

v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb
path.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-13 18:42:17 +02:00
Roland Scheidegger
9b8d97e5bf gallivm: better support for fast rsqrt
We had to disable fast rsqrt before because it wasn't precise enough etc.
However in situations when we know we're not going to need more precision
we can still use a fast rsqrt (which can be several times faster than
the quite expensive sqrt). Hence introduce a new helper which does exactly
that - it is probably not useful calling it in some situations if there's
no fast rsqrt available so make it queryable if it's available too.

v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation,
let rsqrt use fast_rsqrt.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-13 18:42:17 +02:00
Vinson Lee
b0c3c955ae r600g/sb: Initialize ra_constraint::cost.
Fixes "Uninitialized scalar field" reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-07-13 06:57:26 +04:00
Marek Olšák
06b38dbab2 winsys/radeon: allow a NULL cs pointer in radeon_bo_map to fix a segfault
The original idea was that cs=NULL should be allowed here, but we never used
NULL until 862f69fbe1. This fixes a segfault in CoreBreach.
2013-07-13 02:38:23 +02:00
Chia-I Wu
8d4ac98549 ilo: move a sanity check into its assert()
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and
can be eliminated in a release build in gen6_pipeline_end().  Move the call
into the assert().
2013-07-13 07:27:28 +08:00
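The pattern in 8d4ac98549 can be sketched generically. estimate_size and pipeline_end below are hypothetical stand-ins, not the ilo functions: the point is that a call made only to feed an assert() should live inside the assert(), so that it disappears together with the check when NDEBUG is defined.

```c
#include <assert.h>

/* Hypothetical stand-in for a potentially expensive, side-effect-free
 * estimate that the compiler cannot prove to be pure. */
static int estimate_size(int num_states)
{
    return num_states * 4;
}

static void pipeline_end(int num_states, int used)
{
    /* Before: "int size = estimate_size(num_states); assert(used <= size);"
     * keeps the call in release builds. With the call inside assert(),
     * it is compiled out when NDEBUG is defined. */
    assert(used <= estimate_size(num_states));
    (void) num_states;
    (void) used;
}
```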
Chia-I Wu
bf9670270f ilo: mark some states dirty when they are really changed
The checks may seem redundant because cso_context handles them, but
util_blitter does not have access to cso_context.
2013-07-13 06:43:53 +08:00
Chia-I Wu
9047598a8d ilo: clean up ilo_blitter_pipe_begin()
Document why certain states need to be saved, and fix a bug when blitting with
scissor enabled.
2013-07-13 06:43:53 +08:00
Alex Deucher
e0a7565832 r600g: don't use the CB/DB CP COHER logic on r6xx
There are hw bugs.  Flush and inv event is sufficient.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66837

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-12 18:07:56 -04:00
Brian Paul
bf86e0e050 nv30: fix KILL_IF breakage
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
2013-07-12 10:00:18 -06:00
Zack Rusin
00cd455bd5 gallium: fixup definitions of the rsq and sqrt
GLSL spec says that rsq is undefined for src<=0, but the D3D10
spec says it needs to be a NaN, so let's stop taking an absolute
value of the source which completely breaks that behavior. For
the gl program we can simply insert an extra abs instruction
which produces the desired behavior there.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-07-11 20:19:04 -04:00
José Fonseca
a171812d27 util/u_format: Comment out half float denormal test case.
So that lp_test_format doesn't fail until we decide what should be done.
2013-07-12 15:48:38 +01:00
José Fonseca
1b0d29b5da gallivm: Eliminate redundant lp_build_select calls.
lp_build_cmp already returns 0 / ~0, so the lp_build_select call is
unnecessary.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-12 15:40:16 +01:00
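The redundancy removed in 1b0d29b5da can be seen with integer masks. The helpers below are hypothetical scalar analogues of a compare that returns 0 / ~0 and of a mask-based select; they are not the gallivm functions themselves.

```c
#include <stdint.h>

/* Compare that already yields an all-zeros or all-ones mask. */
static uint32_t cmp_less_mask(float a, float b)
{
    return a < b ? ~UINT32_C(0) : UINT32_C(0);
}

/* Mask-based select: if_set where mask bits are 1, if_clear elsewhere. */
static uint32_t select_by_mask(uint32_t mask, uint32_t if_set, uint32_t if_clear)
{
    return (mask & if_set) | (~mask & if_clear);
}
```

Selecting between ~0 and 0 with such a mask just reproduces the mask, so select_by_mask(m, ~0, 0) == m and the select can be dropped.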
Brian Paul
46205ab8cc tgsi: rename the TGSI fragment kill opcodes
TGSI_OPCODE_KIL and KILP had confusing names.  The former was conditional
kill (if any src component < 0).  The latter was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.

This patch renames both opcodes:
  TGSI_OPCODE_KIL -> KILL_IF   (kill if src.xyzw < 0)
  TGSI_OPCODE_KILP -> KILL     (unconditional kill)

Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.

I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up.  Driver authors should review their code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12 08:32:51 -06:00
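The semantics of the two renamed opcodes in 46205ab8cc, as stated in the commit message, can be sketched per fragment in C. The function names are hypothetical and this is not the tgsi_exec code.

```c
#include <stdbool.h>

/* KILL_IF (formerly TGSI_OPCODE_KIL): discard the fragment if any of the
 * four source components is negative. */
static bool op_kill_if(const float src[4])
{
    for (int i = 0; i < 4; i++) {
        if (src[i] < 0.0f)
            return true;
    }
    return false;
}

/* KILL (formerly TGSI_OPCODE_KILP): discard the fragment unconditionally. */
static bool op_kill(void)
{
    return true;
}
```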
Brian Paul
f501baabdb tgsi: fix-up KILP comments
KILP is really unconditional fragment kill.

We've had KIL and KILP transposed forever.  I'll fix that next.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12 08:32:51 -06:00
Brian Paul
e7c3898725 tgsi: exec TGSI_OPCODE_SQRT as a scalar instruction, not vector
To align with the docs and the state tracker.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12 08:32:51 -06:00
Brian Paul
f3fad24b62 tgsi: use X component of the second operand in exec_scalar_binary()
The code happened to work in the past since the (scalar) src args
effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so
whether you grab the X or Y component doesn't really matter.  Just
fixing the code to make it look right.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-12 08:32:51 -06:00
Brian Paul
9fc532a263 os: add os_get_process_name() function
v2: explicitly test for BSD/APPLE, #warning for unexpected
environments.
2013-07-12 08:32:50 -06:00
Brian Paul
919236f3a2 softpipe: silence some MSVC warnings
2013-07-12 08:19:52 -06:00
Brian Paul
76666b9394 hud: silence some MSVC warnings
2013-07-12 08:19:52 -06:00