Commit graph

19152 commits

Author SHA1 Message Date
Vadim Girlin
29ff2e907d r600g: fix color exports when we have no CBs
We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-08-30 15:51:11 +04:00
Vinson Lee
74be77a99e nvc0/ir: Initialize NVC0LegalizePostRA member variables.
Fixes "Uninitialized pointer field" defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-08-29 20:42:24 -07:00
Roland Scheidegger
a479f34025 gallivm: support per-pixel min/mag filter in SoA path
Since we can have per-pixel lod we should also honor the filter per-pixel
(in fact we didn't honor it per quad either in the multiple quad case).
Do this by running the linear path and simply beating the weights into shape
(the sample with the higher weight is the one which should have been chosen
with nearest filtering hence adjust filter weight to 1.0/0.0 based on that).
If all pixels use nearest filter (either min or mag) then still run just a
nearest filter as this is way cheaper (probably around 4 times faster for 2d,
more for 3d case) and it should be relatively rare that pixels really need
different filtering. OTOH if all pixels would require linear don't do anything
special since the linear path with filter adjustments shouldn't really be all
that much more expensive than ordinary linear, and we think it's rare that
min/mag filters are configured differently so there doesn't seem much value
in trying to optimize this further.
This does not yet fix the AoS path (though currently AoS is only used for
single quads hence it could be considered less broken, just never honoring
per-pixel filter decision but doing it per quad).

v2: simplify code a bit (unify min linear and min nearest cases)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
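The weight adjustment described in a479f34025 can be sketched in a few lines. This is a minimal scalar illustration in Python, not the gallivm SoA code: the function names and the tie-break at a weight of exactly 0.5 are assumptions made for the example.

```python
def adjust_lerp_weight(weight, use_nearest):
    """Snap a linear-filter weight to 0.0 or 1.0 when this pixel wants nearest.

    `weight` is the weight of the second texel; the first texel has 1 - weight.
    The texel with the higher weight is the one nearest filtering would pick.
    """
    if not use_nearest:
        return weight
    # Assumption: a tie at exactly 0.5 goes to the second texel.
    return 1.0 if weight >= 0.5 else 0.0


def lerp(texel0, texel1, weight):
    """Ordinary linear interpolation between two texels."""
    return texel0 + (texel1 - texel0) * weight


def sample_1d(texel0, texel1, weight, use_nearest):
    """Always run the linear path, with the weight adjusted per pixel."""
    return lerp(texel0, texel1, adjust_lerp_weight(weight, use_nearest))
```

With the weight snapped to 0.0 or 1.0 the linear path returns exactly one of the two texels, so pixels that want nearest and pixels that want linear can share one code path.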
Roland Scheidegger
81cfcdbd87 gallivm: don't calculate square root of rho if we use accurate rho method
While a sqrt here and there shouldn't hurt much (depending on the cpu) it is
possible to completely omit it since rho is only used for calculating lod and
there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for
calculating lod this means we get a simple mul instead of sqrt (in case of
nearest mip filter in fact we don't need to replace the sqrt with something
else at all), only in some not very useful path this doesn't work (combined
brilinear calculation of int level and fractional lod, accurate rho calc but
brilinear filtering seems odd).
Apart from being faster as an added bonus this should increase our crappy
fractional accuracy of lod, since fast_log2 is only good for ~3bits and this
should increase accuracy by one bit (though not used if dimension is just one
as we'd need an extra mul there as we never had the squared rho in the first
place).

v2: use separate ilog2_sqrt function if we have squared rho.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
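The identity used in 81cfcdbd87, log2(x) == 0.5*log2(x^2), can be checked numerically. The sketch below is plain Python with hypothetical function names. It only illustrates the arithmetic and says nothing about how gallivm's fast_log2 or ilog2_sqrt are implemented.

```python
import math


def lod_from_rho(rho):
    """Reference path: take log2 of rho itself (needs a sqrt beforehand)."""
    return math.log2(rho)


def lod_from_rho_squared(rho_sq):
    """Sqrt-free path: log2(rho) == 0.5 * log2(rho^2), so one mul replaces the sqrt."""
    return 0.5 * math.log2(rho_sq)
```

For example, with rho^2 = 16 both paths give a lod of 2.0, but only the first one had to compute sqrt(16) to get there.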
Roland Scheidegger
10e40ad11d gallivm: refactor num_lods handling
This is just preparation for per-pixel (or per-quad in case of multiple quads)
min/mag filter since some assumptions about number of miplevels being equal
to number of lods no longer hold true.
This change does not change behavior yet (though theoretically when forcing
per-element path it might be slower with different min/mag filter since the
code will respect this setting even when there's no mip maps now in this case,
so some lod calcs will be done per-element just ultimately still the same
filter used for all pixels).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
Vinson Lee
4a6d2f3dd7 radeonsi: Early return if no depth or stencil on release builds.
Fixes "Missing break in switch" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2013-08-29 15:49:12 -07:00
Rob Clark
de10d383d0 freedreno: pipe loader for either kgsl or msm
The downstream android kernel driver is "kgsl", the upstream drm/kms
driver is called "msm".  Since libdrm_freedreno handles the differences
between the two, we need to load the same thing for either device.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-29 17:35:05 -04:00
Rob Clark
e95b7d89b9 freedreno: updates for msm drm/kms driver
There were some small API tweaks in libdrm_freedreno to enable support
for msm drm/kms driver.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-29 17:35:05 -04:00
Rob Clark
0267f264cc freedreno/a3xx/compiler: handle sync flags better
We need to set the flag on all the .xyzw components that are written by
the instruction, not just on .x.  Otherwise a later use of rN.y (for
example) will not trigger the appropriate sync bit to be set.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-29 17:35:04 -04:00
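The sync-flag fix in 0267f264cc amounts to tracking pending writes per component instead of per register. A minimal sketch of that idea follows. The `SyncTracker` class, its method names, and the choice to clear everything once a sync is issued are assumptions for illustration, not the freedreno compiler's data structures.

```python
class SyncTracker:
    """Tracks which register components have a result still in flight."""

    def __init__(self):
        self.pending = set()  # set of (register number, component) pairs

    def mark_write(self, reg, writemask):
        # Flag every component the instruction writes, not just .x.
        for comp in writemask:
            self.pending.add((reg, comp))

    def needs_sync(self, reg, comp):
        # A read of any flagged component requires the sync bit.
        if (reg, comp) in self.pending:
            # Assumption: one sync waits for all outstanding results.
            self.pending.clear()
            return True
        return False
```

If only `.x` were flagged, a later read of `r2.y` would find nothing pending and the sync bit would be skipped, which is the failure the commit message describes.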
Rob Clark
4a2b5b2384 freedreno/a3xx/compiler: better const handling
Seems like most/all instructions have some restrictions about const src
registers.  It seems like the 2 src (cat2) instructions can take at most
one const, and the 3 src (cat3) instructions can take at most one const
in the first 2 arguments.  And so on.  Handle this properly now.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-29 17:35:04 -04:00
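The const restrictions listed in 4a2b5b2384 can be written down as a small check. This is a sketch under stated assumptions: the function name is hypothetical, only the cat2 and cat3 rules from the message are modelled, and keeping the first const while moving the later ones is an arbitrary choice made for the example.

```python
def srcs_to_unconst(category, src_is_const):
    """Return indices of sources that must be moved to a temporary register.

    `src_is_const` is a list of booleans, one per source operand.
    """
    if category == 2:
        # cat2: at most one const among all sources.
        limited = range(len(src_is_const))
    elif category == 3:
        # cat3: at most one const among the first two sources.
        limited = range(min(2, len(src_is_const)))
    else:
        # Other categories are not modelled in this sketch.
        return []
    const_idx = [i for i in limited if src_is_const[i]]
    # Keep the first const in place, move the rest.
    return const_idx[1:]
```

A cat3 instruction with consts in all three sources would move only source 1: source 0 keeps its const, and source 2 is outside the restricted pair.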
Jonathan Gray
57cf5946ce radeonsi: Make sure libdrm_radeon headers are picked up from the right place
And remove libdrm/ from a winsys include statement.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
2013-08-29 15:37:44 +02:00
Brian Paul
4e7f1346ae draw: fix point/line/triangle determination in draw_need_pipeline()
The previous point/line/triangle() functions didn't handle GS primitives.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-29 07:29:31 -06:00
Christian König
aebd065a64 radeon/uvd: fix MPEG2/4 ref frame index limit
Otherwise the first few frames have an incorrect reference index.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-08-29 08:51:12 +02:00
Vinson Lee
57684d52e9 nouveau: Copy m4x4 and m8x8 separately.
Silences Coverity "Out-of-bounds access" defect.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-08-28 23:23:49 -07:00
Marek Olšák
adb93e3bda r300g: enable MSAA on r300-r400, be careful about using color compression
MSAA was tested by one user on RS690 and it works for him with color
compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM.

Since we don't have hardware documentation about which chipsets actually have
CMASK RAM, I had to take a guess based on the presence of HiZ.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-08-27 23:18:54 +02:00
Roland Scheidegger
bd3909f265 draw: clean up setting stream out information a bit
In particular no one is interested in the vertex count, so drop that,
and also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. I am unable for now to figure
out if primitives_storage_needed in SO stats (used for d3d10) should
increase if SO is disabled, though the equivalent num_primitives_generated
used for OpenGL definitely should increase. In any case we were only counting
when SO is active both in softpipe and llvmpipe anyway so don't pretend there's
an independent num_primitives_generated counter which would count always.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just
as before, should eventually fix this by doing either separate counting for this
query or adjust the code so it always counts this even if SO is inactive depending
on what's correct for d3d10.)

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-27 16:59:39 +02:00
Roland Scheidegger
aff2ecf09a llvmpipe: support nested/overlapping queries for all query types
There's just no way resetting the counters is working with nested/overlapping
queries.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-27 16:59:01 +02:00
Roland Scheidegger
4900e625bd softpipe: support nested/overlapping queries for all query types
There's just no way resetting the counters is working with nested/overlapping
queries.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-27 16:58:20 +02:00
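The two commits above (aff2ecf09a and 4900e625bd) replace counter resets with something that works when queries nest or overlap. One common way to do that, shown here as an assumption rather than a transcription of the llvmpipe or softpipe code, is to let the counter run freely and have each query record a snapshot at begin and subtract it at end. The class names are hypothetical.

```python
class PipelineCounter:
    """A free-running statistic that is never reset."""

    def __init__(self):
        self.value = 0


class Query:
    """Reports how much the counter advanced between begin() and end()."""

    def __init__(self, counter):
        self.counter = counter
        self.start = 0
        self.result = 0

    def begin(self):
        # Snapshot instead of resetting, so other active queries are unaffected.
        self.start = self.counter.value

    def end(self):
        self.result = self.counter.value - self.start
```

Had `begin()` reset the shared counter to zero, starting an inner query would have discarded everything the outer query had counted so far.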
Tom Stellard
f3e86d4a68 clover: Don't use PIPE_TRANSFER_UNSYNCHRONIZED for blocking copies
CC: "9.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-08-26 18:27:03 -07:00
Niels Ole Salscheider
ef6ed7220a st/clover: Add event to deps even if it has been triggered
The command is submitted once the event has been triggered, but it might not
have completed yet. Therefore, we have to add it to deps in order to wait on it.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-08-26 18:25:17 -07:00
Niels Ole Salscheider
4a3505d548 st/clover: Profiling support
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2013-08-26 18:25:17 -07:00
Dave Airlie
4763a032a0 tgsi_build: fix order of arguments for ind register build
This was broken when arrayid was added.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-27 10:41:27 +10:00
Dave Airlie
81204d0e9c tgsi: finish declaration parsing for arrays.
I previously fixed this partly in 9e8400f4c9,
however I didn't go far enough in testing it, now when I parse a TGSI shader
with arrays in it my iterator can see the ArrayID set to the proper value.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-27 10:41:09 +10:00
Brian Paul
92cbfded6a svga: replace 0 with PIPE_OK in a few places
2013-08-26 15:49:16 -06:00
Michel Dänzer
46fd81e586 radeonsi: Also set the depth component mask bit for stencil-only exports
The stencil values come out wrong without this for some reason.

50 more little piglits.

Cc: mesa-stable@lists.freedesktop.org
2013-08-26 15:47:50 +02:00
Henri Verbeet
b5ddaf9975 r600g: Implement the new float comparison instructions for Cayman as well.
I assume this should have been part of commit
7727fbb7c5. This (obviously) fixes a lot of tests.

Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-08-25 13:00:02 +02:00
Ilia Mirkin
bac6efe8e3 nv30: add forgotten PIPE_CAP_CUBE_MAP_ARRAY cap to list
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-08-25 10:47:28 +02:00
Ilia Mirkin
293fa4e559 nouveau/video: avoid overwriting base codec init with template
Commit 53e20b8b introduced the use of a template to initialize some
common fields. Move this copying of fields to before the common vp3
fields are initialized.

Reported-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
2013-08-25 10:14:30 +02:00
Rob Clark
56ea2c4816 freedreno/a3xx: don't leak so much
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:58:01 -04:00
Rob Clark
9b9038496c freedreno/a3xx/compiler: fix SGT/SLT/etc
The cmps.f.* instruction doesn't actually seem to give a float 1.0 or
0.0 output.  It either needs a cov.u16f16 or add.s + sel.f16.  This
makes SGT/SLT/etc more similar to CMP, so handle them in trans_cmp().

This fixes a bunch of piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
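For reference, the TGSI semantics that 9b9038496c has to reproduce are simple: SGT and SLT yield a float 1.0 or 0.0 per component. The sketch below models the two-step shape described in the message, a comparison followed by a conversion to float. The helper names are hypothetical, and treating the raw comparison result as an integer 0 or 1 is an assumption, since the message only says it is not a float 1.0 or 0.0.

```python
def compare_gt(src0, src1):
    """Raw per-component comparison; assumed here to yield integer 0 or 1."""
    return [int(a > b) for a, b in zip(src0, src1)]


def to_float(values):
    """The extra conversion step that turns 0/1 into 0.0/1.0."""
    return [float(v) for v in values]


def sgt(src0, src1):
    """TGSI SGT: dst = (src0 > src1) ? 1.0 : 0.0, per component."""
    return to_float(compare_gt(src0, src1))


def slt(src0, src1):
    """TGSI SLT: dst = (src0 < src1) ? 1.0 : 0.0, per component."""
    return sgt(src1, src0)
```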
Rob Clark
572d4646f7 freedreno/a3xx/compiler: bit of re-arrange/cleanup
It seems there are a number of cases where instructions have limitations
about reading srcs from the const register file, so make
get_unconst() a bit easier to use.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
Rob Clark
d63bbac3a5 freedreno/a3xx/compiler: make compiler errors more useful
We probably should get rid of assert() entirely, but at this stage it is
more useful for things to crash where we can catch it in a debugger.
With compile_error() we have a single place to set an error flag (to
bail out and return an error on the next instruction) so that will be a
small change later when enough of the compiler bugs are sorted.

But re-arrange/cleanup the error/assert stuff so we at least get a dump
of the TGSI that triggered it.  So we see some useful output in piglit
logs.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
Rob Clark
4c91930a25 freedreno: fix segfault when no color buffer bound
Don't crash when no color buffer bound.  Something caught when starting
to run piglit, fixes a handful of piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
Rob Clark
7eeab24344 freedreno/a3xx/compiler: cat4 cannot use const reg as src
Category 4 instructions (rsq, rcp, sqrt, etc) seem to be unable to take
a const register as src.  In these cases we need to move the src to a
temporary gpr first.

This is the second case of such a restriction, where the instruction
encoding appears to support a const src, but in fact the hw appears to
ignore that bit.  So split things out into a helper that can be re-used
for any instructions which have this limitation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
Rob Clark
2effac5a67 freedreno/a3xx/compiler: use max_reg rather than file_count
Our current (rather naive) register assignment is based on mapping
different register files (INPUT, OUTPUT, TEMP, CONST, etc) based on the
max register index of the preceding file.  But in some cases, the lowest
used register in a file might not be zero.  In which case
file_count[file] != file_max[file] + 1.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
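The difference between file_count and file_max + 1 in 2effac5a67 is easy to see with a small example. The sketch below packs register files one after another using the maximum index of each preceding file. The function name and the dictionary layout are assumptions for illustration, not the compiler's actual bookkeeping.

```python
def file_bases(file_order, used_regs):
    """Assign each register file a base offset in a flat register space.

    `used_regs` maps a file name to the list of register indices it uses.
    Each file reserves max index + 1 slots, so gaps below the lowest
    used index are still accounted for.
    """
    bases = {}
    next_base = 0
    for name in file_order:
        bases[name] = next_base
        regs = used_regs.get(name, [])
        if regs:
            next_base += max(regs) + 1
    return bases
```

If INPUT uses only registers 2 and 3, its count is 2 but its highest index is 3. Sizing by count would start the next file at offset 2, on top of INPUT[2]; sizing by max + 1 starts it at offset 4.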
Rob Clark
aee1ed708a freedreno/a3xx/compiler: handle saturate on dst
Sometimes things other than color dst need saturating, like if there is
a 'clamp(foo, 0.0, 1.0)'.  So for saturated dst add the extra
instructions to fix up dst.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
Rob Clark
8b250bb8aa freedreno/a3xx/compiler: fix CMP
The 1st src to add.s needs (r) flag (repeat), otherwise it will end up:

  add.s dst.xyzw, tmp.xxxx, -1

instead of:

  add.s dst.xyzw, tmp.xyzw, -1

Also, if we are using a temporary dst to avoid clobbering one of the src
registers, we actually need to use that as the dst for the sel
instruction.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:23:32 -04:00
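The two listings in 8b250bb8aa differ only in which components of tmp feed the add. The sketch below models just that visible difference with an explicit swizzle. It is not a model of how the (r) flag is encoded or executed, and the helper names are hypothetical.

```python
COMPONENTS = "xyzw"


def swizzle(reg, pattern):
    """Select components of a 4-element register according to a pattern."""
    return [reg[COMPONENTS.index(c)] for c in pattern]


def add_s(src, pattern, imm):
    """Component-wise integer add of an immediate to a swizzled source."""
    return [value + imm for value in swizzle(src, pattern)]
```

With tmp = [5, 6, 7, 8], the `xxxx` form produces [4, 4, 4, 4], while the intended `xyzw` form produces [4, 5, 6, 7].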
Rob Clark
528bee59fe freedreno/a3xx: some texture fixes
Stop hard coding bits that indicate texture type (2d/3d/cube/etc).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:21:59 -04:00
Rob Clark
fd59f3ea98 freedreno: update register headers
resync w/ rnndb database

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:12:26 -04:00
Rob Clark
c2babfccb5 freedreno: add debug option to disable scissor optimization
Useful for testing and debugging.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:11:50 -04:00
Rob Clark
ae1a3f1736 freedreno/a3xx: fix viewport on gmem->mem resolve
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:04:29 -04:00
Rob Clark
fbef4e795f freedreno/a3xx: fix color inversion on mem->gmem restore
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24 13:04:29 -04:00
Niels Ole Salscheider
288a252523 radeonsi: Handle additional PIPE_COMPUTE_CAP_*
This patch adds support for:
PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE

Return the values reported by the closed source driver for now.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-08-23 17:00:01 -07:00
Niels Ole Salscheider
04349541cd radeonsi: copy r600_get_timestamp
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-08-23 16:59:55 -07:00
Niels Ole Salscheider
db6f4165f4 radeonsi: Implement PIPE_QUERY_TIMESTAMP
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-08-23 16:59:44 -07:00
Roland Scheidegger
ad9b5b9ae9 gallivm: fix min/mag switchover point for nearest/none mip filter
Previously, the min/mag switchover point when using nearest/none mip
filter was effectively -0.5 which can't be right. Looks like new OpenGL
thinks it's ok if it's always 0.0 (older versions required 0.5 in some
cases), let's hope everybody else thinks that's fine too.
Refactor this slightly and get the per-quad/per-pixel min/mag decision
values further down to sampling, though still only the first component
is used yet.
While here also fix code trying to skip lod bias application etc. when
mipfilter is none, as this is still needed for determining min/mag filter.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-23 23:46:28 +02:00
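The switchover point discussed in ad9b5b9ae9 decides which of the two filters a given lod selects. A minimal sketch, assuming the usual OpenGL rule that a lod at or below the switchover means magnification and anything above it means minification; the function and parameter names are hypothetical.

```python
def choose_filter(lod, min_filter, mag_filter, switchover=0.0):
    """Pick the mag filter when lod <= switchover, otherwise the min filter."""
    return mag_filter if lod <= switchover else min_filter
```

With the switchover at 0.0, a lod of -0.25 is treated as magnification. With the previous effective value of -0.5, the same lod would have selected the min filter instead.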
Jon Severinsson
b47bde0079 gallium/osmesa: Link, not copy, the shared library to the LIB_DIR.
Just like all other mesa libraries...

CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-08-23 12:58:48 -07:00
Jon Severinsson
aeb9c9e4b0 gallium/osmesa: Always link with the c++ linker.
Just like all other gallium targets...

CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-08-23 12:58:45 -07:00
Jon Severinsson
c811190430 gallium/osmesa: Make and install an osmesa.pc.
As of "2f142d59 build: Add --enable-gallium-osmesa flag." the pkgconfig
file from classic osmesa is no longer installed when building gallium
osmesa, so copy it to gallium osmesa and install the copy instead.

CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-08-23 12:58:30 -07:00
Roland Scheidegger
bd0b6c5180 gallivm: do per-element lod for lod bias and explicit derivs too
Except for explicit derivs with cube maps which are very bogus anyway.
Just like explicit lod this is only used if no_quad_lod is set in
GALLIVM_DEBUG env var.
Minification is terrible on cpus which don't support true vector shifts
(but should work correctly). Cannot do the min/mag filter decision (if
they are different) per pixel though, only selecting different mip levels
works.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-22 19:05:52 +02:00