61998 commits

Author SHA1 Message Date
Kenneth Graunke
eabfadf4af i965: Report the type of color clear in INTEL_DEBUG=blorp.
It's useful to know whether a clear is fast (MCS-based), using the
SIMD16 repdata message, or slow.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-03-23 00:32:53 -07:00
Marek Olšák
011569b5b7 radeonsi: disable fast color clear for 1D-tiled surfaces on CIK
This will be re-enabled once my kernel fix lands.
2014-03-22 18:44:58 +01:00
Kenneth Graunke
4c79f088c0 Revert "i965: For color clears, only disable writes to components that exist."
This reverts commit 2919c3fdb4.

For formats like BGRX, looping through 0..num_components works fine.
But for formats like XRGB, we'd check the color mask for X and fail to
check it for B.
2014-03-21 17:03:20 -07:00
Kenneth Graunke
2919c3fdb4 i965: For color clears, only disable writes to components that exist.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes.  Previously, we bailed
on it if color_mask wasn't <true, true, true, true>.  However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.

WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>.  This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.

Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
about 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
2014-03-21 15:35:08 -07:00
Kenneth Graunke
a63db538ad i965: Print number of multisamples in INTEL_DEBUG=blorp output.
This lets us distinguish MSAA resolves from other ordinary blits.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2014-03-21 15:34:59 -07:00
Kenneth Graunke
9834058a91 i965: Drop BLT TexSubImage Y-tiling restriction on Gen6+.
Currently, we don't use this path on Sandybridge because we suspect
other paths will be faster.  But we potentially could.  If we do, we
should allow it to support Y-tiled BLTs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-03-21 15:31:45 -07:00
Chris Forbes
351e13c5ad i965: Enable ARB_vertex_type_10f_11f_11f_rev for Gen4/5 also.
Tested on ILK and CTG (with the GL3isms taken out of the piglits).

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-22 09:19:55 +13:00
Tom Stellard
8d8d0cb09e clover: Fix typo in validate_object()
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-03-21 19:12:12 +01:00
Roland Scheidegger
9477d8c862 llvmpipe: add support for b5g6r5_srgb
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-21 17:23:38 +01:00
Roland Scheidegger
2aa77f2777 gallium: add b5g6r5 srgb format
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-21 17:23:38 +01:00
Ilia Mirkin
19ba573a57 nvc0/ir: move sample id to second source arg to fix sampler2DMS
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.

This fixes all the sampler2DMS/Array tests on nvc0.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
2014-03-20 20:47:47 -04:00
Marek Olšák
e5f6b6d0fe st/mesa: drop the lowering of quad strips to triangle strips
This fallback to triangle strips is silly and should be done in drivers
if they need it.

This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.

This fixes piglit:
  NV_primitive_restart/primitive-restart-draw-mode-quad_strip

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
2706448a10 gallium/u_gen_mipmap: remove the software fallback
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
db722bdcab st/mesa: fix generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
91df26842f mesa: fix software fallback for generating mipmaps for 3D textures
It didn't use the driver-provided src/dstRowStride at all.
This was broken for the cases when stride != width*bpp.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
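The stride problem described in 91df26842f can be illustrated with a small sketch. This is not the Mesa code; the helper names and parameters are hypothetical. It contrasts addressing a texel through a caller-supplied row stride with the assumption that rows are tightly packed (stride == width*bpp).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Mesa's API: byte offset of texel (x, y) in an
 * image whose rows start row_stride bytes apart. */
static size_t
texel_offset(size_t x, size_t y, size_t row_stride, size_t bytes_per_texel)
{
   /* The row must be wide enough to contain texel x. */
   assert((x + 1) * bytes_per_texel <= row_stride);
   return y * row_stride + x * bytes_per_texel;
}

/* The broken assumption: rows are tightly packed, so the stride is
 * simply width * bytes_per_texel. */
static size_t
texel_offset_packed(size_t x, size_t y, size_t width, size_t bytes_per_texel)
{
   return texel_offset(x, y, width * bytes_per_texel, bytes_per_texel);
}
```

For a 10-texel-wide, 4-byte-per-texel image stored with a 64-byte row stride, the two helpers agree only on row 0; from row 1 onward the packed version points into the wrong row.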
Marek Olšák
78c60d1b63 mesa: fix software fallback for generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
185ad78ffd mesa: allow generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
55cf320ed8 mesa: fix texture border handling for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-21 00:50:53 +01:00
Marek Olšák
54690a5f3b r600g: use more appropriate names for async DMA functions
*_dma_copy calls either *_dma_copy_buffer or *_dma_copy_tile.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-20 19:03:40 +01:00
Marek Olšák
6c487ff3bd r600g: deobfuscate async DMA code
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-20 18:56:11 +01:00
Marek Olšák
2c703ee8ad r600g: don't flush the gfx IB explicitly before doing DMA
It's flushed by calling r600_context_bo_reloc.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-20 18:41:18 +01:00
Marek Olšák
e914d0052f winsys/radeon: only add duplicate relocations for DMA if VM isn't supported
Also rewrite the comment for it to be readable and reorder the code.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-20 18:41:17 +01:00
Niels Ole Salscheider
71254732db radeonsi: Implement DMA blit
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.

v2: Marek> removed gfx.flush in si_dma_copy_tile

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2014-03-20 17:21:16 +01:00
Niels Ole Salscheider
acf55e7325 radeon: Move r600_need_dma_space to common code
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2014-03-20 17:21:16 +01:00
Richard Sandiford
f4b3430a36 llvmpipe: Tighten check for alpha-only formats
The AoS version of lp_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.

Fixes glean/blendFunc on System z.  No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.

Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
2014-03-20 16:50:40 +01:00
Jonathan Gray
8044fd6769 nouveau: don't assume libdrm include prefix
drm headers may be installed in a different directory

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-20 08:32:12 -04:00
Jonathan Gray
8fbc9d9b6f nouveau: use DLOPEN_LIBS instead of -ldl
libdl does not exist on many platforms which have dlopen in libc.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-20 08:32:12 -04:00
Brian Paul
eaf9affa5e c11/threads: don't include assert.h if the assert macro is already defined
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version.  So the assert you get depends on
whether threads.h or u_debug.h was included last.

In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows).  In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper.  So adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().

Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2014-03-19 17:13:31 -06:00
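The include-order guard described in eaf9affa5e can be sketched roughly as follows. This is a simplified illustration, not the actual header contents: project_assert_check and count_project_asserts are hypothetical names standing in for gallium's u_debug.h machinery.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a project-provided assert (hypothetical, simplified). */
static int project_assert_calls = 0;

static void
project_assert_check(int ok, const char *expr)
{
   project_assert_calls++;
   if (!ok) {
      fprintf(stderr, "project assert failed: %s\n", expr);
      abort();
   }
}

#undef assert
#define assert(expr) project_assert_check((expr) != 0, #expr)

/* Stand-in for the guarded include in a threads header: pull in the
 * system assert.h only if no assert() macro has been provided yet.
 * An unguarded include would #undef and replace the macro above. */
#ifndef assert
#include <assert.h>
#endif

/* Returns how many times the project assert has run so far; a non-zero
 * result means the guard kept the system header from replacing it. */
static int
count_project_asserts(void)
{
   assert(1 + 1 == 2);
   return project_assert_calls;
}
```

Because the guard skips the system header, the assert() inside count_project_asserts expands to the project version and the counter advances.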
Ilia Mirkin
e58071355e nouveau: there may not have been a texture if the fbo was incomplete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
2014-03-19 18:20:29 -04:00
Ilia Mirkin
b676df9abf nouveau: add forgotten GL_COMPRESSED_INTENSITY to texture format list
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
2014-03-19 18:17:40 -04:00
Ilia Mirkin
18690995a6 mesa/main: condition GL_DEPTH_STENCIL on ARB_depth_texture
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.

The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>

Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
2014-03-19 18:17:40 -04:00
Ilia Mirkin
51989817e6 loader: add special logic to distinguish nouveau from nouveau_vieux
There are a lot of different pci ids supported by nouveau, and more are
added all the time. The relevant distinguisher between drivers is the
chipset id.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
2014-03-19 18:17:40 -04:00
Matt Turner
c049dd4396 glsl: Allow dot() on scalars, and throw out dotlike().
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.

Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 23:20:29 -07:00
Matt Turner
6cbc64c3cb glsl: Optimize pow(x, 2) into x * x.
Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.

Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 23:20:29 -07:00
Matt Turner
9a9eaaa79a glsl: Match whitespace changes from previous patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 23:20:29 -07:00
Matt Turner
7988b4804f glsl: Expose pack/unpack built-ins for ARB_gpu_shader5.
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 23:20:29 -07:00
Eric Anholt
651b8baa82 i965: Drop some more dead code from the old CACHED_BATCH feature.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 14:45:09 -07:00
Eric Anholt
512c88f826 i965: Drop special case for edgeflag thanks to Marek's change to core.
As of 780ce576bb, we end up with R8_SSCALED anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 14:45:09 -07:00
Brian Paul
f4435da940 mesa: include stdbool.h in register_allocate.h to fix build
https://bugs.freedesktop.org/show_bug.cgi?id=76331
2014-03-18 13:28:17 -06:00
Ian Romanick
f74cf5f80e i965: Enable EWA anisotropic filtering algorithm
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm."  Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 10:56:38 -07:00
Kenneth Graunke
dd2e5d3999 i965: Actually initialize simd16_unsupported and no16_msg.
I meant to include these fixes in v3 of commit de7ad2c88f, but accidentally
pushed a previous version.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 10:50:48 -07:00
Kenneth Graunke
91f4528da6 i965/upload: Refactor open-coded ALIGN-like computations.
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:39:04 -07:00
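The power-of-two limitation mentioned in 91f4528da6 follows from how a mask-based round-up works. The sketch below assumes the usual mask-based shape for an ALIGN()-style helper; align_pot and align_npot are hypothetical names, not functions from the i965 driver.

```c
#include <assert.h>

/* Mask-based round-up, the usual shape of an ALIGN() macro. Only valid
 * when alignment is a power of two. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Division-based round-up; works for any non-zero alignment. */
static unsigned
align_npot(unsigned value, unsigned alignment)
{
   assert(alignment != 0);
   return ((value + alignment - 1) / alignment) * alignment;
}
```

With an alignment of 12, the mask formula would compute (10 + 11) & ~11, which is 20 and not a multiple of 12, whereas the division-based form yields 12.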
Kenneth Graunke
b8b4e280b4 i965: Fix indentation in brw_upload_indices().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:38:48 -07:00
Kenneth Graunke
051edcc144 i965: Consolidate code for setting brw->ib.start_vertex_offset.
This was set identically in three places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:38:44 -07:00
Kenneth Graunke
7a0fd3ca1d i965: Allocate register sets at screen creation, not context creation.
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context.  Computing them is
fairly expensive, and they take up a large amount of memory.  Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.

Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:35:53 -07:00
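The idea behind 7a0fd3ca1d, building generation-dependent data once per screen and letting every context borrow it, might look roughly like this. All structure and function names here are hypothetical stand-ins; the real i965 structures are far larger and the real setup is far more expensive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the register-set tables. */
struct reg_sets {
   int generation;   /* the only input the tables depend on */
};

struct screen {
   struct reg_sets regs;   /* built once per screen */
   int reg_set_builds;     /* counts the expensive step, for illustration */
};

struct context {
   const struct screen *screen;   /* contexts borrow the screen's tables */
};

static struct screen *
screen_create(int generation)
{
   struct screen *s = calloc(1, sizeof(*s));
   assert(s != NULL);
   s->regs.generation = generation;   /* stands in for the costly setup */
   s->reg_set_builds++;
   return s;
}

static struct context
context_create(const struct screen *s)
{
   struct context c = { s };   /* no per-context register-set work */
   return c;
}
```

Creating any number of contexts from one screen leaves reg_set_builds at 1, and all of them see the same regs storage.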
Kenneth Graunke
b3e4b769dd i965: Allocate the screen using ralloc rather than calloc.
This will allow us to use the screen as a memory context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:31:12 -07:00
Eric Anholt
41097db91b ra: Convert another bool array to bitsets.
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-03-18 10:20:28 -07:00
Kenneth Graunke
da1cce2d68 ra: Use a bitset for storing which registers belong to a class.
This should use 1/8 the memory.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
2014-03-18 10:15:24 -07:00
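The 1/8 memory figure in da1cce2d68 comes from storing one bit per register instead of one byte. A minimal sketch of such a bitset follows; it assumes a one-byte bool (typical, though not guaranteed by C), and the names are hypothetical rather than the helpers used in the register allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32u

/* Number of 32-bit words needed to hold one bit per register. */
static size_t
bitset_words(size_t num_regs)
{
   return (num_regs + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Mark register reg as a member of the class. */
static void
bitset_set(uint32_t *set, size_t reg)
{
   set[reg / BITS_PER_WORD] |= UINT32_C(1) << (reg % BITS_PER_WORD);
}

/* Hypothetical membership test: is register reg in the class? */
static bool
class_has_reg(const uint32_t *set, size_t num_regs, size_t reg)
{
   assert(reg < num_regs);
   return (set[reg / BITS_PER_WORD] >> (reg % BITS_PER_WORD)) & 1u;
}
```

For 256 registers, an array of one-byte flags takes 256 bytes, while the bitset takes 8 words of 4 bytes, or 32 bytes.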
Kenneth Graunke
8d856c3937 ra: Create a reg_belongs_to_class() helper function.
This is a little easier to read.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
2014-03-18 10:15:23 -07:00
Kenneth Graunke
786a647245 ra: Use bool instead of GLboolean.
This isn't the GL API, so there's no reason to use GLboolean.

Using bool is safer: any non-zero value is treated as "true".  When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.

Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-18 10:15:18 -07:00
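The truncation hazard described in 786a647245 can be reproduced in a few lines. This sketch assumes 8-bit bytes and declares GLboolean locally as unsigned char, matching its definition in the GL headers; to_glboolean and to_bool are hypothetical helper names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

_Static_assert(CHAR_BIT == 8, "example assumes 8-bit bytes");

/* GLboolean is an 8-bit unsigned type in the GL headers. */
typedef unsigned char GLboolean;

static GLboolean
to_glboolean(int value)
{
   return (GLboolean)value;   /* keeps only the low byte: 256 becomes 0 */
}

static bool
to_bool(int value)
{
   return (bool)value;        /* any non-zero value becomes true */
}
```

Passing 256 to both helpers gives 0 from to_glboolean and true from to_bool, which is the difference the commit message points out.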