fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-09 03:38:18 +02:00

Author	SHA1	Message	Date
Marek Olšák	c3e527f93d	radeonsi: only enable write confirmation on the last CP DMA packet This should improve performance for big copies that need to be split. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-07 10:22:12 +01:00
Ilia Mirkin	8e9ade7eb3	nv50/ir: allow emission of immediates in imul/imad ops Nothing actually uses this yet (due to complications), but the emission logic is right. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-07 00:42:15 -05:00
Ilia Mirkin	393d0c336b	nv50/ir: properly set the type of the constant folding result This removes the hack used for merge, which only covers a fraction of the cases. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 19:39:32 -05:00
Ilia Mirkin	2f9aaed749	nv50/ir: add support for const-folding OP_CVT with F64 source/dest Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 19:39:32 -05:00
Ilia Mirkin	76957389fc	nv50/ir: add fp64 opcode emission support for G200 (NVA0) Need to emulate rcp/rsq before providing full fp64 support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:36:25 -05:00
Hans de Goede	f979d3cfec	nv50/ir: Add support for 64bit immediates to checkSwapSrc01 Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:13:31 -05:00
Hans de Goede	9f2f8bda6e	nvc0/ir: Teach insnCanLoad about double immediates Teach insnCanLoad about double immediates, together with the "Add support for merge-s to the ConstantFolding pass" This turns the following (nvc0) code: 1: mov u32 $r2 0x00000000 (8) 2: mov u32 $r3 0x3fe00000 (8) 3: add f64 $r0d $r0d $r2d (8) Into: 1: add f64 $r0d $r0d 0.500000 (8) Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:13:31 -05:00
Hans de Goede	428506ece2	nv50/ir: Add support for merge-s to the ConstantFolding pass This allows later passes like LoadPropagation to properly deal with 64 bit immediates. If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:13:31 -05:00
Ilia Mirkin	2437f00853	nv50/ir: disallow 64-bit immediates on nv50 targets No instructions are able to load short immediates like nvc0 can. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:13:31 -05:00
Ilia Mirkin	11e3dac36e	nv50/ir: allow movs with TYPE_F64 destinations to be split Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 18:13:31 -05:00
Hans de Goede	b487b55f7d	gm107/ir: Add support for double immediates Add support for encoding double immediates (up to 20 bits of precision) into the generated gm107 machine-code. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 17:22:40 -05:00
Hans de Goede	12c850d01c	nvc0/ir: Add support for double immediates Add support for encoding double immediates (up to 20 bits of precision) into the generated nvc0 machine-code. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 17:22:40 -05:00
Boyuan Zhang	6bad554d98	radeon/uvd: fix VC-1 simple/main profile decode v2 We just needed to set the extra width/height fields to get this working. v2 (chk): rebased, CC stable added, commit message added, fixed coding style Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>	2015-11-06 20:07:23 +01:00
Boyuan Zhang	ed55def44f	st/vaapi: fix vaapi VC-1 simple/main corruption v2 Apply the start code fix only to advanced profile. v2 (chk): add commit message Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>	2015-11-06 20:07:23 +01:00
Julien Isorce	cc1e5c972e	st/va: add support for RGBX and BGRX in VPP Before it was only possible to convert a NV12 surface to RGBA or BGRA. This patch uses the same post processing function, "handleVAProcPipelineParameterBufferType", but add definitions for RGBX and BGRX. This patch also makes vlVaQuerySurfaceAttributes more generic to avoid copy and pasting the same lines. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-06 17:33:45 +00:00
Julien Isorce	42a5e143a8	vl/buffers: add RGBX and BGRX to the supported formats Useful if one wants to create RGBX or BGRX surfaces. The infrastructure is such that it required just a few definitions to support these formats. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-06 17:33:38 +00:00
Julien Isorce	bf6acbb2db	st/va: properly use brackets in vlVaAcquireBufferHandle's switch In "switch (mem_type)" the brackets were surrounding "case+default" instead of "case" only. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-06 17:33:16 +00:00
Julien Isorce	bfc245e9ac	st/va: properly indent buffer.c, config.c, image.c and picture.c Some lines were using 4 indentation spaces instead of 3. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-06 17:33:01 +00:00
Rob Clark	6459e780ae	freedreno/a4xx: fix blend color Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-11-06 11:19:04 -05:00
Rob Clark	7465e16124	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-11-06 11:18:47 -05:00
Guillaume Charifi	6f5e0c08a4	freedreno: add a305 support Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-11-06 11:17:58 -05:00
Boyan Ding	8f55ebe802	freedreno/ir3: Use nir_foreach_variable Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-11-06 11:17:53 -05:00
Ilia Mirkin	d68226087c	nvc0: reintroduce BGRA4 format support Commit `342e68dc60` (nvc0: remove BGRA4 format support) removed the support to fix a WoW trace. However after further experimentation, I was able to get the blit to work by using a different "fake" format in the 2d engine. The reason why this worked on nv50 is that nv50 falls back to the 3d blit path in case either the src or the dst aren't "faithfully" supported, while nvc0 only does it for the dst format. RG8 is better supported by the nvc0 2d engine than R16. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-06 00:47:44 -05:00
Julien Isorce	497bde6727	st/va: fix memory leak on error in vlVaCreateSurfaces2 Found by coverity: CID #1337953 Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-05 23:39:45 +00:00
Julien Isorce	e0b896c86c	st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2 Some lines were using 4 indentation spaces instead of 3. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-11-05 23:39:43 +00:00
Roland Scheidegger	5ae37ae615	llvmpipe: disable texture cache There are some weird problems with 8-wide vectors.	2015-11-05 18:00:42 +01:00
Ilia Mirkin	ba093a099a	nouveau: send back a debug message when waiting for a fence to complete Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-05 11:22:19 -05:00
Ilia Mirkin	4f6cd5fad0	nv50,nvc0: provide debug messages with shader compilation stats Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-05 11:22:19 -05:00
Ilia Mirkin	4335b28840	nouveau: add support for sending debug messages via KHR_debug Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-05 11:22:19 -05:00
Ilia Mirkin	6706cc1671	st/clover: provide a path for drivers to call through to pfn_notify Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> [ Francisco Jerez: Clean up clover::context interface by passing around a function object. ]	2015-11-05 11:22:19 -05:00
Ilia Mirkin	fc76cc05e3	gallium: expose a debug message callback settable by context owner This will allow gallium drivers to send messages to KHR_debug endpoints Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-11-05 11:22:18 -05:00
Ilia Mirkin	bb73fc4cb8	nouveau: relax fence emit space assert We also have the "reserved for kick" space available. Some of my earlier changes can probably be removed, but this is a quick fix for some of the rarer fallout. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>	2015-11-04 22:43:56 -05:00
Eric Anholt	6d3a24bce8	vc4: When the create ioctl fails, free our cache and try again. This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <mesa-stable@lists.freedesktop.org>	2015-11-04 14:04:14 -08:00
Eric Anholt	3f7c96c36c	vc4: Print the rounded shader size in debug output. It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be surprising.	2015-11-04 13:32:07 -08:00
Eric Anholt	4a951f1c08	vc4: Fix dumping the size of BOs allocated/cached. 60MB of cached BOs are a lot less scary than 600MB.	2015-11-04 13:32:07 -08:00
Brian Paul	d31481e70a	svga: implement 'white_fragments' option for VGPU10 fragment shaders When we emulate XOR logicop mode with blend-subtract, we need to ensure that the fragment shader always emits white. We had this implemented for VGPU9, but not VGPU10. VMware bug 1545492. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:41 -07:00
Brian Paul	149ac1fe43	u_vbuf: minor code reformatting / line wrapping Trivial.	2015-11-04 11:51:41 -07:00
Brian Paul	e450d4371a	u_vbuf: add some const qualifiers Trivial.	2015-11-04 11:51:40 -07:00
Brian Paul	3f98c812b3	svga: use new enum indices_mode type Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:40 -07:00
Brian Paul	fa6efbd27d	util/indices: replace #define tokens with enum type To ease debugging in gdb. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:40 -07:00
Roland Scheidegger	c19443bc8b	gallivm: fix sampling for s3tc srgb formats when using texture cache This actually stored the values as 8bit linear values in the cache, then did another srgb->linear conversion... We don't want to do the former (decoding 8bit srgb values to 8bit linear completely defeats the purpose of srgb in the first place), so just decode to 8bit srgb. Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.	2015-11-04 14:21:43 +01:00
Roland Scheidegger	9285ed98f7	llvmpipe: add cache for compressed textures compressed textures are very slow because decoding is rather complex (and because there's no jit code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-11-04 02:51:02 +01:00
Oded Gabbay	39b4dfe6ab	llvmpipe: use simple coeffs calc for 128bit vectors There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results. The decision which method to use is determined by the size of the vector that is used by the platform. For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called. For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called. This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX). This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests passes, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms. 
The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as it can be seen from the following benchmarking results: - glxspheres, on ppc64le: - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec - Additional 0.8% performance boost - glxspheres, on x86-64 without AVX: - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec - Additional 0.74% performance boost - glmark2, on ppc64le: - original code: score of 58 - with my change: score of 57 - glmark2, on x86-64 without AVX: - original code: score of 175 - with the patch: score of 167 - Impact of -4.5% on performance - OpenArena, on ppc64le: - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms - 29 seconds faster with the patch, which is about 2% - OpenArena, on x86-64 without AVX: - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms - 0.3 fps slower with the patch (about 2%) Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-11-04 02:38:53 +01:00
Marek Olšák	3b37155a68	gallium/radeon: allow returning SDMA fences from pipe->flush pipe->flush never returned SDMA fences. This fixes it. This is only an issue on amdgpu where fences can signal out of order. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Marek Olšák	7f9122c968	gallium/radeon: always return the last SDMA fence on SDMA flush if needed Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Samuel Pitoiset	e887407491	nvc0: add missing compute parameters required by clover This fixes crashes with some piglit OpenCL tests. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-03 22:17:00 +01:00
Samuel Pitoiset	e640ba41ed	nvc0: handle NULL pointer in nvc0_get_compute_param() To get the size (in bytes) of a compute parameter, clover first calls get_compute_param() with a NULL data pointer. The RET() macro is based on nv50. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-03 22:16:45 +01:00
Samuel Pitoiset	00bb524716	nv50: use correct heaps for FP and GP code segments This is just a cosmetic change. Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2015-11-01 23:29:20 +01:00
Ilia Mirkin	67635a0a71	nouveau: get rid of tabs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-10-31 19:58:14 -04:00
Dave Airlie	425d8c2578	virgl/vtest: fix extra malloc This somehow got added twice, drop the first one. Reported by Coverity. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-10-31 18:05:33 +10:00
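
Note on the double-immediate entries (9f2f8bda6e, 12c850d01c, b487b55f7d): the log describes merging two 32-bit moves (0x00000000 and 0x3fe00000) into a single f64 immediate 0.5, and says such immediates are encoded with up to 20 bits of precision. The arithmetic can be illustrated with a minimal Python sketch. This is not Mesa code. The helper names are hypothetical, and the sketch assumes that "20 bits" means the top 20 bits of the IEEE-754 binary64 pattern (sign, 11 exponent bits, 8 mantissa bits), which the commit messages do not state explicitly.

```python
import struct


def merge_u32_pair_to_f64(lo: int, hi: int) -> float:
    """Reinterpret two 32-bit halves (low word, high word) as one binary64 value."""
    bits = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fits_in_top_20_bits(value: float) -> bool:
    """Assumed encodability check: the low 44 bits of the pattern must be zero."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (bits & ((1 << 44) - 1)) == 0


# The register pair from the commit message: $r2 = 0x00000000, $r3 = 0x3fe00000.
merged = merge_u32_pair_to_f64(0x00000000, 0x3FE00000)
print(merged)                       # 0.5
print(fits_in_top_20_bits(merged))  # True
print(fits_in_top_20_bits(0.1))     # False: 0.1 needs the full mantissa
```

Under this assumption, values such as 0.5, 1.0 or 2.0 could be folded into the instruction, while a value like 0.1 would still need a separate 64-bit load.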

27608 commits