fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-24 17:08:20 +02:00

Author	SHA1	Message	Date
Christian König	eaf7ec9cfc	st/va: add motion adaptive deinterlacing v2 v2: minor cleanup Signed-off-by: Christian König <christian.koenig@amd.com>	2016-01-18 10:59:32 +01:00
Michel Dänzer	ad20be1f30	gallium/radeon: Rename do_invalidate_resource to invalidate_buffer And only call it from r600_invalidate_resource for buffer resources. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	0491dd1deb	st/dri: Don't call invalidate_resource for NULL depth/stencil buffers Fixes crash in 4 EGL piglit tests with radeonsi. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	a9ab7172a6	radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	4297259fc8	radeonsi: Print "LLVM emitted unknown config register" warning only once Say "LLVM" instead of "Compiler" for clarity. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Oded Gabbay	679a654a77	llvmpipe: use vpkswss when dst is signed This patch fixes a bug when building a pack instruction. For POWER (altivec), in case the destination is signed and the src width is 32, we need to use vpkswss. The original code used vpkuwus, which emits an unsigned result. This fixes the following piglit tests on ppc64le: - spec@arb_color_buffer_float@gl_rgba8-drawpixels - shaders@glsl-fs-fogscale I've also corrected some coding style issues in the function. v2: Returned else statements to vmware style Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-18 09:45:25 +02:00
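The bug fixed in 679a654a77 comes down to how each 32-bit lane is narrowed. Below is a scalar model of one lane under each instruction's documented semantics. The function names are hypothetical and are not the llvmpipe code. `vpkswss` treats the input as signed. `vpkuwus` reinterprets it as unsigned, so a small negative value saturates to 0xFFFF instead of being preserved:

```c
#include <stdint.h>

/* One lane of vpkswss: signed 32-bit in, signed 16-bit out,
 * saturated to [INT16_MIN, INT16_MAX]. */
static int16_t pack_s32_to_s16_sat(int32_t v)
{
	if (v > INT16_MAX)
		return INT16_MAX;
	if (v < INT16_MIN)
		return INT16_MIN;
	return (int16_t)v;
}

/* One lane of vpkuwus: the 32-bit input is treated as unsigned, so a
 * negative value looks huge and saturates to UINT16_MAX. */
static uint16_t pack_u32_to_u16_sat(uint32_t v)
{
	return v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
}
```

For example, -5 survives the signed pack as -5, but the unsigned pack turns it into 65535.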
Ilia Mirkin	4ac1274caa	gm107/ir: don't do indirect frag shader inputs on GM107 Apparently the IPA op decided to stop working with offsets. Need to figure out if we need to do an AL2P situation or something similar. For now just turn it back off. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-17 16:37:04 -05:00
Ilia Mirkin	3281ae96c8	tgsi: initialize Atomic field in tgsi_default_declaration Spotted by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-17 16:37:04 -05:00
Ilia Mirkin	5a81b48ad0	nvc0: bsp_bo can't be null We already deref it earlier. And these are all allocated on load. Spotted by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-17 16:37:04 -05:00
Oded Gabbay	529aa8249a	llvmpipe: fix arguments order given to vec_andc This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)" To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-17 21:07:27 +02:00
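The operand-order mismatch described in 529aa8249a can be modelled per 32-bit lane in plain C. In this minimal sketch, `sse_andnot`, `vmx_andc` and `andnot_sse_order` are hypothetical scalar stand-ins for `_mm_andnot_si128`, `vec_andc` and the `vec_andnot_si128` wrapper. The wrapper keeps the SSE argument order by swapping the operands it passes on:

```c
#include <stdint.h>

/* Scalar model of _mm_andnot_si128(a, b): r = (~a) & b */
static uint32_t sse_andnot(uint32_t a, uint32_t b) { return ~a & b; }

/* Scalar model of vec_andc(a, b): r = a & (~b) */
static uint32_t vmx_andc(uint32_t a, uint32_t b) { return a & ~b; }

/* Wrapper with SSE semantics built on the VMX operation:
 * (~a) & b == b & (~a) == vec_andc(b, a). */
static uint32_t andnot_sse_order(uint32_t a, uint32_t b)
{
	return vmx_andc(b, a);
}
```

Calling `vmx_andc(a, b)` directly where `sse_andnot(a, b)` was meant silently computes a different mask. Centralizing the swap in one wrapper is what keeps that mistake from reappearing.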
Rob Clark	02ac91d717	freedreno/ir3: fix mad 3rd src delay calc In `fad158a0` ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-17 12:21:45 -05:00
Rob Clark	2a6ec1e061	freedreno/ir3: better array register allocation Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:23:52 -05:00
Rob Clark	6a33c5c0df	freedreno/ir3: array offset can be negative It at least happens with some piglit tests, like $piglit/bin/vp-address-01 VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], COLOR DCL CONST[0..7] DCL ADDR[0] 0: ARL ADDR[0].x, IN[1].xxxx 1: MOV_SAT OUT[1], CONST[ADDR[0].x-1] 2: DP4 OUT[0].x, CONST[4], IN[0] 3: DP4 OUT[0].y, CONST[5], IN[0] 4: DP4 OUT[0].z, CONST[6], IN[0] 5: DP4 OUT[0].w, CONST[7], IN[0] 6: END Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:23:20 -05:00
Rob Clark	ddede497b8	freedreno/ir3: workaround bug/feature Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:22:43 -05:00
Rob Clark	ebd3a1fc17	ttn: use writemask for store_var Only user is freedreno, and after array-rework it can cope. Avoids generating loads for a store. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:21:52 -05:00
Rob Clark	fad158a0e0	freedreno/ir3: array rework Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:21:08 -05:00
Rob Clark	cc7ed34df9	freedreno/ir3: refactor/simplify cp If we handle separately the special case of eliminating output mov (which includes keeps and various other cases where we don't have a consuming instruction's src register to collapse things into), we can simplify the logic. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:20:46 -05:00
Rob Clark	680664dff9	freedreno/ir3: fix incorrect decoding of mov instructions Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:20:37 -05:00
Rob Clark	2809c87f90	freedreno/ir3: remove unused tgsi tokens ptr Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:59 -05:00
Rob Clark	fc0d2f7e02	freedreno/ir3: bit of ra refactor Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:47 -05:00
Rob Clark	d430f443de	freedreno/ir3: cosmetic de-indent Collapse two nested if's into one to reduce indent level. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:33 -05:00
Rob Clark	6f0377d651	ttn: add missing writemask on store_output Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-01-16 13:35:44 -05:00
Ilia Mirkin	32a9fe013b	nv50/ir: add saturate support on ex2 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-16 00:10:56 -05:00
Jeff Muizelaar	e5fefe49f2	gallivm: avoid crashing in mod by 0 with llvmpipe This adds code that is basically the same as the code in umod, udiv and idiv. However, unlike idiv we return -1. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-16 03:36:29 +01:00
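A lane-wise sketch of the behaviour described in e5fefe49f2: a zero divisor yields -1 instead of trapping. This models a common mask-based trick under that assumption and is not the gallivm source. `safe_imod` is a hypothetical name:

```c
#include <stdint.h>

/* Signed modulo that returns -1 (all bits set) for a zero divisor. */
static int32_t safe_imod(int32_t a, int32_t b)
{
	/* All ones where the divisor is zero, else zero. */
	uint32_t zero_mask = (b == 0) ? 0xFFFFFFFFu : 0u;
	/* OR the mask into the divisor so we never divide by zero (0 -> -1). */
	int32_t divisor = (int32_t)((uint32_t)b | zero_mask);
	/* x % -1 is always 0; special-case it because INT32_MIN % -1
	 * overflows in C (and traps on x86). */
	int32_t rem = (divisor == -1) ? 0 : a % divisor;
	/* Zero-divisor lanes become -1. */
	return (int32_t)((uint32_t)rem | zero_mask);
}
```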
Roland Scheidegger	03f66dfb4b	llvmpipe: ditch additional ref counting for vertex/geometry sampler views The cleaning up was quite a performance hog (making pipe_resource_reference the number two in profilers on the vertex path, and 3rd overall, with its cousin pipe_reference_described not far behind) if there were lots of tiny draw calls (ipers). Now the reason was really that it was blindly calling this for all potential shader views (so 32 each for vs and gs) even though the app never touched a single one which could have been fixed, however I can't come up with a good reason why we refcount these. We've got references, of course, in the sampler views, which should be quite sufficient as we do all vertex and geometry shader execution fully synchronous. (Calling prepare_shader_sampling for all draw calls even if there were no changes looks quite suboptimal too, but generally we don't really expect vs/gs shader sampling to be used much with llvmpipe, and there's even an early exit if there aren't any views to avoid the "null loop" albeit it's now no longer always trying to loop through all 32 slots. Maybe improve another time...). Of course, if we manage to make vertex loads run asynchronously some day, we need references again, but adding that back would be the least of the problems... Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the vertex side depends on it (I suppose we'd really wanted a separate flag in any case). (Good for a 3% improvement or so in ipers under the right conditions.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-15 20:13:45 +01:00
Roland Scheidegger	2f9a325b6a	llvmpipe: fix "leaking" textures This was not really a leak per se, but we were referencing the textures for longer than intended. If textures were set via llvmpipe_set_sampler_views() (for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they were referenced in the setup state. However, the only way to unreference them was by replacing them with another texture, and not when the texture slot was replaced with a NULL sampler view. (They were then further also referenced by the scene too which might have additional minor side effects as we limit the memory size which is allowed to be referenced by a scene in a rather crude way.) Only setup destruction (at context destruction time) then finally would get rid of the references. Fix this by noting the number of textures the last time, and unreference things if the new view is NULL (avoiding having to unreference things always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked). Found by code inspection, no test... v2: rename var Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-15 20:13:45 +01:00
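The fix in 2f9a325b6a remembers how many slots were bound last time, so slots that are now NULL (or past the new count) can be released. Below is a minimal sketch of that pattern. `struct view`, `view_reference`, `setup_state` and `set_views` are hypothetical simplifications, not llvmpipe's types:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 8 /* stand-in for PIPE_MAX_SHADER_SAMPLER_VIEWS */

struct view { int refcount; };

/* Point *dst at src, adjusting both reference counts. */
static void view_reference(struct view **dst, struct view *src)
{
	if (*dst == src)
		return;
	if (src)
		src->refcount++;
	if (*dst)
		(*dst)->refcount--;
	*dst = src;
}

struct setup_state {
	struct view *views[MAX_VIEWS];
	unsigned num_views; /* number of slots bound by the previous call */
};

/* Bind the new views and unreference slots that became NULL, walking
 * only up to max(new count, previous count), not all MAX_VIEWS slots. */
static void set_views(struct setup_state *s, struct view **views, unsigned num)
{
	assert(num <= MAX_VIEWS);
	unsigned limit = num > s->num_views ? num : s->num_views;
	for (unsigned i = 0; i < limit; i++) {
		struct view *v = i < num ? views[i] : NULL;
		view_reference(&s->views[i], v);
	}
	s->num_views = num;
}
```

Tracking the previous count avoids the alternative mentioned in the commit, which is unreferencing every slot up to the maximum on each call.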
Ilia Mirkin	fffb559129	nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP shaders: total local used in shared programs : 54116 -> 30372 (-43.88%) Probably modest advantage to execution, but it's an important prerequisite to dropping some of the TGSI optimizations done by the state tracker. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-14 20:14:01 -05:00
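A rough model of why rebasing temp arrays (fffb559129) shrinks local memory. Without rebasing, local memory is addressed by absolute TGSI temp index, so it must cover everything from TEMP[0] up to the highest array element. With rebasing, each array is packed at its own base. The sketch assumes 16 bytes (a vec4 of 32-bit values) per temp and packs arrays back to back. All names are hypothetical; this is not the nv50 allocator:

```c
#define VEC4_BYTES 16u /* assumption: one TGSI temp = 4 x 32-bit */

struct temp_array { unsigned first, last; }; /* TEMP[first..last] */

/* Absolute addressing: size is set by the highest index used. */
static unsigned lmem_size_absolute(const struct temp_array *a, unsigned n)
{
	unsigned max = 0;
	for (unsigned i = 0; i < n; i++)
		if (a[i].last + 1 > max)
			max = a[i].last + 1;
	return max * VEC4_BYTES;
}

/* Rebased: element k of array i lives at base[i] + (k - first) * 16. */
static unsigned lmem_size_rebased(const struct temp_array *a, unsigned n,
                                  unsigned *base)
{
	unsigned size = 0;
	for (unsigned i = 0; i < n; i++) {
		base[i] = size;
		size += (a[i].last - a[i].first + 1) * VEC4_BYTES;
	}
	return size;
}
```

For arrays TEMP[10..13] and TEMP[40..41], this model gives 672 bytes with absolute addressing and 96 bytes after rebasing.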
Ilia Mirkin	e231f59b6d	nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection Previously we were treating any indirect temp array usage to mean that everything should end up in lmem. The MemoryOpt pass would clean a lot of that up later, but in the meanwhile we would lose a lot of opportunity for optimization. This helps a lot of Metro 2033 Redux and a handful of KSP shaders: total instructions in shared programs : 6288373 -> 6261517 (-0.43%) total gprs used in shared programs : 944051 -> 945131 (0.11%) total local used in shared programs : 54116 -> 54116 (0.00%) A typical case is for register usage to double and for instructions to halve. A future commit can also optimize local memory usage size to be reduced with better packing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-14 20:13:59 -05:00
Ilia Mirkin	37b67db6ae	nvc0/ir: be careful about propagating very large offsets into const load Indirect constbuf indexing works by using very large offsets. However if an indirect constbuf index load is const-propagated, it becomes a very large const offset. Take that into account when legalizing the SSA by moving the high parts of that offset into the file index. Also disallow very large (or small) indices on most other instructions. This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests. Fixes: `abd326e81b` (nv50/ir: propagate indirect loads into instructions) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-14 18:20:27 -05:00
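A sketch of the legalization idea in 37b67db6ae: move the high part of an oversized constant offset into the buffer (file) index. It assumes that indirect constbuf addressing places each buffer in a 64 KiB window (`index * 0x10000 + offset`). That layout, and the names below, are illustrative assumptions rather than the nvc0 code:

```c
#include <stdint.h>

struct const_ref { unsigned file_index; uint32_t offset; };

/* Split a possibly huge propagated offset into (buffer index, offset). */
static struct const_ref legalize_const_offset(unsigned file_index, uint32_t offset)
{
	struct const_ref r;
	/* High bits select another 64 KiB buffer window... */
	r.file_index = file_index + (offset >> 16);
	/* ...and only the in-buffer part stays as the immediate offset. */
	r.offset = offset & 0xFFFFu;
	return r;
}
```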
Ilia Mirkin	7a521ddf36	nvc0: allow fragment shader inputs to use indirect indexing Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-14 14:28:04 -05:00
Marek Olšák	dc96a18d24	radeonsi: don't miss changes to SPI_TMPRING_SIZE I'm not sure about the consequences of this bug, but it's definitely dangerous. This applies to SI, CIK, VI. Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-14 19:55:41 +01:00
Charmaine Lee	6303231a1d	svga: add DXGenMips command support For those formats that support hw mipmap generation, use the DXGenMips command. Otherwise fallback to the mipmap generation utility. Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench) v2: make sure the texture surface was created with the render target bind flag set relocation flag to SVGA_RELOC_WRITE for the texture surface Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-14 10:44:25 -07:00
Charmaine Lee	78e628ae43	svga: add num-generate-mipmap HUD query The actual increment of the num-generate-mipmap counter will be done in a subsequent patch when hw generate mipmap is supported. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-14 10:39:53 -07:00
Charmaine Lee	3038e8984d	gallium/st: add pipe_context::generate_mipmap() This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fallback to mipmap generation by rendering/texturing. v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format v4: fix return type of trace_context_generate_mipmap() Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-14 10:39:53 -07:00
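The capability-plus-fallback flow from 3038e8984d (and the per-format fallback in 6303231a1d) can be sketched as follows. The state tracker tries the driver hook only when the cap is set, and falls back when the hook declines a format. `struct driver`, the format enum and every function here are hypothetical simplifications, not `pipe_context`:

```c
#include <stdbool.h>
#include <stddef.h>

enum fmt { FMT_RGBA8, FMT_RGB9E5 }; /* hypothetical formats */

struct driver {
	bool cap_generate_mipmap; /* stand-in for PIPE_CAP_GENERATE_MIPMAP */
	/* Hardware path; returns false if it cannot handle the request. */
	bool (*generate_mipmap)(struct driver *drv, enum fmt format,
	                        unsigned base_level, unsigned last_level);
	unsigned hw_calls, fallback_calls;
};

/* Example hook that only supports one format in hardware. */
static bool hw_generate_mipmap(struct driver *drv, enum fmt format,
                               unsigned base_level, unsigned last_level)
{
	(void)base_level; (void)last_level;
	if (format != FMT_RGBA8)
		return false;
	drv->hw_calls++;
	return true;
}

/* Stand-in for mipmap generation by rendering/texturing. */
static void fallback_generate_mipmap(struct driver *drv, unsigned base_level,
                                     unsigned last_level)
{
	(void)base_level; (void)last_level;
	drv->fallback_calls++;
}

static void st_generate_mipmap(struct driver *drv, enum fmt format,
                               unsigned base_level, unsigned last_level)
{
	if (drv->cap_generate_mipmap && drv->generate_mipmap &&
	    drv->generate_mipmap(drv, format, base_level, last_level))
		return;
	fallback_generate_mipmap(drv, base_level, last_level);
}
```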
Nicolai Hähnle	e976860638	gallium/radeon: do not reallocate user memory buffers The whole point of AMD_pinned_memory is that applications don't have to map buffers via OpenGL - but they're still allowed to, so make sure we don't break the link between buffer object and user memory unless explicitly instructed to. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-14 09:41:24 -05:00
Nicolai Hähnle	321140d563	gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-14 09:41:04 -05:00
Nicolai Hähnle	08c71740ad	gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE This accommodates a streaming pattern where the discard flag is set when the application wraps back to the beginning of the buffer. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-14 09:40:00 -05:00
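A simplified model of why resetting the valid range (08c71740ad) helps a wrap-around streaming pattern. A mapping that does not overlap the range of already-written bytes need not wait for the GPU. Once the application discards the whole buffer, the old contents are dead, so the range can be emptied. The empty encoding (`start > end`) is an assumption, and the names are hypothetical rather than the radeon driver's:

```c
#include <stdbool.h>

/* Byte span [start, end) that may hold data the GPU still uses. */
struct valid_range { unsigned start, end; };

static void range_reset(struct valid_range *r) { r->start = ~0u; r->end = 0; }

static void range_add(struct valid_range *r, unsigned start, unsigned end)
{
	if (start < r->start) r->start = start;
	if (end > r->end) r->end = end;
}

/* Only a mapping that overlaps valid data needs synchronization. */
static bool map_needs_sync(const struct valid_range *r, unsigned start, unsigned end)
{
	return start < r->end && end > r->start;
}

/* DISCARD_WHOLE_RESOURCE: previous contents are dead, forget them. */
static void map_discard_whole(struct valid_range *r) { range_reset(r); }
```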
Nicolai Hähnle	654670b404	gallium: add PIPE_CAP_INVALIDATE_BUFFER It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-14 09:39:38 -05:00
Nicolai Hähnle	cbcdef7b40	winsys/radeon: fix warnings about incompatible pointer types Some confusion between pb_buffer and radeon_bo as well as between radeon_drm_winsys and radeon_winsys. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-14 09:33:58 -05:00
Marek Olšák	4ea0febcb0	radeonsi: move POSITION and FACE fragment shader inputs to system values And FACE becomes integer instead of float. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-13 12:27:28 +01:00
Marek Olšák	caf3c2abea	radeonsi: simplify gl_FragCoord behavior It will become a system value, not an input. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-13 12:27:28 +01:00
Roland Scheidegger	38cdcb000d	llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts some compiler was unhappy.	2016-01-13 04:48:41 +01:00
Roland Scheidegger	49ec647c3b	llvmpipe: avoid most 64 bit math in rasterization The trick here is to recognize that in the c + n * dcdx calculations, not only can the lower FIXED_ORDER bits not change (as the dcdx values have those all zero) but that this means the sign bit of the calculations cannot be different as well, that is sign(c + n * dcdx) == sign((c >> FIXED_ORDER) + n * (dcdx >> FIXED_ORDER)). That shaves off more than enough bits to never require 64bit masks. A shifted plane c value could still easily exceed 32 bits, however since we throw out planes which are trivial accept even before binning (and similarly don't even get to see tris for which there was a trivial reject plane) this is never a problem. The idea isn't all that revolutionary, in fact something similar was tried ages ago (`9773722c2b`) back when the values were only 32 bit anyway. I believe now it didn't quite work then because the adjustment needed for testing trivial reject / partial masks wasn't handled correctly. This still keeps the separate 32/64 bit paths for now, as the 32 bit one still looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled from setup which would be a good reason to ditch the 32 bit path, we'd need to change the special-purpose rasterization functions for small tris). This passes piglit triangle-rasterization (-fbo -auto -max_size -subpixelbits 8) and triangle-rasterization-overdraw (with some hacks to make it work correctly with large sizes) easily (full piglit as well of course, but most tests wouldn't use triangles large enough to be affected, that is tris with a bounding box over 128x128). The profiler says indeed time spent in rast_tri functions is reduced substantially, BUT of course only if the tris are large. I measured a 3% improvement in mesa gloss demo when supersized to twice the screen size... Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-13 03:50:57 +01:00
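The sign argument in 49ec647c3b follows from writing c = (c_hi << F) + c_lo with 0 <= c_lo < 2^F, where F = FIXED_ORDER and c_hi = c >> F (a floor shift). If dcdx has its low F bits clear, then c + n * dcdx = ((c_hi + n * dcdx_hi) << F) + c_lo. Adding c_lo can never flip the sign bit: a nonnegative high part stays nonnegative, and a high part of -1 or below stays below zero. The check below uses an assumed FIXED_ORDER of 8. It relies on `>>` of negative values being an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers:

```c
#include <stdint.h>

#define FIXED_ORDER 8 /* assumption: 8 subpixel bits */

/* Reference: full-width edge function value at step n. */
static int64_t edge_value64(int64_t c, int64_t dcdx, int64_t n)
{
	return c + n * dcdx;
}

/* Same sign test with the low FIXED_ORDER bits dropped; valid when
 * dcdx's low FIXED_ORDER bits are zero and the results fit 32 bits. */
static int32_t edge_value_shifted(int64_t c, int64_t dcdx, int32_t n)
{
	int32_t c_hi = (int32_t)(c >> FIXED_ORDER);
	int32_t dcdx_hi = (int32_t)(dcdx >> FIXED_ORDER);
	return c_hi + n * dcdx_hi;
}
```

Only the sign bit is preserved, not the value. With c = -300 and dcdx = 256, step 2 gives 212 at full width but 0 shifted, and both are nonnegative.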
Roland Scheidegger	16530fdc82	llvmpipe: scale up bounding box planes to subpixel precision Otherwise some planes we get in rasterization have subpixel precision, others not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports with subpixel accuracy, so could even do bounding box calcs with that). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-13 03:34:59 +01:00
Roland Scheidegger	0298f5aca7	llvmpipe: add sse code for fixed position calculation This is quite a few less instructions, albeit still do the 2 64bit muls with scalar c code (they'd need way more shuffles, plus fixup for the signed mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed scalar muls natively just fine after all (even on 32bit). (This still doesn't have a very measurable performance impact in reality, although profiler seems to say time spent in setup indeed has gone down by 10% or so overall. Maybe good for a 3% or so improvement in openarena.) Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-13 03:34:09 +01:00
Roland Scheidegger	9422999e40	draw: fix key comparison with uninitialized value Discovered by accident, valgrind was complaining (could have possibly caused us to create redundant geometry shader variants). v2: convinced by Brian and Jose, just use memset for both gs and vs keys, just as easy and less error prone.	2016-01-13 02:43:04 +01:00
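The memset approach in 9422999e40 matters because variant keys are compared byte-wise. Padding bytes are never written by field assignments, so without a memset two logically equal keys can differ, and valgrind reports the uninitialized read. `struct variant_key` and the helpers below are a hypothetical illustration, not the draw module's key layout:

```c
#include <stdbool.h>
#include <string.h>

/* A char followed by an int typically leaves padding bytes. */
struct variant_key {
	char clamp_vertex_color;
	int nr_outputs;
};

static void make_key(struct variant_key *key, char clamp, int nr_outputs)
{
	/* Zero the whole struct, padding included, before filling fields. */
	memset(key, 0, sizeof(*key));
	key->clamp_vertex_color = clamp;
	key->nr_outputs = nr_outputs;
}

/* Byte-wise comparison, as used when looking up cached variants. */
static bool keys_equal(const struct variant_key *a, const struct variant_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```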
Tom St Denis	56fc2986d5	st/omx: Avoid segfault in deconstructor if constructor fails If the constructor fails before the LIST_INIT calls the pointers will be null and the deconstructor will segfault. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-01-12 19:13:19 +01:00
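One way to make a destructor safe for the half-constructed case described in 56fc2986d5 is to skip list teardown when the list head was never initialized. The sketch assumes the object is zero-allocated, so an uninitialized head is NULL. It is not the actual st/omx change, and the list helpers and `struct component` are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular list, loosely in the style of Mesa's list helpers. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *l) { l->prev = l->next = l; }

static void list_add(struct list_head *item, struct list_head *head)
{
	item->prev = head;
	item->next = head->next;
	head->next->prev = item;
	head->next = item;
}

struct component { struct list_head tasks; };

/* Returns the number of entries freed; tolerates a constructor that
 * failed before list_init() ran. */
static unsigned component_destroy(struct component *c)
{
	unsigned freed = 0;
	if (c->tasks.next != NULL) {
		struct list_head *n = c->tasks.next;
		while (n != &c->tasks) {
			struct list_head *next = n->next;
			free(n); /* entries assumed to be malloc'ed nodes */
			n = next;
			freed++;
		}
	}
	free(c);
	return freed;
}
```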
Christian König	6f898f740c	vl: use preferred format for deinterlacing Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2016-01-12 13:28:42 +01:00
Christian König	5fdd4a5aef	vl: improve motion adaptive deinterlacer Handle other formats than YV12 as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2016-01-12 13:28:39 +01:00
Christian König	e945235aed	st/va: add BOB deinterlacing v2 Tested with MPV. v2: correctly handle compositor deinterlacing as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2016-01-12 13:28:35 +01:00
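For readers unfamiliar with BOB deinterlacing (e945235aed): each output frame is built from a single field, with that field's lines stretched to full height. This CPU sketch duplicates lines for one 8-bit plane and assumes an even height. It illustrates the concept only and is not the st/va path, which operates on GPU video surfaces:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a full frame from one field: top = even lines, bottom = odd. */
static void bob_deinterlace(const uint8_t *src, uint8_t *dst,
                            unsigned width, unsigned height, int bottom_field)
{
	assert(height % 2 == 0);
	for (unsigned y = 0; y < height; y++) {
		/* The field line covering output line y (line doubling). */
		unsigned src_y = (y & ~1u) + (bottom_field ? 1u : 0u);
		memcpy(dst + y * width, src + src_y * width, width);
	}
}
```

Real implementations usually interpolate between field lines instead of duplicating them, which reduces the visible stair-stepping.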


25808 commits