fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-06 13:48:06 +02:00

Author	SHA1	Message	Date
Eric Anholt	229bf4475f	vc4: Optimize CL emits by doing size checks up front. The optimizer obviously doesn't have the ability to rewrite these to skip the size checks per call, so we have to do it manually. Improves a norast benchmark on simulation by 0.779706% +/- 0.405838% (n=6087).	2014-12-24 10:28:26 -10:00
Eric Anholt	20e3a2430e	vc4: Avoid repeated hindex lookups in the loop over tiles. Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673% (n=20).	2014-12-24 08:28:33 -10:00
Kenneth Graunke	4616b2ef85	i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms. This was probably missed when moving from a fixed binding table layout to a dynamic one that changes based on the shader. Fixes newly proposed Piglit test fbo-mrt-new-bind. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87619 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Mike Stroyan <mike@LunarG.com> Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>	2014-12-24 00:15:40 -08:00
Kenneth Graunke	b7f14e03e3	i965: Cache register write capability checks. Our ability to perform register writes depends on the hardware and kernel version. It shouldn't ever change on a per-context basis, so we only need to check once. Checking introduces a synchronization point between the CPU and GPU: even though we submit very few GPU commands, the GPU might be busy doing other work, which could cause us to stall for a while. On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine Valley running in the background (to keep the GPU busy), it improves performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2014-12-24 00:15:40 -08:00
Rob Clark	f332cf92b6	freedreno/ir3: split out legalize pass Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-23 19:53:01 -05:00
Rob Clark	4097ef6ee8	freedreno/ir3: ra debug Some compile time RA debug Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-23 19:53:01 -05:00
Alexander von Gluck IV	402c808372	egl/haiku: Clean up SConscript whitespace	2014-12-23 09:07:58 -05:00
Alexander von Gluck IV	49ce07878d	egl/dri2: Fix build of dri2 egl driver with SCons * egl/dri2 was missing a SConscript * Problem caught by Adrián Arroyo Calle	2014-12-23 09:07:58 -05:00
Alexander von Gluck IV	e7ac21202d	egl: Clean up Haiku visual creation * Only create one struct * 'final' also is a language conflict * Some style cleanup	2014-12-23 09:07:58 -05:00
Alexander von Gluck IV	400b833592	egl: Add Haiku code and support * This is the cleaned up work of the Haiku GCI student Adrián Arroyo Calle adrian.arroyocalle@gmail.com * Several patches were consolidated to prevent unnecessary touching of non-related code	2014-12-23 09:07:57 -05:00
Timothy Arceri	da4fb3e7a1	glsl: check if implicitly sized arrays match explicitly sized arrays across the same stage V2: Improve error message. Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-12-23 19:32:56 +11:00
Chad Versace	414be86c96	i965: Use safer pointer arithmetic in gather_oa_results() This patch reduces the likelihood of pointer arithmetic overflow bugs in gather_oa_results(), like the one fixed by `b69c7c5dac`. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I get nervous when I see code patterns like this: (void *) + (int) * (int) I smell 32-bit overflow all over this code. This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any potential overflow. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>	2014-12-22 15:47:14 -06:00
Chad Versace	225a09790d	i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy() This patch reduces the likelihood of pointer arithmetic overflow bugs in intel_texsubimage_tiled_memcpy(), like the one fixed by `b69c7c5dac`. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I recently solved, in commit `b69c7c5dac`, an overflow bug in a line of code that looks very similar to pointer arithmetic in this function. This patch conceptually applies the same fix as in `b69c7c5dac`. Instead of retyping the variables, though, this patch adds some casts. (I tried to retype the variables as ptrdiff_t, but it quickly got very messy. The casts are cleaner.) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>	2014-12-22 15:47:11 -06:00
Chad Versace	aebcf26d82	i965: Fix intel_miptree_map() signature to be more 64-bit safe This patch should diminish the likelihood of pointer arithmetic overflow bugs, like the one fixed by `b69c7c5dac`. Change the type of parameter 'out_stride' from int to ptrdiff_t. The logic is that if you call intel_miptree_map() and use the value of 'out_stride', then you must be doing pointer arithmetic on 'out_ptr'. Using ptrdiff_t instead of int should make it a little bit harder to hit overflow bugs. As a side-effect, some function-scope variables needed to be retyped to avoid compilation errors. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>	2014-12-22 15:47:07 -06:00
Chad Versace	d11bc9fe8d	i965: Remove spurious casts in copy_image_with_memcpy() If a pointer points to raw, untyped memory and is never dereferenced, then declare it as 'void *' instead of casting it to 'void *'. Signed-off-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-22 15:46:54 -06:00
Marek Olšák	2150db4d5d	radeonsi: force NaNs to 0 This fixes incorrect rendering in Unreal Engine demos. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-12-21 20:34:38 +01:00
David Heidelberg	4fb1d00f4e	st/nine: fix DBG typo (trivial) Signed-off-by: David Heidelberg <david@ixit.cz> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2014-12-21 20:34:19 +01:00
David Heidelberg	fbfe2918f4	r300g: implement ARR opcode Same as ARL, just has extra rounding. Useful for st/nine. Tested-by: Pavel Ondračka <pavel.ondracka@email.cz> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: David Heidelberg <david@ixit.cz> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2014-12-21 20:34:19 +01:00
Rob Clark	aa6415b485	freedreno/a4xx: blend-color Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-20 12:08:37 -05:00
Rob Clark	10d81a03b3	freedreno/a4xx: alpha-test Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-20 12:08:37 -05:00
Rob Clark	097d760aac	freedreno: update generated headers	2014-12-20 12:08:37 -05:00
Rob Clark	f20a0acd43	freedreno/ir3: trans_kill cleanup trans_kill() only handles the single opcode. Drop the remnant of a time when both KILL and KILL_IF were handled by the same fxn. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-20 12:08:37 -05:00
Rob Clark	4ee545646d	freedreno/ir3: hack for standalone compiler Standalone compiler doesn't have screen or context. We need to come up with a better way to control the target arch (ie. something that we can control from cmdline w/ standalone compiler) but for now this hack keeps it from segfault'ing. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-20 12:08:37 -05:00
Matt Turner	a5481d6fbb	i965/fs: Add missing const qualifier.	2014-12-19 12:55:13 -08:00
Eric Anholt	e06b0778f5	vc4: Coalesce MOVs into VPM with the instructions generating the values. total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)	2014-12-18 15:00:56 -08:00
Eric Anholt	a871eff16c	vc4: Redefine VPM writes as a (destination) QIR register file. This will let me coalesce the VPM writes into the instructions generating the values.	2014-12-17 22:35:08 -08:00
Timothy Arceri	a9e77896a7	docs: note change in minimum GCC version to 4.2.0 Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2014-12-18 16:08:27 +11:00
Timothy Arceri	743a684512	gallium: remove support for GCC older than 4.2.0 Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-12-18 16:08:19 +11:00
Timothy Arceri	6852dce591	mesa: bump required GCC version to 4.2.0 It turns out Mesa hasn't compiled on less than 4.2 for a while so update conf to reflect this. Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-12-18 16:08:11 +11:00
Eric Anholt	e473fbe469	vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.	2014-12-17 19:35:13 -08:00
Eric Anholt	ff266483fb	vc4: Move follow_movs() to common QIR code. I want this from other passes.	2014-12-17 19:05:52 -08:00
Eric Anholt	8d22e8907f	vc4: Fix missing newline for load immediate instruction disasm.	2014-12-17 19:05:52 -08:00
Matt Turner	18ebf9e251	mesa: Remove unnecessary -f from $(RM). $(RM) includes -f.	2014-12-17 17:54:33 -08:00
Matt Turner	b2b6cf2437	mesa: Remove tarballs/checksum rules.	2014-12-17 17:54:33 -08:00
Matt Turner	4cc8d66f74	gallium: Add egl and gbm to distribution.	2014-12-17 17:54:33 -08:00
Matt Turner	baedd68ca9	mesa: Set DISTCHECK_CONFIGURE_FLAGS. Enable some non-default options that distros are likely to use.	2014-12-17 17:54:33 -08:00
Matt Turner	ce48ce425a	targets/xvmc: Add uninstall hooks to handle megadriver hardlinks.	2014-12-17 17:54:33 -08:00
Matt Turner	ed1ac1d574	targets/vdpau: Add uninstall hooks to handle megadriver hardlinks.	2014-12-17 17:54:33 -08:00
Matt Turner	adc2922f9c	targets/vdpau: Add clean-local rule to remove .lib links.	2014-12-17 17:54:33 -08:00
Eric Anholt	06890c444a	vc4: Add a userspace BO cache. Since our kernel BOs require CMA allocation, and the use of them requires new mmaps, it's pretty expensive and we should avoid it if possible. Copying my original design for Intel, make a userspace cache that reuses BOs that haven't been shared to other processes but frees BOs that have sat in the cache for over a second. Improves glxgears framerate on RPi by around 30%.	2014-12-17 16:07:01 -08:00
Eric Anholt	39bc936011	vc4: Add dmabuf support. This gets DRI3 working on modesetting with glamor. It's not enabled under simulation, because it looks like handing our dumb-allocated buffers off to the server doesn't actually work for the server's rendering.	2014-12-17 16:07:01 -08:00
Eric Anholt	113044e1b9	vc4: Drop a weird argument in the BOs-from-handles API.	2014-12-17 16:06:17 -08:00
Roland Scheidegger	f97b731c82	draw: revert using correct order for prim decomposition. This reverts `db3dfcfe90`. The commit was correct but we've got some precision problems later in llvmpipe (or possibly in draw clip) due to the vertices coming in in different order, causing some internal test failures. So revert for now. (Will only affect drivers which actually support constant-interpolated attributes and not just flatshading.)	2014-12-17 20:17:42 +01:00
Jan Vesely	bc18b48924	util: Silence signed-unsigned comparison warnings Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-12-17 17:15:36 +00:00
Cody Northrop	83e8bb5b1a	i965: Require pixel alignment for GPU copy blit The blitter will start at a pixel's natural alignment. For PBOs, if the provided offset is not aligned, bits will get dropped. This change adds offset alignment check for src and dst, kicking back if the requirements are not met. The change is based on following verbiage from BSPEC: Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp). All pixels are naturally aligned. Found in the following locations: page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf page 29 of ivb_ihd_os_vol1_part4.pdf page 29 of snb_ihd_os_vol1_part5.pdf This behavior was observed with Steam Big Picture rendering incorrect icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell. Signed-off-by: Cody Northrop <cody@lunarg.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908 Reviewed-by: Neil Roberts <neil@linux.intel.com>	2014-12-16 16:04:14 -08:00
Mark Janes	fc016bc0f3	i965: remove includes of sampler.h from extern "C" blocks C linkage was removed from functions in program/sampler.cpp. However, some cpp files include program/sampler.h within extern "C" blocks, causing link errors for test_vec4_copy_propagation. Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-16 15:39:55 -08:00
Kenneth Graunke	3eb6258db7	i965/query: Cache whether the batch references the query BO. Chris Wilson noted that repeated calls to CheckQuery() would call drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation, which is expensive. Once we've flushed, we know that future batches won't reference query->bo, so there's no point in asking more than once. This patch adds a brw_query_object::flushed flag, which is a conservative estimate of whether the batch has been flushed. On the first call to CheckQuery() or WaitQuery(), we check if the batch references query->bo. If not, it must have been flushed for some reason (such as being full). We record that it was flushed. If it does reference query->bo, we explicitly flush, and record that we did so. Any subsequent checks will simply see that query->flushed is set, and skip the drm_intel_bo_references() call. Inspired by a patch from Chris Wilson. According to Eero, this does not affect the performance of Witcher 2 on Haswell, but approximately halves the userspace CPU usage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-16 15:39:54 -08:00
Kenneth Graunke	cb5cfb8361	i965/query: Use brw_bo_map to handle stall warnings. This is less code and also measures the duration of the stall for us. Our old code predates the existence of brw_bo_map(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-16 15:39:54 -08:00
Kenneth Graunke	9c47653d32	i965/query: Remove redundant drm_intel_bo_references call in CheckQuery. CheckQuery calls drm_intel_bo_references to see if the batch references the query BO, and if so, flushes. It then checks if the query BO is busy, and if not, calls gen6_queryobj_get_results(). Stupidly, gen6_queryobj_get_results() immediately did a second redundant drm_intel_bo_references check, even though we know the buffer is not referenced and in fact idle. This patch moves the batch-flush check out of gen6_queryobj_get_results and into WaitQuery() (the other caller). That way, both callers do a single batch-flush check. This should only be a minor improvement, since it would only affect the first CheckQuery call where the result is actually available. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-16 15:39:53 -08:00
Kenneth Graunke	12c16f4f27	i965/query: Add query->bo == NULL early return in CheckQuery hook. If query->bo == NULL, this is a redundant CheckQuery call, and we should simply return. We didn't do anything anyway - we skipped the batch flushing block, and although we called get_results(), it has an early return and does nothing. Why bother? Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-16 15:39:53 -08:00


67598 commits
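Commit 229bf4475f describes moving vc4's command-list size checks out of the individual emit calls, since the compiler cannot merge the per-call checks on its own. A minimal sketch of that pattern in C, assuming a hypothetical growable buffer (`struct cl`, `cl_ensure_space()`, `cl_u8()`, `cl_u32()` and `emit_packet()` are illustrative names, not vc4's actual code): reserve room for a whole packet once, then write its fields with unchecked emitters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command-list buffer; not vc4's actual struct. */
struct cl {
    uint8_t *base;
    size_t size;      /* bytes used */
    size_t capacity;  /* bytes allocated */
};

/* The single up-front check: grow once so `space` more bytes fit. */
static void cl_ensure_space(struct cl *cl, size_t space)
{
    if (cl->size + space <= cl->capacity)
        return;
    size_t cap = cl->capacity ? cl->capacity : 64;
    while (cap < cl->size + space)
        cap *= 2;
    uint8_t *p = realloc(cl->base, cap);
    assert(p);
    cl->base = p;
    cl->capacity = cap;
}

/* Unchecked emitters: the caller must already have reserved space.
 * The asserts are debug-only and compile out under NDEBUG. */
static void cl_u8(struct cl *cl, uint8_t v)
{
    assert(cl->size + 1 <= cl->capacity);
    cl->base[cl->size++] = v;
}

static void cl_u32(struct cl *cl, uint32_t v)
{
    assert(cl->size + 4 <= cl->capacity);
    memcpy(cl->base + cl->size, &v, 4); /* host byte order */
    cl->size += 4;
}

/* One size check covers every field of the packet. */
static void emit_packet(struct cl *cl, uint8_t opcode, uint32_t arg)
{
    cl_ensure_space(cl, 1 + 4);
    cl_u8(cl, opcode);
    cl_u32(cl, arg);
}
```

In a release build this leaves one capacity comparison per packet instead of one per emitted field, which is the kind of saving the commit message measures.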
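The Chad Versace commits 414be86c96, 225a09790d and aebcf26d82 all target expressions of the form `(void *) + (int) * (int)`, where the multiply is evaluated in 32-bit `int` before the result is added to the pointer. The sketch below uses a hypothetical `row_offset()` helper, not any Mesa function, to show why typing the stride as `ptrdiff_t` helps: the row index is widened before the multiply, so on a 64-bit target the byte offset is computed exactly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not Mesa's code): byte offset of row `y` in a mapped
 * image whose rows are `stride` bytes apart. Because `stride` is ptrdiff_t,
 * `y` is widened to ptrdiff_t before the multiply, so the product is
 * computed at pointer width instead of in 32-bit int. */
static ptrdiff_t row_offset(ptrdiff_t stride, int y)
{
    assert(stride >= 0 && y >= 0);
    return stride * (ptrdiff_t)y;
}

/* Nonzero if the same product does not fit in an int, i.e. computing it
 * as int * int would overflow (undefined behaviour in C). */
static int int_product_overflows(ptrdiff_t stride, int y)
{
    return row_offset(stride, y) > INT_MAX;
}
```

A caller would write `char *row = base + row_offset(stride, y);`. For a 65536-byte stride and row 40000 the offset is 2621440000, larger than `INT_MAX` (2147483647). On a 32-bit target `ptrdiff_t` is itself 32 bits wide, so the retyping only removes the hazard where pointers are 64 bits.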
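Commit 06890c444a describes vc4's userspace BO cache: freed BOs that were never shared with another process are kept for reuse, and any that sit in the cache for over a second are released. The following is a simplified sketch of that policy under stated assumptions, not vc4's implementation. A BO is a plain heap record rather than a CMA allocation plus mmap, lookup is a linear exact-size match, time is passed in explicitly as seconds, and stale entries are evicted only when a BO is freed or `bo_cache_evict()` is called. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical BO record; a real one would wrap a kernel handle and mmap. */
struct bo {
    size_t size;
    bool shared;        /* exported to another process: never recycled */
    double free_time;   /* when it entered the cache, in seconds */
    struct bo *next;
};

struct bo_cache {
    struct bo *head;    /* most recently freed first */
    int allocations;    /* counts "expensive" fresh allocations */
};

static struct bo *bo_alloc(struct bo_cache *c, size_t size)
{
    /* Reuse a cached BO of exactly this size if one exists. */
    for (struct bo **p = &c->head; *p; p = &(*p)->next) {
        if ((*p)->size == size) {
            struct bo *bo = *p;
            *p = bo->next;
            bo->next = NULL;
            return bo;
        }
    }
    struct bo *bo = calloc(1, sizeof(*bo));
    assert(bo);
    bo->size = size;
    c->allocations++;
    return bo;
}

/* Release cached BOs that have sat unused for more than one second. */
static void bo_cache_evict(struct bo_cache *c, double now)
{
    struct bo **p = &c->head;
    while (*p) {
        if (now - (*p)->free_time > 1.0) {
            struct bo *old = *p;
            *p = old->next;
            free(old);
        } else {
            p = &(*p)->next;
        }
    }
}

static void bo_free(struct bo_cache *c, struct bo *bo, double now)
{
    if (bo->shared) {
        free(bo);       /* another process may still use it */
    } else {
        bo->free_time = now;
        bo->next = c->head;
        c->head = bo;
    }
    bo_cache_evict(c, now);
}
```

Passing `now` as a parameter keeps the sketch deterministic; a real driver would read a monotonic clock instead.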
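Commit 83e8bb5b1a makes the i965 blit path reject source and destination offsets that are not naturally aligned to the pixel size, because the blitter supports 8, 16 and 32 bpp and assumes every pixel is naturally aligned. A minimal sketch of such a check, with hypothetical function names and `cpp` meaning bytes per pixel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check (names are illustrative, not Mesa's): with 8, 16 or
 * 32 bpp and naturally aligned pixels, a byte offset must be a multiple
 * of the pixel size in bytes. */
static bool offset_is_pixel_aligned(uint32_t offset, unsigned cpp)
{
    assert(cpp == 1 || cpp == 2 || cpp == 4);
    return offset % cpp == 0;
}

/* True if a blit may be attempted; false means "kick back" to another
 * copy path instead of letting the hardware drop the low offset bits. */
static bool can_blit(uint32_t src_offset, uint32_t dst_offset, unsigned cpp)
{
    return offset_is_pixel_aligned(src_offset, cpp) &&
           offset_is_pixel_aligned(dst_offset, cpp);
}
```

Both offsets are checked because, per the commit message, a misaligned PBO offset on either side leads to dropped bits.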
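Commit 3eb6258db7 caches whether the batch still references a query's BO, so that repeated `CheckQuery()` calls ask `drm_intel_bo_references()` at most once. The sketch below models that logic with hypothetical stand-in structs and functions (`struct batch`, `struct query`, `batch_references()`, `batch_flush()` and `flush_batch_if_needed()` are illustrative, not i965's definitions); the counter fields exist only to make the saved work visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver objects; not Mesa's definitions. */
struct batch {
    const void *referenced_bo; /* BO the unflushed batch points at, or NULL */
    int references_calls;      /* counts the "expensive" lookups */
    int flushes;
};

struct query {
    const void *bo;
    bool flushed; /* conservative: true once the batch is known flushed */
};

/* Stand-in for an expensive drm_intel_bo_references()-style lookup. */
static bool batch_references(struct batch *b, const void *bo)
{
    b->references_calls++;
    return b->referenced_bo != NULL && b->referenced_bo == bo;
}

static void batch_flush(struct batch *b)
{
    b->flushes++;
    b->referenced_bo = NULL; /* later batches won't reference the old BO */
}

/* Ensure the query's BO has been submitted, asking the batch at most once. */
static void flush_batch_if_needed(struct batch *b, struct query *q)
{
    if (q->flushed)
        return;                  /* already known flushed: skip the lookup */
    if (batch_references(b, q->bo))
        batch_flush(b);          /* still queued: flush explicitly */
    q->flushed = true;           /* either way, it has been flushed now */
}
```

As in the commit message, `flushed` is conservative: if the batch no longer references the BO it must already have been flushed for some other reason, and if it does, the function flushes explicitly, so in both cases later calls can skip the lookup.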