fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-06 11:38:05 +02:00

Author	SHA1	Message	Date
Roland Scheidegger	dea0fcf4e6	meta: (trivial) remove accidental double semicolon	2014-10-01 23:14:46 +02:00
Anuj Phogat	4330fa970b	i965: Enable EXT_framebuffer_multisample_blit_scaled for gen8 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-10-01 12:04:15 -07:00
Anuj Phogat	68ee950c78	meta: Implement ext_framebuffer_multisample_blit_scaled extension Extension enables doing a multisample buffer resolve and buffer scaling using a single glBlitFrameBuffer() call. Currently, we have this extension implemented in BLORP which is only used by SNB and IVB. This patch implements the extension in meta path which makes it available to Broadwell. Implementation features: - Supports scaled resolves of 2X, 4X and 8X multisample buffers. - Avoids unnecessary shader compilations by storing the pre compiled shaders for each supported sample count. - Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed behavior in the extension's spec. - I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT filter. It made the edges in the image look little smoother but the image gets blurred causing no overall quality improvement. For now I have dropped the idea of doing different filtering for nicest filter. V2: - Minor changes to simplify the fragment shader. - Refactor the code to move i965 specific sample_map computation out of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized by the driver. - Use a simple msaa resolve shader for scaled resolves with scaling factor = 1.0. V3: - Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x variables and use it in fragment shader. V4: - Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x variables. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-10-01 12:04:15 -07:00
Anuj Phogat	7a4790148c	i965: Initialize the SampleMap{2,4,8}x variables with values specific to Intel hardware. V2: Define and use gen6_get_sample_map() function to initialize the variables. V3: Change the function name to gen6_set_sample_maps() and use memcpy() to fill in the data. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-10-01 12:04:15 -07:00
Anuj Phogat	38cd40faab	mesa: Add new variables in gl_context to store sample layout SampleMap{2,4,8}x variables are used in later patches to implement EXT_framebuffer_multisample_blit_scaled extension. V2: Use integer array instead of a string. Bump up the comment. V3: Use uint8_t type array. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-10-01 12:04:15 -07:00
Leo Liu	4f7916ab4f	st/va: implement vlVa(Query\|Create\|Get\|Put\|Destroy)Image This patch implements functions for images support, which basically supports copy data between video surface and user buffers, in this case supports SW decode, and other video output v2: fix buffer size for odd-sized image case expose I420 format as well v3: fix YUV 4:2:2 format data buffer size cleanup I420 format exposure Signed-off-by: Leo Liu <leo.liu@amd.com>	2014-10-01 13:21:36 -04:00
Christian König	7913c8943a	st/va: implement Picture functions for mpeg2 h264 and vc1 This patch implements codec for mpeg2 h264 and vc1, populates codec parameters and pass them to HW driver. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leo Liu <leo.liu@amd.com>	2014-10-01 13:21:36 -04:00
Christian König	1be5515838	st/va: implement Context Surface and Buffer This patch implements context managements, relate it HW driver, functions for video surface managements, and functions for application data memory buffer managements. implemented functions: vlVa(Create\|Destroy)Context vlVa(Create\|Destroy\|Put)Surfaces vlVa(Create\|Destroy)Buffer Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leo Liu <leo.liu@amd.com>	2014-10-01 13:21:36 -04:00
Christian König	2825ef3abf	st/va: implement vlVa(Create\|Destroy\|Query\|Get)Config This patch is for application to query configuration, such as profiles, entrypoints, and attributes v2: fix missing profile with query Signed-off-by: Michael Varga <michael.varga@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leo Liu <leo.liu@amd.com>	2014-10-01 13:21:36 -04:00
Christian König	3867933ecb	st/va: skeleton VAAPI state tracker This patch adds a skeleton VA-API state tracker, which is filled with live in the subsequent patches. v2: fixes in configure.ac and va state_tracker Makefile.am v3: do not link against libva. detect libva version, and correctly set driver entrypoint name. rebase(cleanup) targets/va/Makefile.am v4: cleanup va version auto detection add back targets/va/va.sym Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2014-10-01 13:21:36 -04:00
Leo Liu	0eb8f89981	st/vdpau: move common functions to util Break out these functions so that they can be shared with a other state trackers. They will be used in subsequent patches for the new VA-API state tracker. Signed-off-by: Leo Liu <leo.liu@amd.com>	2014-10-01 13:21:36 -04:00
Rob Clark	204dd73c99	freedreno: max-texture-lod-bias should be 15.0f Fixes piglit lodbias test. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-10-01 07:28:06 -04:00
Kenneth Graunke	95073a2dca	mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates. Cuts the number of i965 color calculator viewport uploads by 100x (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-10-01 01:08:26 -07:00
Kenneth Graunke	0a1730200e	i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom. I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG, which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not needed here anyway - only SBE needs it. Just a copy and paste mistake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-10-01 01:08:07 -07:00
Kenneth Graunke	106e0db769	i965: Drop brwBindProgram driver hook. This function flagged BRW_NEW__PROGRAM When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa calls the BindProgram driver hook, which flagged BRW_NEW__PROGRAM. However, brw_upload_state also checks for that changing, sets the same flags, and also updates brw->fragment_program and so on. So, this looks to be entirely redundant. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:41 -07:00
Kenneth Graunke	e25a453b7f	i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments. I had to dig a bit to figure out why this was necessary. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:39 -07:00
Kenneth Graunke	3d31ed0d93	i965: Use "1ull" instead of "1" in BRW_NEW_* defines. Now that the bitfield is a uint64_t, we should use 1ull. Currently, we only have 32 entries, so 1 works fine, but it's not future-proof. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:38 -07:00
Kenneth Graunke	a114f452ae	i965: Use ~0ull when flagging all BRW_NEW_* dirty flags. ~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:36 -07:00
Kenneth Graunke	5105f9a7ae	i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits. This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits beyond 1 << 31. We missed doing this when widening the driver flags from uint32_t to uint64_t. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:35 -07:00
Kenneth Graunke	fbebd5e4a5	i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG. Unused since krh rewrote fast clears to use meta. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 01:05:24 -07:00
Chris Forbes	e4e3b0fc0d	i965: Fix typo in comment Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 18:37:06 +13:00
Chris Forbes	d8c5c4f3e4	i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHM Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-01 18:37:06 +13:00
Vinson Lee	6a238ac0b7	llvmpipe: Add missing LLVMGetGlobalContext() arg in lp_test_format.c. Fix build error introduced with commit `eedbce9c63`. lp_test_format.c: In function ‘test_format_unorm8’: lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’ gallivm = gallivm_create("test_module_unorm8"); ^ In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0, from lp_test_format.c:42: ../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here gallivm_create(const char *name, LLVMContextRef context); ^ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538 Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2014-09-30 21:52:13 -07:00
Keith Packard	3202926746	glx/dri3: Provide error diagnostics when DRI3 allocation fails Instead of just segfaulting in the driver when a buffer allocation fails, report error messages indicating what went wrong so that we can debug things. As a simple example, chromium wraps Mesa in a sandbox which doesn't allow access to most syscalls, including the ability to create shared memory segments for fences. Before, you'd get a simple segfault in mesa and your 3D acceleration would fail. Now you get: $ chromium --disable-gpu-blacklist [10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018 libGL: pci id for fd 12: 8086:0a16, driver i965 libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL error: DRI3 Fence object allocation failure Operation not permitted [10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer. This made it pretty easy to diagnose the problem in the referenced bug report. Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681 Signed-off-by: Keith Packard <keithp@keithp.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 21:23:04 -07:00
Keith Packard	f7a355556e	glx/dri3: Use four buffers until X driver supports async flips A driver which doesn't have async flip support will queue up flips without any way to replace them afterwards. This means we've got a scanout buffer pinned as soon as we schedule a flip and so we need another buffer to keep from stalling. When vblank_mode=0, if there are only three buffers we do: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting down in the kernel waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 This cannot be displayed at MSC 1 because the kernel doesn't have any way to replace buffer 1 as the pending scanout buffer. So, best case this will get displayed at MSC 2. Now we block after this, waiting for one of the three buffers to become idle. We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1 because it's sitting in the kernel waiting to become the next scanout buffer and we can't use buffer 2 because that's the most recent frame which will become the next scanout buffer if the application doesn't manage to generate another complete frame by MSC 2. With four buffers, we get: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting down in the kernel waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 This cannot be displayed at MSC 1 because the kernel doesn't have any way to replace buffer 1 as the pending scanout buffer. So, best case this will get displayed at MSC 2. The X server will queue this swap until buffer 1 becomes the scanout buffer. Render frame 3 to buffer 3 PresentPixmap for buffer 3 at MSC 1 As soon as the X server sees this, it will replace the pending buffer 2 swap with this swap and release buffer 2 back to the application Render frame 4 to buffer 2 PresentPixmap for buffer 2 at MSC 1 Now we're in a steady state, flipping between buffer 2 and 3 waiting for one of them to be queued to the kernel. ... current scanout buffer = 1 at MSC 1 Now buffer 0 is free and (e.g.) buffer 2 is queued in the kernel to be the scanout buffer at MSC 2 Render frames, flipping between buffer 0 and 3 When the system can replace a queued buffer, and we update Present to take advantage of that, we can use three buffers and get: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 Queue this for display at MSC 1 1. There are three possible results: 1) We're still before MSC 1. Buffer 1 is released, buffer 2 is queued waiting for MSC 1. 2) We're now after MSC 1. Buffer 0 was released at MSC 1. Buffer 1 is the current scanout buffer. a) If the user asked for a tearing update, we swap scanout from buffer 1 to buffer 2 and release buffer 1. b) If the user asked for non-tearing update, we queue buffer 2 for the MSC 2. In all three cases, we have a buffer released (call it 'n'), ready to receive the next frame. Render frame 3 to buffer n PresentPixmap for buffer n If we're still before MSC 1, then we'll ask to present at MSC 1. Otherwise, we'll ask to present at MSC 2. Present already does this if the driver offers async flips, however it does this by waiting for the right vblank event and sending an async flip right at that point. I've hacked the intel driver to offer this, but I get tearing at the top of the screen. I think this is because flips are always done from within the ring, and so the latency between the vblank event and the async flip happening can cause tearing at the top of the screen. That's why I'm keying the need for the extra buffer on the lack of 2D driver support for async flips. Signed-off-by: Keith Packard <keithp@keithp.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Tested-by: Dylan Baker <baker.dylan.c@gmail.com>	2014-09-30 20:08:28 -07:00
Jason Ekstrand	eedbce9c63	i965/fs: Fix the build	2014-09-30 17:27:33 -07:00
Jason Ekstrand	83669fac9d	i965/fs: Fix an uninitialized value warnings Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 17:26:05 -07:00
Roland Scheidegger	9750ae8ca9	galahad: fix indirect draw Need to unwrap the indirect resource otherwise bad things will happen. Fixes random crashes and timeouts with piglit's arb_indirect_draw tests. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-10-01 02:17:24 +02:00
Roland Scheidegger	e3da8c110c	galahad: (trivial) handle cubemap arrays Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-10-01 02:16:57 +02:00
Matt Turner	3e7f8005db	i965/fs: Emit compressed BFI2 instructions on Gen > 7. IVB had a restriction that prevented us from emitting compressed three-source instructions, and although that was lifted on Haswell, Haswell had a new restriction that said BFI instructions specifically couldn't be compressed.	2014-09-30 17:09:34 -07:00
Matt Turner	9f5e5bd34d	i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7. These checks were intended for Gen 7 only. None of these restrictions apply to Gen 8. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Matt Turner	05586f9bc1	i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Matt Turner	94b68109fb	i965/fs: Optimize sqrt+inv into rsq. Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b The improvement here is that we've broken a dependency between these instructions. Leads to 330 fewer INV instructions and 330 more RSQ. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Matt Turner	b52126b44f	i965/vec4: Optimize sqrt+inv into rsq. Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions. Leads to 80 fewer INV instructions and 80 more RSQ. Occasionally the sqrt's result is no longer used, leading to: instructions in affected programs: 5005 -> 4949 (-1.12%) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Matt Turner	189ac07764	i965/vec4: Call opt_algebraic after opt_cse. The next patch adds an algebraic optimization for the pattern sqrt a, b rcp c, a and turns it into sqrt a, b rsq c, b but many vertex shaders do a = sqrt(b); var1 /= a; var2 /= a; which generates sqrt a, b rcp c, a rcp d, a If we apply the algebraic optimization before CSE, we'll end up with sqrt a, b rsq c, b rcp d, a Applying CSE combines the RCP instructions, preventing this from happening. No shader-db changes. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Matt Turner	d13bcdb3a9	i965/fs: Extend predicated break pass to predicate WHILE. Helps a handful of programs in Serious Sam 3 that use do-while loops. instructions in affected programs: 16114 -> 16075 (-0.24%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-09-30 17:09:34 -07:00
Mathias Fröhlich	6e7d36fd2c	gallivm: Fix build for LLVM 3.2 Do not rely on LLVMMCJITMemoryManagerRef being available. The c binding to the memory manager objects only appeared on llvm-3.4. The change is based on an initial patch of Brian Paul. Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>	2014-10-01 00:29:31 +02:00
Rob Clark	cc355f1c06	freedreno: destroy transfer pool after blitter Blitter can still have transfers hanging around which it frees in util_blitter_destroy(). So let it clean up before we yank the transfer_pool from under it. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-09-30 16:56:15 -04:00
Rob Clark	01ff0b28b3	freedreno/lowering: fix token calculation for lowering Indirect registers consume an additional token. Try to clean up the token calculation math a bit, and fix it at the same time. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-09-30 16:56:15 -04:00
Ian Romanick	408aa46ca8	i965/fs: Don't make a name for a vector splitting temporary If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:43 -07:00
Ian Romanick	0b47252999	glsl: Don't make a name for the function return variable If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:43 -07:00
Ian Romanick	c87d09d7f0	glsl: Don't allocate a name for ir_var_temporary variables Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0 After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0 Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0 After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0 A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:43 -07:00
Ian Romanick	eaa0c74142	glsl: Use ir_var_temporary for compiler generated temporaries These few places were using ir_var_auto for seemingly no reason. The names were not added to the symbol table. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:43 -07:00
Ian Romanick	04e1357d97	glsl: Add context-level controls for whether temporaries have real names No change Valgrind massif results for a trimmed apitrace of dota2. v2: Minor rebase on _mesa_init_constants changes. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	a99482482d	glsl: Never put ir_var_temporary variables in the symbol table Later patches will give every ir_var_temporary the same name in release builds. Adding a bunch of variables named "compiler_temp" to the symbol table can only cause problems. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	7625babfae	glsl: Add the possibility for ir_variable to have a non-ralloced name Specifically, ir_var_temporary variables constructed with a NULL name will all have the name "compiler_temp" in static storage. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	0e654ab1b9	glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0 Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	a32ac726ee	glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location. If anyone can think of a better name for the union than "u", I'm all ears. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0 After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0 After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	5aa8d8194c	glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private Also move num_state_slots inside ir_variable_data for better packing. The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2014-09-30 13:34:42 -07:00
Ian Romanick	21df016902	glsl: Make ir_variable::max_ifc_array_access private The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2014-09-30 13:34:42 -07:00

1 2 3 4 5 ...

65851 commits