65851 commits

Author SHA1 Message Date
Roland Scheidegger
dea0fcf4e6 meta: (trivial) remove accidental double semicolon 2014-10-01 23:14:46 +02:00
Anuj Phogat
4330fa970b i965: Enable EXT_framebuffer_multisample_blit_scaled for gen8
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-10-01 12:04:15 -07:00
Anuj Phogat
68ee950c78 meta: Implement ext_framebuffer_multisample_blit_scaled extension
The extension enables doing a multisample buffer resolve and buffer
scaling using a single glBlitFramebuffer() call. Currently, we
have this extension implemented in BLORP which is only used by
SNB and IVB. This patch implements the extension in the meta path,
which makes it available to Broadwell.

Implementation features:
 - Supports scaled resolves of 2X, 4X and 8X multisample buffers.

 - Avoids unnecessary shader compilations by storing the precompiled
   shaders for each supported sample count.

 - Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and
   GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed
   behavior in the extension's spec.

 - I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT
   filter. It made the edges in the image look a little smoother, but
   the image gets blurred, giving no overall quality improvement.
   For now I have dropped the idea of doing different filtering for
   the nicest filter.

V2:
 - Minor changes to simplify the fragment shader.
 - Refactor the code to move i965 specific sample_map computation out
   of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized
   by the driver.
 - Use a simple msaa resolve shader for scaled resolves with scaling
   factor = 1.0.

V3:
 - Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x
   variables and use it in fragment shader.

V4:
 - Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x
   variables.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-10-01 12:04:15 -07:00
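The message above mentions two mechanisms: keeping one precompiled shader per supported sample count, and (in V3) turning the sample-map variables into a string for the fragment shader. A minimal Python sketch of both ideas follows. The function and class names, the GLSL-like array syntax, and the sample-map values are illustrative assumptions, not the code or values Mesa uses.

```python
# Hypothetical sketch: none of these names come from Mesa.

def sample_map_to_string(sample_map):
    """Render a list of uint8 sample indices as a GLSL-style array literal."""
    for value in sample_map:
        if not 0 <= value <= 255:
            raise ValueError("sample map entries must fit in uint8_t")
    body = ", ".join(str(v) for v in sample_map)
    return "int[{}]({})".format(len(sample_map), body)


class ShaderCache:
    """Keep one 'compiled' shader per sample count to avoid recompiling."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._shaders = {}
        self.compile_count = 0

    def get(self, samples, sample_map):
        # Compile only the first time a given sample count is requested.
        if samples not in self._shaders:
            self.compile_count += 1
            source = sample_map_to_string(sample_map)
            self._shaders[samples] = self._compile(source)
        return self._shaders[samples]


# Placeholder compile step and placeholder sample-map values.
cache = ShaderCache(lambda src: "shader using " + src)
first = cache.get(4, [0, 1, 2, 3])
second = cache.get(4, [0, 1, 2, 3])
```

The second lookup for the same sample count returns the cached object without invoking the compile step again.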
Anuj Phogat
7a4790148c i965: Initialize the SampleMap{2,4,8}x variables
with values specific to Intel hardware.

V2: Define and use gen6_get_sample_map() function to initialize
    the variables.

V3: Change the function name to gen6_set_sample_maps() and use
    memcpy() to fill in the data.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-10-01 12:04:15 -07:00
Anuj Phogat
38cd40faab mesa: Add new variables in gl_context to store sample layout
SampleMap{2,4,8}x variables are used in later patches to implement
EXT_framebuffer_multisample_blit_scaled extension.

V2: Use integer array instead of a string.
    Bump up the comment.

V3: Use uint8_t type array.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-10-01 12:04:15 -07:00
Leo Liu
4f7916ab4f st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image
This patch implements the image functions, which copy data
between video surfaces and user buffers. This supports
SW decode and other video output.

v2: fix buffer size for odd-sized image case
    expose I420 format as well
v3: fix YUV 4:2:2 format data buffer size
    clean up I420 format exposure

Signed-off-by: Leo Liu <leo.liu@amd.com>
2014-10-01 13:21:36 -04:00
Christian König
7913c8943a st/va: implement Picture functions for mpeg2 h264 and vc1
This patch implements the codec handling for mpeg2, h264 and vc1,
populates the codec parameters and passes them to the HW driver.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
2014-10-01 13:21:36 -04:00
Christian König
1be5515838 st/va: implement Context Surface and Buffer
This patch implements context management and ties it to the HW driver,
along with functions for video surface management and functions for
application data memory buffer management.

implemented functions:
vlVa(Create|Destroy)Context
vlVa(Create|Destroy|Put)Surfaces
vlVa(Create|Destroy)Buffer

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
2014-10-01 13:21:36 -04:00
Christian König
2825ef3abf st/va: implement vlVa(Create|Destroy|Query|Get)Config
This patch lets applications query configuration,
such as profiles, entrypoints, and attributes.

v2: fix missing profile with query

Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
2014-10-01 13:21:36 -04:00
Christian König
3867933ecb st/va: skeleton VAAPI state tracker
This patch adds a skeleton VA-API state tracker,
which is filled with life in the subsequent patches.

v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
    detect libva version, and correctly set driver entrypoint name.
    rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
    add back targets/va/va.sym

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-10-01 13:21:36 -04:00
Leo Liu
0eb8f89981 st/vdpau: move common functions to util
Break out these functions so that they can be shared with other
state trackers.  They will be used in subsequent patches for the new
VA-API state tracker.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2014-10-01 13:21:36 -04:00
Rob Clark
204dd73c99 freedreno: max-texture-lod-bias should be 15.0f
Fixes piglit lodbias test.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-01 07:28:06 -04:00
Kenneth Graunke
95073a2dca mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.
Cuts the number of i965 color calculator viewport uploads by roughly
100x (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-10-01 01:08:26 -07:00
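The idea behind this change can be sketched as a compare-before-flag check: a viewport update that does not change anything leaves the dirty bit alone, so the driver has nothing to re-upload. The Python model below is only an illustration; the class, the flag value, and the upload counter are hypothetical stand-ins and do not mirror Mesa's data structures.

```python
NEW_VIEWPORT = 1 << 0  # stand-in bit; not Mesa's actual _NEW_VIEWPORT value


class Context:
    """Toy context that tracks a viewport and a dirty-state bitfield."""

    def __init__(self):
        self.viewport = (0.0, 0.0, 0.0, 0.0)
        self.new_state = 0
        self.uploads = 0

    def set_viewport(self, x, y, width, height):
        new = (float(x), float(y), float(width), float(height))
        if new == self.viewport:
            return  # redundant update: do not flag the state as dirty
        self.viewport = new
        self.new_state |= NEW_VIEWPORT

    def flush(self):
        # The "driver" re-uploads viewport state only when it is flagged.
        if self.new_state & NEW_VIEWPORT:
            self.uploads += 1
        self.new_state = 0


ctx = Context()
for _ in range(100):
    ctx.set_viewport(0, 0, 640, 480)
    ctx.flush()
```

In this model, one hundred identical viewport calls cause a single upload instead of one hundred.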
Kenneth Graunke
0a1730200e i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT.  It's not
needed here anyway - only SBE needs it.  Just a copy and paste mistake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-10-01 01:08:07 -07:00
Kenneth Graunke
106e0db769 i965: Drop brwBindProgram driver hook.
This function flagged BRW_NEW_*_PROGRAM

When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.

However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on.  So, this looks
to be entirely redundant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:41 -07:00
Kenneth Graunke
e25a453b7f i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.
I had to dig a bit to figure out why this was necessary.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:39 -07:00
Kenneth Graunke
3d31ed0d93 i965: Use "1ull" instead of "1" in BRW_NEW_* defines.
Now that the bitfield is a uint64_t, we should use 1ull.  Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:38 -07:00
Kenneth Graunke
a114f452ae i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.
~0 is 0xFFFFFFFF, which only covers the first 32 bits.  We need all 64.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:36 -07:00
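The width problem described in these dirty-flag commits can be modelled with explicit masks. Python integers are unbounded, so the sketch below takes the commit message's premise (an all-ones value that covers only the low 32 bits) and represents it as a literal mask; the helper names are hypothetical and the code is not a statement about C integer promotion rules.

```python
MASK32 = 0xFFFFFFFF            # all-ones value covering only the low 32 bits
MASK64 = 0xFFFFFFFFFFFFFFFF    # all-ones value covering all 64 bits


def flag(bit):
    """Return the dirty flag for a bit position in a 64-bit bitfield."""
    if not 0 <= bit < 64:
        raise ValueError("bit position out of range for a 64-bit field")
    return (1 << bit) & MASK64


def is_dirty(dirty, bit):
    """Check whether the given flag is set in the dirty bitfield."""
    return (dirty & flag(bit)) != 0


low_flag_seen_32 = is_dirty(MASK32, 31)    # within the first 32 bits
high_flag_seen_32 = is_dirty(MASK32, 32)   # first bit the 32-bit mask misses
high_flag_seen_64 = is_dirty(MASK64, 32)
```

Under that premise, a flag at bit 32 or above is never marked dirty by the narrower mask, which is the case the wider constant is meant to cover.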
Kenneth Graunke
5105f9a7ae i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31.  We missed doing this when widening the driver flags
from uint32_t to uint64_t.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:35 -07:00
Kenneth Graunke
fbebd5e4a5 i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.
Unused since krh rewrote fast clears to use meta.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 01:05:24 -07:00
Chris Forbes
e4e3b0fc0d i965: Fix typo in comment
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 18:37:06 +13:00
Chris Forbes
d8c5c4f3e4 i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHM
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-10-01 18:37:06 +13:00
Vinson Lee
6a238ac0b7 llvmpipe: Add missing LLVMGetGlobalContext() arg in lp_test_format.c.
Fix build error introduced with commit
eedbce9c63.

lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
    gallivm = gallivm_create("test_module_unorm8");
    ^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
                 from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
 gallivm_create(const char *name, LLVMContextRef context);
 ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2014-09-30 21:52:13 -07:00
Keith Packard
3202926746 glx/dri3: Provide error diagnostics when DRI3 allocation fails
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.

As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:

$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.

This made it pretty easy to diagnose the problem in the referenced bug report.

Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 21:23:04 -07:00
Keith Packard
f7a355556e glx/dri3: Use four buffers until X driver supports async flips
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.

When vblank_mode=0, if there are only three buffers we do:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting down in the kernel waiting for vblank to
                become the next scanout buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                This cannot be displayed at MSC 1 because the
                kernel doesn't have any way to replace buffer 1 as the pending
                scanout buffer. So, best case this will get displayed at MSC 2.

Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.

With four buffers, we get:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting down in the kernel waiting for vblank to
                become the next scanout buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                This cannot be displayed at MSC 1 because the
                kernel doesn't have any way to replace buffer 1 as the pending
                scanout buffer. So, best case this will get displayed at MSC
                2. The X server will queue this swap until buffer 1 becomes
                the scanout buffer.

        Render frame 3 to buffer 3
        PresentPixmap for buffer 3 at MSC 1

                As soon as the X server sees this, it will replace the pending
                buffer 2 swap with this swap and release buffer 2 back to the
                application

        Render frame 4 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                Now we're in a steady state, flipping between buffer 2 and 3
                waiting for one of them to be queued to the kernel.

        ...

        current scanout buffer = 1 at MSC 1

                Now buffer 0 is free and (e.g.) buffer 2 is queued in
                the kernel to be the scanout buffer at MSC 2

        Render frames, flipping between buffer 0 and 3

When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting waiting for vblank to become the next scanout
                buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                Queue this for display at MSC 1. There are three possible
                results:

                  1) We're still before MSC 1. Buffer 1 is released,
                     buffer 2 is queued waiting for MSC 1.

                  2) We're now after MSC 1. Buffer 0 was released at MSC 1.
                     Buffer 1 is the current scanout buffer.

                     a) If the user asked for a tearing update, we swap
                        scanout from buffer 1 to buffer 2 and release buffer
                        1.

                     b) If the user asked for non-tearing update, we
                        queue buffer 2 for the MSC 2.

                In all three cases, we have a buffer released (call it 'n'),
                ready to receive the next frame.

        Render frame 3 to buffer n
        PresentPixmap for buffer n

                If we're still before MSC 1, then we'll ask to present at MSC
                1. Otherwise, we'll ask to present at MSC 2.

Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.

I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.

That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.

Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
2014-09-30 20:08:28 -07:00
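The buffer accounting in the message above can be sketched as a small state model. It captures the situation right after frames 1 and 2 have been presented and asks which buffers are free for the next frame. The function name and state labels are hypothetical, and the model deliberately covers only the "still before MSC 1" case of the replaceable-flip scenario.

```python
def idle_buffers(num_buffers, can_replace_queued_flip):
    """Return the buffers free for rendering after frames 1 and 2 are presented."""
    if num_buffers < 3:
        raise ValueError("this sketch models three or more buffers")
    state = {i: "idle" for i in range(num_buffers)}
    state[0] = "scanout"           # on screen at MSC 0
    state[1] = "pending flip"      # frame 1, waiting for vblank
    if can_replace_queued_flip:
        # Frame 2 replaces frame 1 as the pending flip; buffer 1 is released.
        state[1] = "idle"
        state[2] = "pending flip"
    else:
        # Frame 1 stays pinned; frame 2 is held as the most recent frame.
        state[2] = "latest frame"
    return [i for i in range(num_buffers) if state[i] == "idle"]


three_no_replace = idle_buffers(3, False)   # nothing free: the client blocks
four_no_replace = idle_buffers(4, False)    # buffer 3 is free
three_replace = idle_buffers(3, True)       # buffer 1 is free
```

With three buffers and no way to replace the queued flip, every buffer is pinned, which is the stall the fourth buffer avoids.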
Jason Ekstrand
eedbce9c63 i965/fs: Fix the build 2014-09-30 17:27:33 -07:00
Jason Ekstrand
83669fac9d i965/fs: Fix an uninitialized value warning
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 17:26:05 -07:00
Roland Scheidegger
9750ae8ca9 galahad: fix indirect draw
Need to unwrap the indirect resource otherwise bad things will happen.

Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-10-01 02:17:24 +02:00
Roland Scheidegger
e3da8c110c galahad: (trivial) handle cubemap arrays
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-10-01 02:16:57 +02:00
Matt Turner
3e7f8005db i965/fs: Emit compressed BFI2 instructions on Gen > 7.
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
2014-09-30 17:09:34 -07:00
Matt Turner
9f5e5bd34d i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.
These checks were intended for Gen 7 only. None of these restrictions
apply to Gen 8.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
Matt Turner
05586f9bc1 i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
Matt Turner
94b68109fb i965/fs: Optimize sqrt+inv into rsq.
Transform

   sqrt a, b
   rcp  c, a

into

   sqrt a, b
   rsq  c, b

The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
Matt Turner
b52126b44f i965/vec4: Optimize sqrt+inv into rsq.
Transform

   sqrt a, b
   rcp  c, a

into

   sqrt a, b
   rsq  c, b

In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.

Occasionally the sqrt's result is no longer used, leading to:

instructions in affected programs:     5005 -> 4949 (-1.12%)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
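The transformation above, together with the case where the sqrt result becomes unused, can be sketched on a toy instruction list. Instructions are modelled as (opcode, destination, source) tuples. The sketch assumes the peephole only looks at the immediately preceding instruction, and the dead-code step is a generic backward liveness sweep added for illustration; neither function is taken from the i965 backend.

```python
def sqrt_rcp_to_rsq(instrs):
    """Rewrite 'sqrt a, b; rcp c, a' as 'sqrt a, b; rsq c, b'."""
    out = []
    for op, dst, src in instrs:
        if op == "rcp" and out:
            prev_op, prev_dst, prev_src = out[-1]
            # Only fire when the rcp reads the sqrt result and the sqrt
            # did not overwrite its own source register.
            if prev_op == "sqrt" and prev_dst == src and prev_dst != prev_src:
                op, src = "rsq", prev_src
        out.append((op, dst, src))
    return out


def remove_dead(instrs, live_out):
    """Drop instructions whose destination is never read afterwards."""
    live = set(live_out)
    kept = []
    for op, dst, src in reversed(instrs):
        if dst in live:
            kept.append((op, dst, src))
            live.discard(dst)
            live.add(src)
    kept.reverse()
    return kept


program = [("sqrt", "a", "b"), ("rcp", "c", "a")]
rewritten = sqrt_rcp_to_rsq(program)
sqrt_still_used = remove_dead(rewritten, live_out={"a", "c"})
sqrt_unused = remove_dead(rewritten, live_out={"c"})
```

When `a` is still read later, both instructions remain but no longer depend on each other; when only `c` is live, the sqrt can be dropped, which corresponds to the instruction-count reduction reported above.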
Matt Turner
189ac07764 i965/vec4: Call opt_algebraic after opt_cse.
The next patch adds an algebraic optimization for the pattern

   sqrt a, b
   rcp  c, a

and turns it into

   sqrt a, b
   rsq  c, b

but many vertex shaders do

   a = sqrt(b);
   var1 /= a;
   var2 /= a;

which generates

   sqrt a, b
   rcp  c, a
   rcp  d, a

If we apply the algebraic optimization before CSE, we'll end up with

   sqrt a, b
   rsq  c, b
   rcp  d, a

Applying CSE combines the RCP instructions, preventing this from
happening.

No shader-db changes.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
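The ordering argument above can be reproduced on a toy instruction list. The sketch below assumes each register is written once, that the algebraic pass only inspects the immediately preceding instruction (consistent with the result shown in the message, but an assumption here), and that CSE replaces a repeated computation with a mov from the first result. All names are hypothetical.

```python
def opt_algebraic(instrs):
    """Rewrite 'sqrt a, b; rcp c, a' as 'sqrt a, b; rsq c, b'."""
    out = []
    for op, dst, src in instrs:
        if op == "rcp" and out:
            prev_op, prev_dst, prev_src = out[-1]
            if prev_op == "sqrt" and prev_dst == src:
                op, src = "rsq", prev_src
        out.append((op, dst, src))
    return out


def opt_cse(instrs):
    """Replace a repeated (opcode, source) computation with a mov."""
    seen = {}   # (opcode, source) -> destination of the first computation
    out = []
    for op, dst, src in instrs:
        key = (op, src)
        if op != "mov" and key in seen:
            out.append(("mov", dst, seen[key]))
        else:
            seen[key] = dst
            out.append((op, dst, src))
    return out


program = [("sqrt", "a", "b"), ("rcp", "c", "a"), ("rcp", "d", "a")]
algebraic_first = opt_cse(opt_algebraic(program))
cse_first = opt_algebraic(opt_cse(program))
```

Running the algebraic pass first leaves a stray rcp that still depends on the sqrt, while running CSE first leaves one rsq plus a copy.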
Matt Turner
d13bcdb3a9 i965/fs: Extend predicated break pass to predicate WHILE.
Helps a handful of programs in Serious Sam 3 that use do-while loops.

instructions in affected programs:     16114 -> 16075 (-0.24%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-09-30 17:09:34 -07:00
Mathias Fröhlich
6e7d36fd2c gallivm: Fix build for LLVM 3.2
Do not rely on LLVMMCJITMemoryManagerRef being available.
The C bindings for the memory manager objects only appeared
in LLVM 3.4.
The change is based on an initial patch by Brian Paul.

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
2014-10-01 00:29:31 +02:00
Rob Clark
cc355f1c06 freedreno: destroy transfer pool after blitter
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy().  So let it clean up before we yank the
transfer_pool from under it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-30 16:56:15 -04:00
Rob Clark
01ff0b28b3 freedreno/lowering: fix token calculation for lowering
Indirect registers consume an additional token.  Try to clean up the
token calculation math a bit, and fix it at the same time.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-30 16:56:15 -04:00
Ian Romanick
408aa46ca8 i965/fs: Don't make a name for a vector splitting temporary
If the name is just going to get dropped, don't bother making it.  If
the name is made, release it sooner (rather than later).

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:43 -07:00
Ian Romanick
0b47252999 glsl: Don't make a name for the function return variable
If the name is just going to get dropped, don't bother making it.  If
the name is made, release it sooner (rather than later).

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:43 -07:00
Ian Romanick
c87d09d7f0 glsl: Don't allocate a name for ir_var_temporary variables
Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 74 40,578,719,715       67,762,208       62,263,404     5,498,804            0
After  (32-bit): 52 40,565,579,466       66,359,800       61,187,818     5,171,982            0

Before (64-bit): 74 37,129,541,061       95,195,160       87,369,671     7,825,489            0
After  (64-bit): 76 37,134,691,404       93,271,352       85,900,223     7,371,129            0

A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:43 -07:00
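The savings quoted above match the difference in the useful-heap(B) column of the massif table, converted to MiB. A short worked check, assuming that column is the one being compared (the message does not say so explicitly):

```python
def savings_mib(before_bytes, after_bytes):
    """Return the heap reduction in MiB (1 MiB = 1024 * 1024 bytes)."""
    return (before_bytes - after_bytes) / (1024 * 1024)


# useful-heap(B) values copied from the table in the commit message
saved_32 = savings_mib(62_263_404, 61_187_818)
saved_64 = savings_mib(87_369_671, 85_900_223)
```

Rounded to one decimal place, these come to 1.0 MiB on 32-bit and 1.4 MiB on 64-bit.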
Ian Romanick
eaa0c74142 glsl: Use ir_var_temporary for compiler generated temporaries
These few places were using ir_var_auto for seemingly no reason.  The
names were not added to the symbol table.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:43 -07:00
Ian Romanick
04e1357d97 glsl: Add context-level controls for whether temporaries have real names
No change in Valgrind massif results for a trimmed apitrace of dota2.

v2: Minor rebase on _mesa_init_constants changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:42 -07:00
Ian Romanick
a99482482d glsl: Never put ir_var_temporary variables in the symbol table
Later patches will give every ir_var_temporary the same name in release
builds.  Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:42 -07:00
Ian Romanick
7625babfae glsl: Add the possibility for ir_variable to have a non-ralloced name
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 13:34:42 -07:00
Ian Romanick
0e654ab1b9 glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each
Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 44 40,577,049,140       68,118,608       62,441,063     5,677,545            0
After  (32-bit): 71 40,583,408,411       67,761,528       62,263,519     5,498,009            0

Before (64-bit): 63 37,122,829,194       95,153,008       87,333,600     7,819,408            0
After  (64-bit): 67 37,123,303,706       95,150,544       87,333,600     7,816,944            0

A real savings of 173KiB on 32-bit and no change on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2014-09-30 13:34:42 -07:00
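The effect of storing two fields in 16 bits each can be illustrated with Python's struct module. The sketch assumes the fields were previously 32 bits wide, which the message does not state, and it ignores the padding and neighbouring members that decide the real size of ir_variable (one plausible reason the 64-bit numbers above did not move).

```python
import struct

# Assumed "before" layout: two 32-bit unsigned fields, no padding.
WIDE = struct.Struct("<II")
# "After" layout from the commit subject: two 16-bit unsigned fields.
NARROW = struct.Struct("<HH")


def pack_narrow(num_state_slots, binding):
    """Pack both values into 16 bits each; struct rejects values above 65535."""
    return NARROW.pack(num_state_slots, binding)


packed = pack_narrow(3, 7)
unpacked = NARROW.unpack(packed)
```

The narrow layout takes 4 bytes for the pair instead of 8, at the cost of limiting each value to the range 0..65535.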
Ian Romanick
a32ac726ee glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields.  Use this information to store
both pointers in the same location.

If anyone can think of a better name for the union than "u", I'm all
ears.

Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 63 40,574,239,515       68,117,280       62,618,607     5,498,673            0
After  (32-bit): 44 40,577,049,140       68,118,608       62,441,063     5,677,545            0

Before (64-bit): 53 37,126,451,468       95,150,256       87,711,304     7,438,952            0
After  (64-bit): 63 37,122,829,194       95,153,008       87,333,600     7,819,408            0

A real savings of 173KiB on 32-bit and 368KiB on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2014-09-30 13:34:42 -07:00
Ian Romanick
5aa8d8194c glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private
Also move num_state_slots inside ir_variable_data for better packing.

The payoff for this will come in a few more patches.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2014-09-30 13:34:42 -07:00
Ian Romanick
21df016902 glsl: Make ir_variable::max_ifc_array_access private
The payoff for this will come in a few more patches.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2014-09-30 13:34:42 -07:00