fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-02 01:28:07 +02:00

Author	SHA1	Message	Date
Jordan Justen	6cf3034ba7	mesa osmesa/x11: fix build error introduced in `4bea4cb9` Fixes https://bugs.freedesktop.org/show_bug.cgi?id=58380 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2012-12-17 08:55:41 -08:00
Roland Scheidegger	3d14b25030	gallivm: fix texel fetch for array textures (2) `a460aea3f1` wasn't entirely correct, since all coords are already ints hence need to skip the iround. Passes piglit texelFetch with sampler1DArray/sampler2DArray. Reviewed-by: Dave Airlie <airlied@redhat.com>	2012-12-17 11:50:27 +01:00
Jordan Justen	1358f3a905	mesa: assert if driver did not compute the version Make sure drivers initialize the version before: * _mesa_initialize_exec_table is called * _mesa_initialize_exec_table_vbo is called * A context is made current Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:28 -08:00
Jordan Justen	075f8722ab	mesa: don't initialize VBO vtxfmt in _vbo_CreateContext The driver should call _mesa_initialize_vbo_vtxfmt after computing the context version. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:28 -08:00
Jordan Justen	53ee3959f2	mesa: don't initialize exec dispatch tables in _mesa_initialize_context Drivers must compute the context version, and then call _mesa_initialize_exec_table themselves. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:27 -08:00
Jordan Justen	d5d1f10955	mesa dispatch_sanity: call new functions to initialize exec table In a future patch the exec functions will no longer be set up by _mesa_initialize_context and _vbo_CreateContext. Therefore we must call _mesa_initialize_exec_table and _mesa_initialize_exec_table_vbo. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:27 -08:00
Jordan Justen	4bea4cb9fd	drivers: compute version and then initialize exec table This change forces the context version to be computed before initializing the exec dispatch tables. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:27 -08:00
Jordan Justen	0924f4e90c	vbo: add _mesa_initialize_vbo_vtxfmt This function initializes the exec/save dispatch tables for VBO vtxfmt. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:27 -08:00
Jordan Justen	d440149538	mesa: separate exec allocation from initialization In glapi/gl_genexec.py: * Remove _mesa_alloc_dispatch_table call In glapi/gl_genexec.py and api_exec.h: * Rename _mesa_create_exec_table to _mesa_initialize_exec_table In context.c: * Call _mesa_alloc_dispatch_table instead of _mesa_create_exec_table * Call _mesa_initialize_exec_table (this is temporary) Once all drivers have been modified to call _mesa_initialize_exec_table, then the call to _mesa_initialize_context can be removed from context.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-16 15:30:27 -08:00
Dave Airlie	fa5078c255	r600g: fixup offset types for printing This allows the debug code to at least show the sign properly. Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-16 10:36:42 +00:00
Henri Verbeet	cf358a2b42	gallium/u_blitter: Remove the overlapped blit assert from util_blitter_blit_generic(). This is used by st_BlitFramebuffer() / r600_blit(), and ARB_fbo allows overlapped blits, even though the result is undefined. No piglit regressions on r600g / CYPRESS. Signed-off-by: Henri Verbeet <hverbeet@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <maraeo@gmail.com>	2012-12-16 11:13:20 +01:00
Dave Airlie	a9abaaafd8	glsl_parser_extras.cpp: fixup gl vs mem contexts again. This should fix: https://bugs.freedesktop.org/show_bug.cgi?id=58039 Tested-by: Darxus on bug 58039 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-16 17:30:08 +10:00
Kenneth Graunke	4f91f8dd60	i965: Move BRW_MAX_GRF and similar defines to brw_reg.h. These don't really belong in brw_structs.h. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-15 13:40:16 -08:00
Kenneth Graunke	1db1283563	i965: Split struct brw_reg out from brw_eu.h into its own header. struct brw_instruction and the related instruction emitting code won't be useful on Gen8+, as the instruction encoding changed. However, the struct brw_reg code is still extremely valuable. While we're at it, fix up some style points: - s/GLuint/unsigned/g - s/GLint/int/g - s/GLshort/int16_t/g - s/GLushort/uint16_t/g - s/INLINE/inline/g - Replace tabs with spaces - Put return types on a separate line from the function name/parameters - Remove trailing whitespace - Remove extraneous whitespace around function parameters Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-15 13:40:09 -08:00
Dave Airlie	39fa4c0a58	st/mesa: add texture buffer object rgb32 support. This checks if the pipe driver can support RGB32 formats. Reviewed-by: Marek Olšák <maraeo@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-16 06:55:39 +10:00
Dave Airlie	1b62c326ea	mesa: add support for ARB_texture_buffer_object_rgb32 This adds the extensions + the tex buffer support for checking the formats. There is a piglit test enhancement sent to that list. Reviewed-by: Marek Olšák <maraeo@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-16 06:55:33 +10:00
Dave Airlie	7d7a549fa0	glsl: avoid using gl context as a memory context Not sure what was going on here, but running piglit with debug builds might be a good plan :-) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-15 15:29:49 +10:00
Ian Romanick	b23e92dbe7	i965: Add missing autoconf bits so test_vec4_register_coalesce will build Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Eric Anholt <eric@anholt.net>	2012-12-14 18:44:18 -08:00
Eric Anholt	c9e48e5b08	i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too. No statistically significant performance difference on glbenchmark 2.7 (n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8% (n=5), but that's only about .3% of all cycles spent according to the fixed shader_time. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 16:06:35 -08:00
Eric Anholt	471af25fc5	i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling" The way our visitor works, scalar expression/swizzle results that get stored in channels other than .x will have an intermediate MOV from their result in the .x channel to the real .y (or whatever) channel, and similarly for vec2/vec3 results. By knowing how to adjust DP4-type instructions for optimizing out a swizzled MOV, we can reduce instructions in common matrix multiplication cases. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 16:06:30 -08:00
Eric Anholt	a76a03f437	i965/vs: Add a unit test for opt_compute_to_mrf(). The compute-to-mrf code is really twitchy, and it's hard to construct GLSL testcases for it. This unit test is also really hard to work with (for example, if your instruction is removed by dead code elimination, you end up inspecting something irrelevant), but I did use it for debugging some of the commits to follow. I called it test_vec4_register_coalesce because the compute-to-mrf code is about to morph into that. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 16:06:01 -08:00
Eric Anholt	7171c45d3a	i965/fs: Drop an unnecessary _safe on a list walk. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 16:05:57 -08:00
Eric Anholt	78ce522932	i965/fs: Add a note explaining a detail of register_coalesce_2(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 16:05:48 -08:00
Eric Anholt	7baf9198b2	i965: Also consider HALTs a potential block end. The final halt of the fragment shader turns off the remaining channels, then jumps such that everything is turned back on. So, we can have our last ENDIF of the shader point at that directly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:45:26 -08:00
Kenneth Graunke	2702202290	i965: Jump to the end of the next outer conditional block on ENDIFs. From the Ivybridge PRM, Volume 4, Part 3, section 6.24 (page 172): "The endif instruction is also used to hop out of nested conditionals by jumping to the end of the next outer conditional block when all channels are disabled." Also: "Pseudocode: Evaluate(WrEn); if ( WrEn == 0 ) { // all channels false Jump(IP + JIP); }" First, ENDIF re-enables any channels that were disabled because they didn't match the conditional. If any channels are active, it proceeds to the next instruction (IP + 16). However, if they're all disabled, there's no point in walking through all of the instructions that have no effect---it can jump to the next instruction that might re-enable some channels (an ELSE, ENDIF, or WHILE). Previously, we always set JIP on ENDIF instructions to 2 (which is measured in 8-byte units). This made it do Jump(IP + 16), which just meant it would go to the next instruction even if all channels were off. It turns out that walking over instructions while all the channels are disabled like this is worse than just instruction dispatch overhead: if there are texturing messages, it still costs a couple hundred cycles to not-actually-read from the texture results. This patch finds the next instruction that could re-enable channels and sets JIP accordingly. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-14 15:42:34 -08:00
Chris Forbes	2f7f095a80	i965: expose ARB_texture_cube_map_array V3: Put enable in an existing block rather than making a new one for no good reason. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:51 -08:00
Eric Anholt	380fc562b3	i965/fs: Fix setup for textureGrad(samplerCubeArray, coord, dPdx, dPdy) Caught by tex_grad-01.frag. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:48 -08:00
Eric Anholt	3c56063354	i965/fs: Move the failure for gen7 16-wide intdiv to emit_math(). The cube map array code adds another caller of emit_math(), which needs this check. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:43 -08:00
Chris Forbes	d2dbba8755	i965: fs: Add fixup for textureSize on Gen6/7 V2: Moved up into emit(ir_texture *) to avoid duplication and fix ordering for Gen7; Gen6 math quirks moved into previous patches. Tested on Gen6 only; passes all the cube_map_array piglits. V3: Fixed weird whitespace V4: Use sampler->type; otherwise broken on arrays of samplers. v5: Minor style fixes (by anholt) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:39 -08:00
Chris Forbes	6e34723ac9	i965: fs: fix gen6+ math operands in one place V4: Fix various style nits as pointed out by Eric, and expand IMM operands on both Gen6 and Gen7. v5: minor style nits (by anholt) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:35 -08:00
Chris Forbes	f6a3fda25d	i965: vs: Add fixup for textureSize with cube array samplers V3: Fixed weird whitespace V4: Use sampler's type rather than variable's type; otherwise broken with arrays of samplers. (Thanks Eric) v5: Fix a couple more style nits (by anholt) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:31 -08:00
Chris Forbes	1cb57ea493	i965/vs: Fix gen6+ math operand quirks in one place This causes immediate values to get moved to a temp on gen7, which is needed for an upcoming change but hadn't happened in the visitor until then. v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by anholt). Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:28 -08:00
Chris Forbes	0cda3382a6	i965: Add various plumbing for cubemap arrays V4: Fixed style nits Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:12 -08:00
Eric Anholt	2cae9f2d4a	i965/fs: Add empirically-determined instruction latencies for gen7. v2: Actually switch on the other math instructions mentioned in the comment. v3: Add timing data for textureSize(), and clean up some long comment lines. Testing shader_time of fs16 shaders on a few frames of various apps: nexuiz improved by 2.9% +/- 1.5% (n=10) no difference on GLB2.5 (n=36, outliers removed) no difference on GLB2.7 (n=25) etqw improved by 2.6% +/- 2.2% (n=25) no difference on lightsmark (n=25) Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:22 -08:00
Eric Anholt	4df1e18864	i965/fs: Fix the clock increment in scheduling. I've tested this to be true with various ALU ops on gen7 (with the exception of MADs, which go at either 3 or 4 cycles per dispatch). Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:14 -08:00
Eric Anholt	6255fc7426	i965/fs: Move the old gen4 bspec-based scheduling info to a helper func. For gen7 everything changes, and we have actual information on latency. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:10 -08:00
Eric Anholt	461a29783a	i965/fs: Set up gen7 UBO loads as sends from GRFs. This gives the instruction scheduler a chance to schedule between the loads, whereas before it was restricted due to the dependencies between the MRFs for setting them up. For one shader in gles3conform, it goes from getting stuck in register allocation for as long as anybody's bothered to leave it running down to 23 seconds, thanks to the LIFO scheduling. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:05 -08:00
Eric Anholt	456dbcc337	i965/fs: Before reg alloc, schedule instructions to reduce live ranges. This came from an idea by Ben Segovia. 16-wide pixel shaders are very important for latency hiding on i965, so we want to try really hard to get them. If scheduling an instruction makes some set of instructions available, those are probably the ones that make the instruction's result dead. By choosing those first, we'll have a tendency to reduce the amount of live data as opposed to creating more. Previously, we were sometimes getting this behavior out of the scheduler, which was what produced the scheduler's original performance wins on lightsmark. Unfortunately, that was mostly an accident of the lame instruction latency information that I had, which made it impossible to fix the actual scheduling for performance. Now that we've fixed the scheduling for setup for register allocation, we can safely update the latency parameters for the final schedule. In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4 shaders that were spilling change how many registers spill, for a reduction of 70/3899 instructions. v2: Simplify the new loop. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:59 -08:00
Eric Anholt	ba864bfcfa	i965/fs: Add some optional debug printfs to scheduling. Seeing when instructions become available to schedule is really useful. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:55 -08:00
Eric Anholt	7a9f940cab	i965/fs: Schedule instructions both before and after register allocation. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:41 -08:00
Eric Anholt	1315f3b4b3	i965: Make sure that the shader_time report at context destroy happens. Otherwise, you end up with some report from within a second of context destroy, which is not what you really want for testing the impact of changes.	2012-12-14 15:05:10 -08:00
Eric Anholt	81c247404a	i965: Print a total time for the different shader stages. Sometimes I've got a patch for a performance optimization that's not showing a statistically significant performance difference on reported FPS, but still seems like a good idea because it ought to reduce time spent in the shader. If I can see the total number of cycles spent in the shader stage being optimized, it may show that the patch is still worthwhile (or point out that it's actually broken in some way).	2012-12-14 15:05:10 -08:00
Eric Anholt	f74560f3fb	i965: Scale shader_time to compensate for resets. Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen. Note that will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.	2012-12-14 15:05:10 -08:00
Eric Anholt	338b5f887d	i965: Adjust the split between shader_time_end() and shader_time_write(). I'm about to emit other kinds of writes besides time deltas, and it turns out with the frequency of resets, we couldn't really use the old time delta write() function more than once in a shader.	2012-12-14 15:05:10 -08:00
Paul Berry	ca7e891e8a	glsl/linker: Pack between varyings. This patch implements varying packing between varyings. Previously, each varying occupied components 0 through N-1 of its assigned varying slot, so there was no way to pack two varyings into the same slot. For example, if the varyings were a float, a vec2, a vec3, and another vec2, they would be stored as follows: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * flt x x x <vec2-> x x <--vec3---> x <vec2-> x x varyings (Each * represents a varying component, and the "x"s represent wasted space). This change packs the varyings together to eliminate wasted space between varyings, like so: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <vec2-> <vec2-> flt <--vec3---> x x x x x x x x varyings Note that we take advantage of the sort order introduced in previous patches (vec4's first, then vec2's, then scalars, then vec3's) to minimize how often a varying is "double parked" (split across varying slots). Reviewed-by: Eric Anholt <eric@anholt.net> v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.	2012-12-14 10:51:21 -08:00
Paul Berry	df87722bec	glsl/linker: Pack within compound varyings. This patch implements varying packing within varyings that are composed of multiple vectors of size less than 4 (e.g. arrays of vec2's, or matrices with height less than 4). Previously, such varyings used up a full 4-wide varying slot for each constituent vector, meaning that some of the components of each varying slot went unused. For example, a mat4x3 would be stored as follows: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <-column1-> x <-column2-> x <-column3-> x <-column4-> x matrix (Each * represents a varying component, and the "x"s represent wasted space). In addition to wasting precious varying components, this layout complicated transform feedback, since the constituents of the varying are expected to be output to the transform feedback buffer contiguously (e.g. without gaps between the columns, in the case of a matrix). This change packs the constituents of each varying together so that all wasted space is at the end. For the mat4x3 example, this looks like so: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <-column1-> <-column2-> <-column3-> <-column4-> x x x x matrix Note that matrix columns 2 and 3 now cross a boundary between varying slots (a characteristic I call "double parking" of a varying). We don't bother trying to eliminate the wasted space at the end of the varying, since the patch that follows will take care of that. Since compiler back-ends don't (yet) support this packed layout, the lower_packed_varyings function is used to rewrite the shader into a form where each varying occupies a full varying slot. Later, if we add native back-end support for varying packing, we can make this lowering pass optional. Reviewed-by: Eric Anholt <eric@anholt.net> v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.	2012-12-14 10:51:18 -08:00
Paul Berry	4bb8661b1b	gallium: Disable varying packing on hardware with <=8 texture indirections. In practice this will disable varying packing on R300, R400, i915g, and nv30. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2012-12-14 10:51:10 -08:00
Paul Berry	6ee500cfd2	mesa: Add an option so driver can opt out of varying packing. On hardware that supports a limited number of texture indirections, varying packing will consume an extra texture indirection, since ALU operations are needed in the fragment shader to unpack the varyings before any texturing can be done. This patch introduces a new driver option, ctx->Const.DisableVaryingPacking, which can be used by a driver to opt out of varying packing if the extra texture indirection is costly enough to outweigh the advantages of packing varyings. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2012-12-14 10:49:32 -08:00
Paul Berry	1745a4d751	glsl: Add a lowering pass for packing varyings. This lowering pass generates GLSL code that manually packs varyings into vec4 slots, for the benefit of back-ends that don't support packed varyings natively. No functional change--the lowering pass is not yet used. Reviewed-by: Eric Anholt <eric@anholt.net> v2: Don't use ir_hierarchical_visitor--just loop over instructions directly. Also, make the names of the packed varyings include the names of the original varyings that were packed into them.	2012-12-14 10:49:21 -08:00
Paul Berry	f3993107f0	glsl/linker: Sort varyings by packing class, then vector size. This patch paves the way for varying packing by adding a sorting step before varying assignment, which sorts the varyings into an order that increases the likelihood of being able to find an efficient packing. First, varyings are sorted into "packing classes" by considering attributes that can't be mixed during varying packing--at the moment this includes base type (float/int/uint/bool) and interpolation mode (smooth/noperspective/flat/centroid), though later we will hopefully be able to relax some of these restrictions. The number of packing classes places an upper limit on the amount of space that must be wasted by varying packing, since in theory a shader might have 4n+1 components worth of varyings in each of m packing classes, resulting in 3m components worth of wasted space. Then, within each packing class, varyings are sorted by vector size, with vec4's coming first, then vec2's, then scalars, and then finally vec3's. The motivation for this order is that it ensures that the only vectors that might be "double parked" (with part of the vector in one varying slot and the remainder in another) are vec3's. Note that the varyings aren't actually packed yet, merely placed in an order that will facilitate packing. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-14 10:49:12 -08:00
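
The sort order and packing described in the glsl/linker commits above (f3993107f0, ca7e891e8a) can be illustrated with a small sketch. This is not Mesa's code: the function names, the tuple layout, and the choice of Python are all assumptions made for illustration, and it models only a single packing class. It sorts varyings as vec4, vec2, scalar, vec3, places them contiguously in 4-component slots, and flags any varying that ends up "double parked" across two slots.

```python
# Illustrative sketch only; names are hypothetical and not taken from Mesa.
SLOT_WIDTH = 4  # components per varying slot

# Sort rank keyed by component count: vec4 first, then vec2, scalar, vec3.
SIZE_RANK = {4: 0, 2: 1, 1: 2, 3: 3}


def pack_varyings(varyings):
    """Pack (name, component_count) pairs from ONE packing class.

    Returns a list of (name, first_component, last_component, double_parked),
    where components are numbered contiguously across slots.
    """
    # sorted() is stable, so varyings of equal size keep their input order.
    ordered = sorted(varyings, key=lambda v: SIZE_RANK[v[1]])
    layout = []
    offset = 0
    for name, size in ordered:
        first = offset
        last = offset + size - 1
        # A varying is "double parked" if it straddles a slot boundary.
        double_parked = (first // SLOT_WIDTH) != (last // SLOT_WIDTH)
        layout.append((name, first, last, double_parked))
        offset += size
    return layout


def slots_used(layout):
    """Number of 4-wide slots touched by a packed layout."""
    if not layout:
        return 0
    return max(last for _, _, last, _ in layout) // SLOT_WIDTH + 1


# The example from the commit message: a float, a vec2, a vec3, another vec2.
example = [("flt", 1), ("vec2_a", 2), ("vec3", 3), ("vec2_b", 2)]
layout = pack_varyings(example)
```

For the commit message's example, this places the two vec2's in the first slot and the float plus the vec3 in the second, so two slots are used instead of four. Two vec3's in the same class would leave the second one double parked, which matches the commit's remark that only vec3's can end up split across slots under this ordering.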


48513 commits