fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-03 10:50:26 +01:00

Author	SHA1	Message	Date
Timothy Arceri	47dde2bd45	glsl: don't try adding built-ins to explicit locations bitmask Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 09:06:26 +11:00
Timothy Arceri	ac6e2c2056	glsl: fix overlapping of varying locations for arrays and structs Previously we were only reserving a single location for arrays and structs. We also didn't take into account implicit locations clashing with explicit locations when assigning locations for their arrays or structs. This patch fixes both issues. V5: fix regression for patch inputs/outputs in tessellation shaders V4: just use count_attribute_slots() to get the number of slots, also calculate the correct number of slots to reserve for gs and tess stages by making use of the new get_varying_type() helper. V3: handle arrays of structs V2: also fix for arrays of arrays and structs. Acked-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 09:06:20 +11:00
Timothy Arceri	5907a02ab6	glsl: create helper to remove outer vertex index array used by some stages This will be used in the following patch for calculating array sizes correctly when reserving explicit varying locations. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 09:06:16 +11:00
Timothy Arceri	30991d7389	glsl: remove unused varyings before packing them Previously we would pack varyings before trying to remove them; this relied on the packing pass not packing varyings with a location of -1 to avoid packing varyings that should be removed. However, this meant unused varyings with an explicit location would be packed before they could be removed once we enable packing of them in a later patch. V2: fix regression in V1 removing unused varyings in multi-stage SSO, fix regression with single stage programs. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 09:06:12 +11:00
Krzysztof Sobiecki	0d7477a289	gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP ALIGN_DIVUP is a driver-specific (r600g) macro that duplicates DIV_ROUND_UP functionality. Replacing it with DIV_ROUND_UP eliminates this duplication. Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-06 16:09:12 -05:00
Eric Anholt	bbd29f1375	vc4: Fix driver build from last minute rebase fix. I had the driver all tested for the last series, and in my last build I noticed that get_swizzled_channel was unused now, and removed it... apparently without testing to find that I removed the wrong channel swizzle function.	2016-01-06 12:49:45 -08:00
Eric Anholt	25aa436e86	vc4: Optimize out a comparison for bcsel based on an ALU comparison We routinely have code like: vec1 ssa_220 = fge ssa_104, ssa_61; vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105; and we would compare fge's args and choose between ~0 and 0 to generate ssa_220, then compare ssa_220 to 0 and choose between bcsel's args. Instead, try to notice the pattern and compare between fge's args to select between bcsel's args. total instructions in shared programs: 88019 -> 87574 (-0.51%); instructions in affected programs: 9985 -> 9540 (-4.46%); total estimated cycles in shared programs: 245752 -> 245237 (-0.21%); estimated cycles in affected programs: 17232 -> 16717 (-2.99%)	2016-01-06 12:43:09 -08:00
Eric Anholt	7a9eb76786	vc4: Add missing sRGB decode to texel fetches. We only see txf on MSAA textures, currently, and apparently this didn't impact any of our piglit tests.	2016-01-06 12:43:09 -08:00
Eric Anholt	f01ca9eeda	vc4: Add support for GL_ARB_texture_swizzle. We already had the code supporting it, since it's needed for the depth mode when doing shadow comparisons.	2016-01-06 12:43:09 -08:00
Eric Anholt	12519a972f	vc4: Use NIR texture lowering for texture swizzling. We can't use its other features currently (mostly because we don't want Newton-Raphson on rcps for texture coordinates), but it gets us started. This eliminates some comparisons with constants in GLB2.7 and ETQW traces at the QIR level by moving the comparisons into NIR, where they get constant-folded out. instructions in affected programs: 165 -> 156 (-5.45%); total uniforms in shared programs: 32087 -> 32085 (-0.01%); total estimated cycles in shared programs: 245762 -> 245752 (-0.00%); estimated cycles in affected programs: 461 -> 451 (-2.17%)	2016-01-06 12:43:08 -08:00
Eric Anholt	71db7d3dc5	vc4: Replace the SSA-style SEL operators with conditional MOVs. I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%); instructions in affected programs: 39845 -> 39425 (-1.05%); total estimated cycles in shared programs: 246306 -> 245762 (-0.22%); estimated cycles in affected programs: 162887 -> 162343 (-0.33%)	2016-01-06 12:39:51 -08:00
Eric Anholt	0a89f307f9	vc4: Don't try the SF coalescing unless it's on a def. If you want the SF of the value of a register produced from a series of packing MOVs or conditional MOVs, we can't just SF on the last MOV into the register.	2016-01-06 12:39:27 -08:00
Chad Versace	8284786c5d	anv/gen9: Teach gen9_image_view_init() about 1D surface qpitch QPitch is usually expressed as rows of surface elements (where a surface element is a compression block or a single surface sample). Skylake 1D is an outlier; there QPitch is expressed as individual surface elements.	2016-01-06 09:38:57 -08:00
Chad Versace	e05b307942	isl: Add isl_surf_get_array_pitch_el() Will be needed to program SurfaceQPitch for Skylake 1D arrays.	2016-01-06 09:38:57 -08:00
Chad Versace	c1e890541e	isl/gen9: Support ISL_DIM_LAYOUT_GEN9_1D	2016-01-06 09:38:57 -08:00
Chad Versace	eea2d4d059	isl: Don't align phys_slice0_sa.width twice It's already aligned to the format's block width. Don't align it again in isl_calc_row_pitch().	2016-01-06 09:38:57 -08:00
Chad Versace	39d043f94a	isl: Fix the documented units of isl_surf::row_pitch It's the pitch between surface elements, not between surface samples.	2016-01-06 09:38:57 -08:00
Chad Versace	dcb9c11dc7	anv/gen9: Fix oob lookup of surface halign, valign For 1D surfaces and for surfaces with Yf or Ys tiling, the hardware ignores SurfaceVerticalAlignment and SurfaceHorizontalAlignment. Moreover, the anv_halign[] and anv_valign[] lookup tables may not even contain the surface's actual alignment values. So don't do the lookup for those surfaces.	2016-01-06 09:38:57 -08:00
Chad Versace	94566d9b68	anv/meta: Teach meta how to blit from a 1D image Meta needed a VkShader with a 1D sampler type.	2016-01-06 09:38:57 -08:00
Edward O'Callaghan	1953cee6d7	gallium/drivers/svga: Use unsigned for loop index Fix a 's/unsigned int/unsigned/' consistency case while here. Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	8e2a8ec731	gallium/drivers/r600: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	76a7d6f412	gallium/drivers/ilo: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	5071c192cc	gallium: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	bfabd5e74a	gallium/drivers: Remove unnecessary semicolons Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	67d4b4b28c	gallium: Remove unnecessary semicolons Fix silly issue with MSVC case fall-through support needing an extra 'break;' Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Oded Gabbay	9d59b9d00c	llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8 This patch converts the SSE-optimized lp_rast_triangle_32_3_16() to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score (Before -> After, Delta): openarena 16.35 -> 16.7 (2.14%); xonotic 4.707 -> 4.97 (5.57%). glmark2 didn't show a significant (more than 1%) difference. v2: Make sure code is built only on a POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	925c46cfc4	llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8 This patch converts the SSE-optimized build_mask_32() and build_mask_linear_32() to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score (Before -> After, Delta): glmark2 (score) 139.8 -> 142.7 (2.07%). openarena and xonotic didn't show a significant (more than 1%) difference. v2: Make sure code is built only on a POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	3bbe16ea79	llvmpipe: Optimize do_triangle_ccw for POWER8 This patch converts the SSE optimization done in do_triangle_ccw to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score (Before -> After, Delta): glmark2 (score) 136.6 -> 139.8 (2.34%); openarena 16.14 -> 16.35 (1.30%); xonotic 4.655 -> 4.707 (1.11%). v2: convert loads to use aligned loads; make sure code is built only on a POWER8 LE machine. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	e99555ef0b	llvmpipe: add POWER8 portability file - u_pwr8.h This file provides a portability layer that will make it easier to convert SSE-based functions to VMX/VSX-based functions. All the functions implemented in this file are prefixed using "vec_". Therefore, when converting from an SSE-based function, one simply needs to replace the "_mm_" prefix of the SSE function being called with "vec_". Having said that, not all functions could be converted as such, due to the differences between the architectures. So, when such a conversion hurt performance, I preferred to implement a more ad-hoc solution. For example, converting _mm_shuffle_epi32 needed to be done using ad-hoc masks instead of a generic function. All the functions in this file support both little-endian and big-endian, but currently the file is built only on a POWER8 LE machine. All of the functions are implemented using the Altivec/VMX intrinsics, except one where I needed to use inline assembly (due to a missing intrinsic). v2: use vec_vgbbd instead of __builtin_vec_vgbbd; add an aligned load function; don't use typeof(); make the file build only on a POWER8 LE machine. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	afe88f66a8	configure.ac: Detect if running on POWER8 arch To determine if we could use special POWER8 assembly directives, we first need to detect whether we are running on a POWER8 architecture. This patch adds this detection to configure.ac and adds the necessary compilation flags accordingly. v2: add an option to disable POWER8 instruction generation; detect whether building on a BE or LE machine and build with -mpower8-vector only on an LE machine; make the printed messages more standard. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Kenneth Graunke	7295f4fcc2	nir: Add a lower_fdiv option, turn fdiv into fmul/frcp. The nir_opt_algebraic rule (('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))), can produce new fdiv operations, which need to be lowered on i965, as we don't actually implement fdiv. (Normally, we handle this in GLSL IR's lower_instructions pass, but in the above case we introduce an fdiv after that point. So, make NIR do it for us.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-01-05 19:22:11 -08:00
Kenneth Graunke	bd21b54607	i965: Only turn on ARB_compute_shader if we can write registers. Compute shaders require reconfiguring the L3 for shared local memory support. We have to be able to write the L3 registers to do that. This effectively turns off compute shaders prior to Kernel 4.2. (Previously, the extension enable was in an API_OPENGL_CORE conditional. However, that isn't necessary - core Mesa extension handling already restricts it properly. I've moved it out in this patch.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-01-05 18:07:27 -08:00
Kenneth Graunke	25b7e4a01f	i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x. That's what it's for. Plus, we actually implement rcp. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-01-05 18:07:27 -08:00
Timothy Arceri	3d402d4450	mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders This fixes some piglit subtests for ARB_program_interface_query. V3: remove some of the unnecessary parentheses V2: fix alignment Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-06 12:01:09 +11:00
Jason Ekstrand	7a069bea5d	nir/spirv: Fix switch statements with duplicate cases	2016-01-05 16:18:01 -08:00
Jason Ekstrand	506a467f16	nir/spirv/cfg: Assert that blocks only ever get added once This effectively prevents infinite loops in cfg_walk_blocks.	2016-01-05 15:56:59 -08:00
Timothy Arceri	e1e1b67878	glsl: don't change the varying type in validation code There is a function dedicated to demoting unused varyings; let's trust it to do its job. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-06 10:52:58 +11:00
Timothy Arceri	21590a307c	glsl: move lowering after matching validation After lowering, the matching flag is_unmatched_generic_inout is lost, so we need to move this validation before lowering. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-06 10:52:54 +11:00
Timothy Arceri	0508d9504a	glsl: only add outward facing varyings to resource list for SSO An SSO program can have multiple stages and we only want to add the externally facing varyings. The current code was adding both the packed inputs and outputs for the first and last stage of each program. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-06 10:52:48 +11:00
Jason Ekstrand	71a25a0b07	nir/spirv: Simplify phi node handling Instead of trying to crawl through predecessor chains and build phi nodes, we just do a poor-man's out-of-ssa on the spot. The into-SSA pass will deal with putting the actual phi nodes in for us.	2016-01-05 14:59:40 -08:00
Anuj Phogat	4d2a7f5111	i965/gen9: Modify the conditions to use blitter on skl+ The modified conditions allow skl+ to use the blitter for all tiling formats and to write data to YF/YS tiled surfaces. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-01-05 13:43:32 -08:00
Anuj Phogat	0bf037c0fe	i965/gen9: Return false in place of assert in intelEmitCopyBlit() This allows the fallback paths to handle it correctly. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-01-05 13:43:32 -08:00
Anuj Phogat	5cbe01c83f	i965/gen9: Remove regions overlap check in fast copy blit Overlapping blits are undefined in OpenGL anyway, so there is no need for an overlap check here. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-01-05 13:43:32 -08:00
Anuj Phogat	3c8b97a45b	i965/gen9: Don't use fast copy blit in case of non power of 2 cpp Fast copy blit is currently enabled for use only with Yf/Ys tiling. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-01-05 13:43:32 -08:00
Jason Ekstrand	ec899f6b42	anv/pipeline: Lower indirect temporaries and inputs	2016-01-05 13:42:52 -08:00
Jason Ekstrand	bff45dc44e	nir: Add an indirect deref lowering pass	2016-01-05 13:42:52 -08:00
Ian Romanick	ee4676aa57	i915/i965: Fix typo in perf_debug message Trivial Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2016-01-05 13:18:45 -08:00
Brian Paul	a13e9adbee	st/mesa: minor indentation fixes	2016-01-05 13:04:46 -07:00
Kristian Høgsberg Kristensen	30521fb19e	vk: Implement a basic pipeline cache This is not really a cache yet, but it allows us to share one state stream for all pipelines, which means we can bump the block size without wasting a lot of memory.	2016-01-05 12:03:21 -08:00
Kristian Høgsberg Kristensen	f551047751	vk: Destroy device->mutex when destroying the device	2016-01-05 12:03:21 -08:00
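
Commit 0d7477a289 replaces the r600g-only ALIGN_DIVUP macro with the shared DIV_ROUND_UP, since both compute integer ceiling division. A minimal sketch of that idiom follows, assuming the conventional ((a) + (b) - 1) / (b) form rather than quoting either macro from the Mesa tree; the div_round_up() helper is hypothetical and only adds precondition checks.

```c
#include <assert.h>

/* Ceiling division for unsigned integers: the smallest q with q * b >= a.
 * Sketch of the conventional ((a) + (b) - 1) / (b) idiom; assumes b != 0
 * and that a + b - 1 does not wrap around. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical function form of the same idiom, with preconditions checked. */
static unsigned div_round_up(unsigned a, unsigned b)
{
   assert(b != 0);
   assert(a <= ~0u - (b - 1)); /* a + b - 1 must not overflow */
   return DIV_ROUND_UP(a, b);
}
```

For example, DIV_ROUND_UP(10, 4) evaluates to 3, the number of 4-wide blocks needed to cover 10 elements.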
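
Commit 25aa436e86 describes folding an fge that feeds a bcsel into a single comparison. The C sketch below models only the semantics of that rewrite, assuming, as the commit message states, that fge yields ~0 for true and 0 for false; the function names are hypothetical, and the actual change operates on vc4's QIR, not on C.

```c
#include <stdint.h>

/* Unoptimized shape: materialize the fge result as a mask (~0 or 0),
 * then compare that mask against 0 to pick between bcsel's operands. */
static float bcsel_via_mask(float a, float b, float x, float y)
{
   uint32_t mask = (a >= b) ? ~0u : 0u; /* fge: first compare + select */
   return (mask != 0) ? x : y;          /* bcsel: second compare + select */
}

/* Fused shape: reuse fge's comparison directly to choose between
 * bcsel's operands, so only one comparison is needed. */
static float bcsel_fused(float a, float b, float x, float y)
{
   return (a >= b) ? x : y;
}
```

Both functions agree for every input, including NaN operands, because a >= b is false whenever either side is NaN; the fused form simply skips materializing the intermediate mask.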


85652 commits