fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-01 18:20:10 +01:00

Author	SHA1	Message	Date
Bas Nieuwenhuizen	a794f09017	radv: Don't generate radv_timestamp.h Not needed anymore. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-24 19:25:03 +01:00
Dave Airlie	bb8ac18340	radv: fix texel fetch offset with 2d arrays. The code didn't limit the offsets to the number supplied, so if we expected 3 but only got 2 we were accessing undefined memory. This fixes random failures in: dEQP-VK.glsl.texture_functions.texelfetchoffset.sampler2darray_* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-24 18:06:05 +10:00
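The fix comes down to never reading more offset components than the caller supplied. Below is a minimal C sketch of that clamping. `copy_texel_offsets` and its parameters are hypothetical, and zero-filling the missing components is an assumption. radv's real code operates on shader IR values, not C arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy at most `num_supplied` offset components into
 * `out`, which holds `num_expected` components. Components the caller did
 * not supply are zero-filled (an assumption) instead of being read from
 * undefined memory past the end of `supplied`. */
static void copy_texel_offsets(int *out, size_t num_expected,
                               const int *supplied, size_t num_supplied)
{
    size_t n = num_supplied < num_expected ? num_supplied : num_expected;
    for (size_t i = 0; i < n; i++)
        out[i] = supplied[i];
    for (size_t i = n; i < num_expected; i++)
        out[i] = 0;
}
```

For a 2D array fetch, three components may be expected while only the x/y offsets are supplied. The sketch copies those two and never touches `supplied[2]`.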
Eduardo Lima Mitev	116fed80ff	mesa/getteximage: Add validation of target to glGetTextureImage There is a specific list of texture targets that can be used with glGetTextureImage. From OpenGL 4.5 spec, section '8.11 Texture Queries', page 234 of the PDF: "An INVALID_ENUM error is generated if the effective target is not one of TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_1D_ARRAY, TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, TEXTURE_RECTANGLE, one of the targets from table 8.19 (for GetTexImage and GetnTexImage only), or TEXTURE_CUBE_MAP (for GetTextureImage only)." We are currently not validating the target for glGetTextureImage. For example, calling this function on a texture with target GL_TEXTURE_2D_MULTISAMPLE should generate INVALID_ENUM, but instead it hits an assertion further down in the i965 driver. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-24 08:24:07 +01:00
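The quoted rule maps naturally onto a switch over the texture target. Below is a hedged sketch. The `enum tex_target` values are hypothetical local stand-ins for the GL enums, and `get_texture_image_target_ok` is not Mesa's function name. Cube-map face targets (table 8.19) are deliberately rejected, because the spec allows them only for GetTexImage and GetnTexImage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GL texture target enums. */
enum tex_target {
    TARGET_1D, TARGET_2D, TARGET_3D,
    TARGET_1D_ARRAY, TARGET_2D_ARRAY, TARGET_CUBE_MAP_ARRAY,
    TARGET_RECTANGLE, TARGET_CUBE_MAP, TARGET_CUBE_MAP_POSITIVE_X,
    TARGET_2D_MULTISAMPLE, TARGET_2D_MULTISAMPLE_ARRAY, TARGET_BUFFER,
};

/* Returns true if `target` is legal for glGetTextureImage per the quoted
 * OpenGL 4.5 rule; false means the caller should raise GL_INVALID_ENUM. */
static bool get_texture_image_target_ok(enum tex_target target)
{
    switch (target) {
    case TARGET_1D:
    case TARGET_2D:
    case TARGET_3D:
    case TARGET_1D_ARRAY:
    case TARGET_2D_ARRAY:
    case TARGET_CUBE_MAP_ARRAY:
    case TARGET_RECTANGLE:
    case TARGET_CUBE_MAP:          /* allowed for GetTextureImage only */
        return true;
    default:                       /* multisample, buffer, cube faces, ... */
        return false;
    }
}
```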
Eduardo Lima Mitev	89cbe0d21f	main/texobj: Check that texture id > 0 before looking it up in hash-table _mesa_lookup_texture_err() does not currently check whether the texture-id is zero, but _mesa_HashLookup() doesn't expect the key to be zero and will fail an assertion. Since _mesa_lookup_texture_err() is called from _mesa_GetTextureImage and _mesa_GetTextureSubImage with user-provided arguments, we must validate the texture-id before looking it up in the hash-table. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-24 08:23:45 +01:00
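The ordering of the check is what matters: reject id 0 before the hash table ever sees it. Below is a minimal sketch under stated assumptions. The tiny `hash_lookup` table is a hypothetical stand-in that, like `_mesa_HashLookup()`, asserts on a zero key. `lookup_texture_checked` is an illustrative name, and the specific GL error the caller raises is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16

struct texture { unsigned id; };

/* Hypothetical single-slot-per-bucket table standing in for Mesa's hash. */
static struct texture *table[TABLE_SIZE];

static struct texture *hash_lookup(unsigned key)
{
    assert(key != 0);              /* zero is reserved; never a valid key */
    struct texture *t = table[key % TABLE_SIZE];
    return (t && t->id == key) ? t : NULL;
}

/* Validate the user-supplied id before touching the hash table. */
static struct texture *lookup_texture_checked(unsigned id)
{
    if (id == 0)
        return NULL;               /* caller reports a GL error instead */
    return hash_lookup(id);
}
```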
Charmaine Lee	3233a9fe0b	util: fix memory leak from the fragment shaders for SINT<->UINT blits This patch deletes those fragment shaders in util_blitter_destroy(). Reviewed-by: Brian Paul <brianp@vmware.com>	2016-11-23 22:53:08 -08:00
Kenneth Graunke	ec1f159ac8	i965: Always reserve clip distance VUE slots in SSO mode. This fixes rendering in Dolphin on Vulkan since we enabled clip distances. (Dolphin on GL has a similar bug because the linker fails to eliminate unused clip distance built-in arrays, but it isn't using SSO...so that needs more fixing.) Also fixes a Piglit test: spec/glsl-1.50/execution/geometry.clip-distance-vs-gs-out-sso Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 21:23:38 -08:00
Ilia Mirkin	8cdf73c324	anv/gen7: only enable dual-source blending when there are dual-source factors Apparently the hw wedges otherwise, as mentioned in i965 comments. Reported-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 19:40:00 -08:00
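"Only when there are dual-source factors" means scanning the blend factors for any that reference the second color source. Below is a hedged sketch. The enum is a hypothetical local subset whose SRC1 names mirror Vulkan's `VK_BLEND_FACTOR_*SRC1*` factors, and `struct rt_blend` is illustrative rather than anv's state layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of blend factors, defined locally. */
enum blend_factor {
    FACTOR_ZERO, FACTOR_ONE, FACTOR_SRC_ALPHA, FACTOR_ONE_MINUS_SRC_ALPHA,
    FACTOR_SRC1_COLOR, FACTOR_ONE_MINUS_SRC1_COLOR,
    FACTOR_SRC1_ALPHA, FACTOR_ONE_MINUS_SRC1_ALPHA,
};

static bool is_dual_src_factor(enum blend_factor f)
{
    return f == FACTOR_SRC1_COLOR || f == FACTOR_ONE_MINUS_SRC1_COLOR ||
           f == FACTOR_SRC1_ALPHA || f == FACTOR_ONE_MINUS_SRC1_ALPHA;
}

struct rt_blend {
    bool enable;
    enum blend_factor src_rgb, dst_rgb, src_a, dst_a;
};

/* Enable dual-source blending only if blending is on and at least one
 * factor actually reads the second source color. */
static bool needs_dual_src_blend(const struct rt_blend *rt)
{
    return rt->enable &&
           (is_dual_src_factor(rt->src_rgb) || is_dual_src_factor(rt->dst_rgb) ||
            is_dual_src_factor(rt->src_a) || is_dual_src_factor(rt->dst_a));
}
```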
Ilia Mirkin	a783b67e17	swr: clear every layer of the attached surfaces Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-23 20:34:02 -05:00
Ilia Mirkin	1a80ec0cd1	swr: [rasterizer core] pipe renderTargetArrayIndex through to clears Currently clears only operate on the 0th array index (ignoring surface layout parameters). Instead normalize to take a RTAI like all the load/store tile logic does, and use ComputeSurfaceAddress to properly take the surface state's lod/array index into account. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-23 20:33:50 -05:00
Ilia Mirkin	cec515999c	swr: [rasterizer core] clear data now comes in as float The non-fast-clear path was never updated after clear colors were passed in as floats. Remove the now-harmful conversion from unorm8. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-23 20:33:36 -05:00
Ilia Mirkin	74943db82c	swr: [rasterizer core] actually perform clear before store in GetHotTile When switching render target array indexes (as might happen in a GS, or in a future change, with layered clears), if the previous state is HOTTILE_CLEAR, we should actually clear the tile before saving it off. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-23 20:33:32 -05:00
Kenneth Graunke	5da84a7e12	i965: Fix a mistake from porting the URB allocation code to arrays. Commit `6d416bcd84` (i965: Use arrays in Gen7+ URB code.) introduced a regression which caused us to fail to allocate all of our URB space. - total_wants -= ds_wants; + total_wants -= additional; The new line should have been total_wants -= wants[i]. Fixes a large performance regression in TessMark. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98815 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-11-23 16:57:29 -08:00
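A simplified sketch shows why the subtrahend matters. In a proportional split, each stage receives `remaining * wants[i] / total_wants`. Subtracting that stage's own `wants[i]` from `total_wants` keeps the ratio consistent, so by the last stage `total_wants == wants[last]` and that stage absorbs whatever is left. The `distribute` function is hypothetical and omits the minimums, rounding, and chunk granularity that the real Gen7+ URB code handles.

```c
#include <assert.h>

#define NUM_STAGES 4

/* Hypothetical proportional split of `remaining` URB space among stages. */
static void distribute(unsigned remaining, const unsigned wants[NUM_STAGES],
                       unsigned granted[NUM_STAGES])
{
    unsigned total_wants = 0;
    for (int i = 0; i < NUM_STAGES; i++)
        total_wants += wants[i];

    for (int i = 0; i < NUM_STAGES; i++) {
        unsigned additional = 0;
        if (total_wants > 0)
            additional = (unsigned)((unsigned long long)remaining * wants[i] /
                                    total_wants);
        granted[i] = additional;
        remaining -= additional;
        total_wants -= wants[i];   /* the fix: not "-= additional" */
    }
}
```

With wants of {1, 2, 3, 4} and 100 units, the grants are 10, 20, 30 and 40, so all 100 units are handed out.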
Kenneth Graunke	903056e016	i965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+. Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP. Gen8+ normally uses the new internal signalling mechanism to select the one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS). This is a pretty good fit for Vulkan, or even newer GL, where the bitmask comes entirely from the shader. But with glClipPlane(), this is dynamic state, and we have to listen to _NEW_TRANSFORM. Clip plane enables are the only reason the VS/DS/GS atoms need to listen to _NEW_TRANSFORM. 3DSTATE_CLIP already has to listen to it in order to support ARB_clip_control settings. Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can re-emit them less often. Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257% (n = 38) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-11-23 16:57:29 -08:00
Dave Airlie	3b6893b678	radv: fix flipped blits This fixes: dEQP-VK.api.copy_and_blit.blit_image.simple_tests.mirror* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-23 23:49:32 +00:00
Dave Airlie	b06568873d	radv/meta: just local vars for src/dst subresources. This is just a cleanup before I rework this code to fix mirrored blits. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-23 23:49:23 +00:00
Fredrik Höglund	28c781b574	radv: add support for VK_AMD_draw_indirect_count Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-24 08:19:27 +10:00
Fredrik Höglund	eff7bbc47e	radv: add support for VK_AMD_negative_viewport_height The driver already supports this extension in practice. Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-24 08:19:24 +10:00
Fredrik Höglund	2c748c5c8a	radv: add support for VK_KHR_sampler_mirror_clamp_to_edge radv_tex_wrap() already supports VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE, so all that's needed is to advertise support for the extension. Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-24 08:19:20 +10:00
Fredrik Höglund	5cbcbc75f4	radv: add support for anisotropic filtering on SI-CI Ported from radeonsi. Note that si_make_texture_descriptor() already sets img7 to the mask value referred to in the comment. Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-24 08:19:06 +10:00
Jordan Justen	72c00e7c47	i965/gen7: Only advertise 4 samples for RGBA32F on GLES We can't render to 8x MSAA if the width is greater than 64 bits. (see brw_render_target_supported) Fixes ES31-CTS.sample_variables.mask.rgba32f.samples_8.mask_* Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-11-23 11:15:31 -08:00
Marek Olšák	76e953788a	radeonsi: print new opt flags in si_dump_shader_key Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-23 18:49:10 +01:00
Marek Olšák	e5302ad936	radeonsi: add a debug flag that disables optimized shader variants Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-23 18:49:10 +01:00
Aaron Watry	ac458d2ae8	compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host Avoids two warnings. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-11-23 10:15:00 -06:00
Aaron Watry	60c3a0a67c	compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host Avoids three warnings. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-11-23 10:15:00 -06:00
Emil Velikov	5cc07d854c	anv: fix enumeration of properties Driver should enumerate only up-to min2(num_available, num_requested) properties and return VK_INCOMPLETE if the # of requested props is smaller than the ones available. Presently we assert out in such cases. Inspired by a similar fix for RADV. v2: Use MIN2 + typed_memcpy (Jason). Should fix: dEQP-VK.api.info.device.extensions Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 14:13:47 +00:00
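This is the standard Vulkan two-call enumeration pattern. Below is a hedged sketch. `enum result` is a local stand-in for `VkResult` (`VK_SUCCESS`, `VK_INCOMPLETE`), `struct ext_props` is a simplified stand-in for `VkExtensionProperties`, and plain `memcpy` replaces Mesa's `typed_memcpy`. `MIN2` is written out to match Mesa's macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for VkResult values. */
enum result { RESULT_SUCCESS, RESULT_INCOMPLETE };

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

struct ext_props { char name[32]; uint32_t spec_version; };

/* With props == NULL, report how many entries exist. Otherwise copy at most
 * *count entries, write back how many were copied, and return INCOMPLETE if
 * the caller's array was too small (instead of asserting). */
static enum result enumerate(const struct ext_props *avail, uint32_t num_avail,
                             uint32_t *count, struct ext_props *props)
{
    if (props == NULL) {
        *count = num_avail;
        return RESULT_SUCCESS;
    }
    uint32_t n = MIN2(*count, num_avail);
    memcpy(props, avail, n * sizeof(*props));
    *count = n;
    return n < num_avail ? RESULT_INCOMPLETE : RESULT_SUCCESS;
}
```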
Ben Widawsky	0a0ce884ea	i965: Restructure fast clear eligibility decision v2 (Jason): - Use PRM citation for SKL now that it is available - Also return false for gen < 8 mipmapped/arrayed Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	f4c7989408	i965: Set initial msaa fast clear status explicitly instead of in intel_miptree_init_mcs(). For lossless compression the status is immediately overwritten in intel_miptree_alloc_non_msrt_mcs() while the status for non-compressed non-msaa miptrees is explicitly set in do_blorp_clear(). Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	dfd6088b3a	i965: Declare read-only input to level/layer check const Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	07d070f324	i965/fbo: Prepare layer multiplier for render buffer compression This path is not yet taken for fast cleared or compressed buffers but later patches will enable it. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	a2d029dc5f	i965: Add multi-slice getter for resolve maps This is useful when checking if any slice is in unresolved state. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	7c75fd9a59	i965/meta: Split conversion of color and setting it And fix a mangled comment while at it. v2 (Ben): Return the converted color. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:53 +02:00
Topi Pohjolainen	f19e0967c9	intel/blorp: Fix rectangle size for level-not-zero resolves Needed to prevent gpu hangs when mip-mapped compression gets enabled. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:52 +02:00
Topi Pohjolainen	ca84e190a4	i965/miptree: Don't shrink textures when augmenting for more levels This was detected when examining CCS_E failures with the piglit test "fbo-generatemipmap-formats". The test creates a 2D texture with dimensions 293x277. It manually loops over all levels and calls glTexImage2D(). Level one triggers creation of the full miptree: intel_alloc_texture_image_buffer() realizes that there is only one level in the miptree and calls intel_miptree_create_for_teximage() to re-allocate the miptree with all 9 levels. However, the end result is a miptree with level zero dimensions of 292x276. Related, and possibly calling for treatment of its own, is mip-map generation: after calling glTexImage2D() for every level, the test continues by replacing the content of levels one to eight with data derived from level zero by calling glGenerateMipmapEXT(). This results in the miptree being allocated anew for every level: mip-map generation goes through meta, which ends up validating the texture (brw_validate_textures()->intel_finalize_mipmap_tree()->intel_miptree_match_image()), where it finds a texture with base level size 292x276. This results in a new miptree being created for the NPOT size 293x277. However, intel_finalize_mipmap_tree() is asked for only one level here, so only one level is created. Generation for level one in turn finds the right base level size but only one level when two are needed. The same repeats for all eight levels. This patch prevents the shrink, maintaining the NPOT size of 293x277. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 11:06:52 +02:00
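The numbers in this message follow from GL mip-level minification. Since 293 >> 1 = 146 and 277 >> 1 = 138, doubling the level-one size gives back 292x276 rather than the original 293x277. A full chain for a 293-wide base has floor(log2(293)) + 1 = 9 levels. The worked sketch below assumes the shrink comes from deriving the level-zero size from level one by doubling. The message does not spell out that code path, so `guess_base_from_level1` is hypothetical.

```c
#include <assert.h>

/* Size of mip level `level` for a given base size: max(1, base >> level). */
static unsigned minify(unsigned base, unsigned level)
{
    unsigned s = base >> level;
    return s ? s : 1;
}

/* Levels in a full chain: floor(log2(max dimension)) + 1. */
static unsigned num_levels(unsigned w, unsigned h)
{
    unsigned m = w > h ? w : h, n = 1;
    while (m >>= 1)
        n++;
    return n;
}

/* Hypothetical reconstruction of the base size from level one by doubling;
 * for odd (NPOT) sizes this loses the last texel row/column. */
static unsigned guess_base_from_level1(unsigned level1_size)
{
    return level1_size << 1;
}
```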
Eduardo Lima Mitev	6e8f12619f	main/getteximage: Use the height argument to calculate memcpy copy size In get_tex_memcpy, when copying texture data directly from source to destination (when row strides match for both src and dst), the copy size is currently calculated using the full texture height instead of the sub-region height parameter that was passed. This can cause a read past the end of the mapped buffer when y-offset is greater than zero, leading to a segfault. Fixes CTS test (from crash to pass): * GL45-CTS/get_texture_sub_image/functional_test v2: (Jason) Use the passed 'height' instead of copying until the end of the buffer (tex-height - yoffset). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-23 09:22:32 +01:00
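When source and destination strides match, the sub-region is one contiguous run of `height * stride` bytes starting at row `yoffset`. Below is a minimal sketch. `copy_subimage_rows` is a hypothetical name, and the real `get_tex_memcpy` also handles mapping and format checks that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy rows [yoffset, yoffset + height) of an image whose source and
 * destination row strides match, using the sub-region height (not the full
 * texture height) so the read never runs past the end of the source. */
static void copy_subimage_rows(unsigned char *dst, const unsigned char *src,
                               size_t stride, unsigned yoffset, unsigned height)
{
    memcpy(dst, src + (size_t)yoffset * stride, (size_t)height * stride);
}
```

Take a 4-row image with 8-byte rows, `yoffset = 2` and `height = 2`. The copy reads bytes 16 through 31. Sizing it by the full texture height would read 32 bytes starting at offset 16, overrunning the 32-byte source.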
Iago Toral Quiroga	e062eb6415	nir/spirv: implement ordered / unordered floating point comparisons properly Besides the logical operation involved, these also require that we test if the operands are ordered / unordered. For ordered operations, both operands must be ordered (and they must pass the conditional test) while for unordered operations it is sufficient if only one of the operands is unordered (or they pass the logical test). Fixes the following Vulkan CTS tests: dEQP-VK.spirv_assembly.instruction.compute.opfunord.equal dEQP-VK.spirv_assembly.instruction.compute.opfunord.greater dEQP-VK.spirv_assembly.instruction.compute.opfunord.greaterequal dEQP-VK.spirv_assembly.instruction.compute.opfunord.less dEQP-VK.spirv_assembly.instruction.compute.opfunord.lessequal v2: Fixed typo: s/nir_eq/nir_feq Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2016-11-23 08:07:44 +01:00
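The rule described here can be written as scalar C. The function names are hypothetical, and the commit itself builds NIR rather than C. An ordered comparison requires both operands to be non-NaN AND the test to hold. An unordered comparison succeeds if either operand is NaN OR the test holds. In C, `<` and `==` are already false for NaN, but `!=` is true, so the explicit NaN tests are not redundant in general.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* FOrdLessThan-style: both operands ordered (non-NaN) and a < b. */
static bool ford_less(float a, float b)
{
    return !isnan(a) && !isnan(b) && a < b;
}

/* FUnordLessThan-style: either operand unordered (NaN) or a < b. */
static bool funord_less(float a, float b)
{
    return isnan(a) || isnan(b) || a < b;
}

/* FOrdNotEqual-style: C's != is true when a side is NaN, so the
 * ordered check is required here. */
static bool ford_not_equal(float a, float b)
{
    return !isnan(a) && !isnan(b) && a != b;
}

/* FUnordEqual-style: NaN on either side makes the result true. */
static bool funord_equal(float a, float b)
{
    return isnan(a) || isnan(b) || a == b;
}
```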
Dave Airlie	9ce5926476	anv: fix segfault in anv_BindImageMemory Since bind image memory started memsetting surfaces, the device node can't be NULL, since we lookup device->info.has_llc. Not sure why it ever was NULL before. Fixes some things on my Ivybridge. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-23 16:11:03 +10:00
Tim Rowley	9c13cc9451	swr: [rasterizer core] fix cast for stencil clear value Bad type cast for stencil clear value was picking up structure padding bytes. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-22 20:06:17 -06:00
Ilia Mirkin	f6f644ea12	swr: color interpolation is also supposed to get perspective division Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	7cbfe59cf3	swr: add sprite coord enable mask to fs key This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	6d6ef3fb55	swr: rework vert <-> frag shader linkage logic Fixes a few things: - sprite coords only apply to generic varyings, and are a bitmask - back color only applies in 2-sided lighting mode - handle some odd situations between only some front/back colors being there. This is only semi-legal in GL, but we shouldn't start crashing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	2595aebd91	swr: flatshading makes color outputs flat, it doesn't affect others We were previously not marking the "regular" flat outputs as flat when flatshading was enabled. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	37be598dda	swr: only broadcast color0 value, not all color values The way that dual-source blending is described for GLES2 is very odd, and we end up with a shader that both has this property set and has a color1 value to be used as the second source. While changing the state tracker is an option, it seems more reliable to verify that the broadcast is only done on color0. Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	2234a4330e	swr: report a reasonable max lod bias This is the same value that llvmpipe uses. Since swr uses the same sampler logic, makes sense for this value to also be the same. Most applications don't care. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	2b7bdff83f	swr: avoid using exceptions for expected condition handling I was getting a weird segfault from GCC 4.9.3: 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 #1 0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272 #2 0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628 #3 0x00007ffff4f135ba in init_object (ob=0xd7bb90) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749 #4 search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961 #5 0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025 #6 _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358) at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450 #7 0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100) at /gcc-4.9.3/libgcc/unwind-dw2.c:1245 #8 0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>) at /gcc-4.9.3/libgcc/unwind-dw2.c:1566 #9 0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250) at /gcc-4.9.3/libgcc/unwind.inc:88 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at ( this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>, __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597 We can just avoid this whole issue by not using exceptions in the first place. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-22 20:27:20 -05:00
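The underlying idea is that an unsupported format is an expected outcome, so it should be a return value rather than a thrown exception. The swr code is C++, where one option is comparing `std::map::find()` against `end()` instead of calling `at()`. The sketch below expresses the same idea in C with a sentinel. All enum names, the table contents, and `to_swr_format` are hypothetical stand-ins for `pipe_format`, `SWR_FORMAT`, and `mesa_to_swr_format`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for pipe_format and SWR_FORMAT values. */
enum pipe_fmt { PFMT_R8G8B8A8_UNORM, PFMT_B8G8R8A8_UNORM, PFMT_RGTC1_UNORM };
enum swr_fmt { SWRFMT_INVALID = -1, SWRFMT_R8G8B8A8_UNORM = 1,
               SWRFMT_B8G8R8A8_UNORM = 2 };

struct fmt_pair { enum pipe_fmt pipe; enum swr_fmt swr; };

static const struct fmt_pair mapping[] = {
    { PFMT_R8G8B8A8_UNORM, SWRFMT_R8G8B8A8_UNORM },
    { PFMT_B8G8R8A8_UNORM, SWRFMT_B8G8R8A8_UNORM },
};

/* Look up a format; report "unsupported" with a sentinel value instead of
 * unwinding the stack. */
static enum swr_fmt to_swr_format(enum pipe_fmt f)
{
    for (size_t i = 0; i < sizeof(mapping) / sizeof(mapping[0]); i++)
        if (mapping[i].pipe == f)
            return mapping[i].swr;
    return SWRFMT_INVALID;
}
```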
Ilia Mirkin	946a7abd1c	swr: remove formats from mapping table that don't have StoreTile impls This table exists for the purpose of determining renderable formats. Without a StoreTile implementation, that can't happen. This basically removes rendering support for all L/LA/I formats. They can be re-added when/if StoreTile implementations are added. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	2e12d2ba72	swr: remove unnecessary -1 entries in format mapping table Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-22 20:27:20 -05:00
Ilia Mirkin	7cfb364b1a	swr: rework resource layout and surface setup This is a bit of a mega-commit, but unfortunately there's no great way to break this up since a lot of different pieces have to match up. Here we do the following: - change surface layout to match swr's Load/StoreTile expectations - fix sampler settings to respect all sampler view parameters - fix stencil sampling to read from secondary resource - respect pipe surface format, level, and layer settings - fix resource map/unmap based on the new layout logic - fix resource map/unmap to copy proper parts of stencil values in and out of the matching depth texture These fix a massive quantity of piglits, including all the tex-miplevel-selection ones. Note that the swr native miptree layout isn't extremely space-efficient, and we end up using it for all textures, not just the renderable ones. A back-of-the-envelope calculation suggests about 10%-25% increased memory usage for miptrees, depending on the number of LODs. Single-LOD textures should be unaffected. There are a handful of regressions as a result of this change: - Some textureGrad tests, these failures match llvmpipe. (There are debug settings allowing improved gallivm sampling accuracy.) - Some layered clearing tests as swr doesn't currently support that. It was getting lucky before because enough other things were broken. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-22 20:27:20 -05:00
Charmaine Lee	5d2b5996e1	util: fix missing swizzle components in the SINT <-> UINT conversion string Fixes tgsi error introduced in commit `3817a7a`. The error complains about a missing swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x". Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-11-23 01:54:57 +01:00
Eric Anholt	414dbb2d5c	vc4: Don't conditionalize the src1 mov of qir_SEL(). My thought in having both arguments conditionally moved was that it should theoretically save some power by not doing work in those channels. However, it ends up costing us instructions because we can't register-coalesce the first of the MOVs, and it also introduces extra scheduling dependencies. The instruction cost would swamp whatever power benefit I was hoping for. shader-db results: total instructions in shared programs: 100548 -> 99741 (-0.80%) instructions in affected programs: 42450 -> 41643 (-1.90%) With obvious outliers removed (I had an X11 emacs running over the network in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps improvement (n=18, 30).	2016-11-22 16:46:03 -08:00
Eric Anholt	1f0ba902f0	vc4: Re-add R4 to the "any" register class. I screwed this up in `fdad4d2402` which was supposed to be making this code more maintainable. What's amazing is multithreaded FS showed the wins it did despite this bug. shader-db results: total instructions in shared programs: 103535 -> 100548 (-2.89%) instructions in affected programs: 83794 -> 80807 (-3.56%)	2016-11-22 16:46:03 -08:00


92185 commits