fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-26 14:38:13 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	587842a0ca	anv/gem: Add a helper for getting bit6 swizzling information	2016-01-18 17:21:05 -08:00
Jason Ekstrand	c2a6f4302e	nir/spirv: Patch through image qualifiers	2016-01-18 17:21:05 -08:00
Jason Ekstrand	56c8a5f2b8	nir/spirv: Implement ImageQuerySize for storage images SPIR-V only has one ImageQuerySize opcode that has to work for both textures and storage images. Therefore, we have to special-case that one a bit and look at the type of the incoming image handle.	2016-01-18 17:21:05 -08:00
Jason Ekstrand	bb8cadd169	nir/spirv: Insert movs around image intrinsics Image intrinsics always take a vec4 coordinate and always return a vec4. This simplifies the intrinsics a bit but also means that they don't actually match the incoming SPIR-V. In order to compensate for this, we add swizzling movs for both source and destination to get the right number of components.	2016-01-18 17:21:05 -08:00
Ilia Mirkin	a31819cff8	nv50/ir: swap the least-ref'd source into src1 when both const/imm The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference. This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined. While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded. All this yields improvements on nvc0: total instructions in shared programs : 6261157 -> 6255985 (-0.08%) total gprs used in shared programs : 945082 -> 943417 (-0.18%) total local used in shared programs : 30372 -> 30288 (-0.28%) total bytes used in shared programs : 50089256 -> 50047880 (-0.08%) local gpr inst bytes helped 21 822 3332 3332 hurt 0 278 565 565 And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length. On nv50 we get comparable results, and even improve packing (bytes went down more than instructions): total instructions in shared programs : 6346564 -> 6341277 (-0.08%) total gprs used in shared programs : 728719 -> 725131 (-0.49%) total local used in shared programs : 3552 -> 3552 (0.00%) total bytes used in shared programs : 43995688 -> 43932928 (-0.14%) local gpr inst bytes helped 0 1380 3252 3774 hurt 0 287 1710 1365 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-18 17:52:07 -05:00
Ilia Mirkin	af686e7de3	st/mesa: restore the stObj's size if it was cleared out An issue could still occur if the base level is set, but fixing that would require a lot more logic. This fixes the recently-failing texelFetch 3D tests because the mipmaps were no longer being generated, which in turn caused the copying logic to be hit, which in turn didn't work because of the broken width/height/depth. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-18 17:52:07 -05:00
Jason Ekstrand	6f956b0b22	anv/meta: Improve meta clear cleanup a bit	2016-01-18 14:07:46 -08:00
Jason Ekstrand	45d17fcf9b	anv: Misc allocation scope fixes	2016-01-18 14:04:13 -08:00
Jason Ekstrand	378af64e30	anv/meta: Add a meta allocator that uses SCOPE_DEVICE The Vulkan spec requires all allocations that happen for device creation to happen with SCOPE_DEVICE. Since meta calls into other things that allocate memory, the easiest way to do this is with an allocator.	2016-01-18 14:03:24 -08:00
Rob Clark	805e080ba0	freedreno/a4xx: use smaller threadsize for more registers Once we go past half of the "GPR" register file, it seems like we need to run frag shader with smaller threadsize. (The vertex shader already runs at TWO_QUADS, which is the minimum.) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-18 16:58:25 -05:00
Rob Clark	6062941e4d	freedreno: per-generation OUT_IB packet Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-18 16:58:25 -05:00
Jason Ekstrand	3dfa6a881c	anv/meta: Initialize a handle to null	2016-01-18 13:05:02 -08:00
Jason Ekstrand	d49298c702	gen8: Fix border color The border color packet is specified as a 64-byte aligned address relative to dynamic state base address. The way the packing functions are currently set up, we need to provide it with (offset >> 6) because it just shoves the bits in where the PRM says they go and isn't really aware that it's an address.	2016-01-18 12:16:31 -08:00
Jason Ekstrand	bfcc744892	genX/pack: Add a __gen_fixed helper and use it for TextureLODBias The __gen_fixed helper properly clamps the value and also handles negative values correctly. Eventually, we need to make the scripts generate this and use it for more things.	2016-01-18 11:35:04 -08:00
Jason Ekstrand	5a67df2546	anv/pack: Make TextureLODBias a proper 4.8 float XXX: We need to update the generators so this doesn't get stomped.	2016-01-18 10:36:53 -08:00
Jason Ekstrand	15e6af0708	nir/spirv: Handle if's where the merge is also a break or continue	2016-01-18 10:10:47 -08:00
Jason Ekstrand	14ebd0fdd7	nir/spirv: Handle continues that use SSA values from the loop body Instead of emitting the continue before the loop body we emit it afterwards. Then, once we've finished with the entire function, we run nir_repair_ssa to add whatever phi nodes are needed.	2016-01-18 09:43:12 -08:00
Jason Ekstrand	61ba97522e	nir/lower_returns: Repair SSA after doing return lowering	2016-01-18 09:43:12 -08:00
Jason Ekstrand	b11825590d	nir: Add a pass to repair SSA form	2016-01-18 09:43:12 -08:00
Jason Ekstrand	a7a5e8a2de	nir/vars_to_ssa: Use the new nir_phi_builder helper The efficiency should be approximately the same. We do a little more work per phi node because we have to sort the predecessors. However, we no longer have to walk the blocks a second time to pop things off the stack. The bigger advantage, however, is that we can now re-use the phi placement and per-block SSA value tracking in other passes.	2016-01-18 09:18:42 -08:00
Jason Ekstrand	8aab4a7bd2	nir: Add a phi node placement helper Right now, we have phi placement code in two places and there are other places where it would be nice to be able to do this analysis. Instead of repeating it all over the place, this commit adds a helper for placing all of the needed phi nodes for a value.	2016-01-18 09:18:42 -08:00
Jason Ekstrand	b1f1200e80	util/bitset: Allow iterating over const bitsets	2016-01-18 09:18:42 -08:00
Emil Velikov	c03f3dd0a5	gallium: bundle the compat header u_pwr8.h in the tarball Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-01-18 13:37:58 +02:00
Emil Velikov	7bc714509b	mapi: include gl.xml in the tarball Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-01-18 13:37:58 +02:00
Emil Velikov	a78e08e88f	i965: adding missing headers to the dist tarball Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-01-18 13:37:58 +02:00
Christian König	eaf7ec9cfc	st/va: add motion adaptive deinterlacing v2 v2: minor cleanup Signed-off-by: Christian König <christian.koenig@amd.com>	2016-01-18 10:59:32 +01:00
Michel Dänzer	ad20be1f30	gallium/radeon: Rename do_invalidate_resource to invalidate_buffer And only call it from r600_invalidate_resource for buffer resources. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	0491dd1deb	st/dri: Don't call invalidate_resource for NULL depth/stencil buffers Fixes crash in 4 EGL piglit tests with radeonsi. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	a9ab7172a6	radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2016-01-18 17:39:37 +09:00
Michel Dänzer	4297259fc8	radeonsi: Print "LLVM emitted unknown config register" warning only once Say "LLVM" instead of "Compiler" for clarity. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-18 17:39:37 +09:00
Oded Gabbay	679a654a77	llvmpipe: use vpkswss when dst is signed This patch fixes a bug when building a pack instruction. For POWER (altivec), in case the destination is signed and the src width is 32, we need to use vpkswss. The original code used vpkuwus, which emits an unsigned result. This fixes the following piglit tests on ppc64le: - spec@arb_color_buffer_float@gl_rgba8-drawpixels - shaders@glsl-fs-fogscale I've also corrected some coding style issues in the function. v2: Returned else statements to vmware style Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-18 09:45:25 +02:00
Dave Airlie	119bef9543	glsl: fix subroutine lowering reusing actual parameters One of the oglconform tests was crashing here, and it was due to not cloning the actual parameters before creating the new call. This makes a call clone function that does the right things to make sure we clone all the needed info, and points the callee at it. (It differs from ->clone due to this). This may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722. I had this patch in my cts fixes tree, but hadn't had time to make sure I liked it. Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-01-18 15:02:34 +10:00
Timothy Arceri	9258d9f23d	glsl: remove special case for detecting stream duplicates Any duplicates in a single declaration will already fail the generic duplicates test due to the explicit_stream flag being set. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-01-18 13:09:28 +11:00
Timothy Arceri	eac2cece31	glsl: add missing explicit_stream flag to has_layout() This will allow the ARB_shading_language_420pack rules in glsl_parser.yy for catching duplicate layout qualifiers to be triggered for the stream identifier rather than relying on the code meant to catch duplicates within a single layout(...) Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-01-18 13:09:16 +11:00
Timothy Arceri	86677f1016	mesa: fix segfault in glUniformSubroutinesuiv() From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL 4.5 Core spec: "The command void UniformSubroutinesuiv(enum shadertype, sizei count, const uint *indices); will load all active subroutine uniforms for shader stage shadertype with subroutine indices from indices, storing indices[i] into the uniform at location i. The indices for any locations between zero and the value of ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not used will be ignored." V2: simplify NULL check suggested by Jason. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dave Airlie <airlied@redhat.com> Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org https://bugs.freedesktop.org/show_bug.cgi?id=93731	2016-01-18 11:53:24 +11:00
Timothy Arceri	50376e0c0e	glsl: fix segfault linking subroutine uniform with explicit location Reviewed-by: Dave Airlie <airlied@redhat.com> Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org	2016-01-18 11:30:45 +11:00
Ilia Mirkin	4ac1274caa	gm107/ir: don't do indirect frag shader inputs on GM107 Apparently the IPA op decided to stop working with offsets. Need to figure out if we need to do an AL2P situation or something similar. For now just turn it back off. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-17 16:37:04 -05:00
Ilia Mirkin	3281ae96c8	tgsi: initialize Atomic field in tgsi_default_declaration Spotted by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-17 16:37:04 -05:00
Ilia Mirkin	5a81b48ad0	nvc0: bsp_bo can't be null We already deref it earlier. And these are all allocated on load. Spotted by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-17 16:37:04 -05:00
Oded Gabbay	529aa8249a	llvmpipe: fix arguments order given to vec_andc This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)" To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-17 21:07:27 +02:00
Rob Clark	02ac91d717	freedreno/ir3: fix mad 3rd src delay calc In `fad158a0` ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-17 12:21:45 -05:00
Rob Clark	2a6ec1e061	freedreno/ir3: better array register allocation Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:23:52 -05:00
Rob Clark	6a33c5c0df	freedreno/ir3: array offset can be negative It at least happens with some piglit tests, like $piglit/bin/vp-address-01 VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], COLOR DCL CONST[0..7] DCL ADDR[0] 0: ARL ADDR[0].x, IN[1].xxxx 1: MOV_SAT OUT[1], CONST[ADDR[0].x-1] 2: DP4 OUT[0].x, CONST[4], IN[0] 3: DP4 OUT[0].y, CONST[5], IN[0] 4: DP4 OUT[0].z, CONST[6], IN[0] 5: DP4 OUT[0].w, CONST[7], IN[0] 6: END Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:23:20 -05:00
Rob Clark	ddede497b8	freedreno/ir3: workaround bug/feature Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:22:43 -05:00
Rob Clark	ebd3a1fc17	ttn: use writemask for store_var Only user is freedreno, and after array-rework it can cope. Avoids generating loads for a store. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:21:52 -05:00
Rob Clark	fad158a0e0	freedreno/ir3: array rework Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:21:08 -05:00
Rob Clark	cc7ed34df9	freedreno/ir3: refactor/simplify cp If we handle separately the special case of eliminating output mov (which includes keeps and various other cases where we don't have a consuming instruction's src register to collapse things into), we can simplify the logic. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:20:46 -05:00
Rob Clark	680664dff9	freedreno/ir3: fix incorrect decoding of mov instructions Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:20:37 -05:00
Rob Clark	2809c87f90	freedreno/ir3: remove unused tgsi tokens ptr Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:59 -05:00
Rob Clark	fc0d2f7e02	freedreno/ir3: bit of ra refactor Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:47 -05:00
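
Note on 529aa8249a (llvmpipe: fix arguments order given to vec_andc): the commit message says _mm_andnot_si128 computes "r = (~a) & b" while vec_andc computes "r = a & (~b)", and that a wrapper named vec_andnot_si128 was added to u_pwr8.h to do the swap. The sketch below only illustrates that difference on plain 32-bit integers; it does not use the real intrinsics and is not the code from u_pwr8.h. All function names in it are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for the SSE-style semantics described in the commit
 * message for _mm_andnot_si128: the FIRST operand is complemented. */
static inline uint32_t sse_style_andnot(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Scalar stand-in for the VMX-style semantics described for vec_andc:
 * the SECOND operand is complemented. */
static inline uint32_t vmx_style_andc(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* Hypothetical wrapper in the spirit of the commit's vec_andnot_si128:
 * it exposes the SSE argument order on top of the VMX-style operation
 * by swapping the operands. */
static inline uint32_t andnot_wrapper(uint32_t a, uint32_t b)
{
    return vmx_style_andc(b, a);
}

/* Small self-check showing that only the swapped call matches. */
static void check_andnot_semantics(void)
{
    const uint32_t a = 0xF0F0F0F0u;
    const uint32_t b = 0xFFFF0000u;

    /* Passing the arguments straight through gives a different result. */
    assert(sse_style_andnot(a, b) != vmx_style_andc(a, b));

    /* The wrapper reproduces the SSE-style result. */
    assert(andnot_wrapper(a, b) == sse_style_andnot(a, b));
}
```

Calling check_andnot_semantics() from a test harness exercises both cases. The point of a wrapper like this is that callers written against the SSE argument order can be ported without each call site having to remember the swap.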

71368 commits