fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 01:18:15 +02:00

Author	SHA1	Message	Date
Dave Airlie	fd301472bd	r600/eg: dump event type in dumps This just makes it easier to debug some things. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-27 12:53:18 +10:00
Tobias Klausmann	068a72fbcb	nouveau/compiler: Allow to omit line numbers when printing instructions This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) V3: - Use the already populated nv50_ir_prog_info to forward information to the print pass (Pierre Moreau) V4: - get rid of default value in PrintPass constructor Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-11-26 12:51:30 -05:00
Nicolai Hähnle	0fed7f83ba	radeonsi: try flushing unflushed fences in si_fence_finish even when timeout == 0 Under certain conditions, waiting on a GL sync object should act like a flush, regardless of the timeout. Portal 2, CS:GO, and presumably other Source engine games rely on this behavior and hang during loading without this fix. Fixes: `bc65dcab3b` ("radeonsi: avoid syncing the driver thread in si_fence_finish") Signed-off-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904	2017-11-26 16:53:00 +01:00
Ilia Mirkin	0bd83d0461	nv50/ir: move LateAlgebraicOpt to the very end Memory loads can take offsets, but the SHLADD will often attempt to consume the offsets too. As there may be multiple memory loads with the same base but different offsets, those would end up in a SHLADD instead of the offset of the memory operation. This moves the pass after we've had a chance to attempt to propagate immediate adds into the indirect offset. total instructions in shared programs : 6580681 -> 6567716 (-0.20%) total gprs used in shared programs : 944261 -> 943375 (-0.09%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60339896 -> 60221504 (-0.20%) local shared gpr inst bytes helped 0 0 555 2698 2698 hurt 0 0 138 336 336 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-11-26 01:10:19 -05:00
Ilia Mirkin	3072bbef63	nv50/ir: when merging immediates/consts, load directly When a MERGE operation gets its constraint moves added, it substantially extends live ranges to be reusing an immediate from earlier in the program (not to mention the silliness of loading an immediate into a register, and then moving into another register). We detect these scenarios and insert moves that take the immediate or constbuf load directly into the register. If it's the last use, then we can just move that operation to the closer location. With SM35 (255 regs) we get these results: total instructions in shared programs : 6583670 -> 6580681 (-0.05%) total gprs used in shared programs : 950818 -> 944261 (-0.69%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60367456 -> 60339896 (-0.05%) local shared gpr inst bytes helped 0 0 4584 3186 3186 hurt 0 0 55 968 968 I suspect they will be better for SM20 and SM30. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-11-26 01:10:19 -05:00
Ilia Mirkin	50e913b9c5	nv50/ir: add optimization for modulo by a non-power-of-2 value We can still use the optimized division methods which make use of multiplication with overflow. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>	2017-11-26 01:10:03 -05:00
Ilia Mirkin	3079993727	nv50/ir: optimize signed integer modulo by pow-of-2 It's common to use signed int modulo in GLSL. As it happens, the GLSL specs allow the result to be undefined, but that seems fairly surprising. It's not that much more effort to get it right, at least for positive modulo operators. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-11-25 22:48:09 -05:00
Ilia Mirkin	f39a91c152	freedreno/a4xx: add ARB_framebuffer_no_attachments support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2017-11-25 17:20:17 -05:00
Ilia Mirkin	4f748d12e8	freedreno/a4xx: add indirect draw support This is a copy of the a5xx logic. Fails a few tests, but basic functionality is there. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2017-11-25 17:20:17 -05:00
Ilia Mirkin	c3c8d48725	freedreno: regenerate pm4 header, adjust code for new names Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2017-11-25 17:20:17 -05:00
Ilia Mirkin	ffdcd51e66	freedreno/a4xx: add stencil texturing support Copied from a5xx, should be identical. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2017-11-25 17:20:17 -05:00
Ilia Mirkin	86f12e9377	freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx Unfortunately Adreno A4xx hardware returns incorrect results with the GATHER4 opcodes. As a result, we have to lower to 4 individual texture calls (txl since we have to force lod to 0). We achieve this using offsets, including on cube maps which normally never have offsets. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2017-11-25 16:56:59 -05:00
Marek Olšák	2cfa319f9f	radeonsi: expose all CB performance counters on Stoney Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Marek Olšák	797c447f1c	radeonsi: handle imported textures with DCC robustly Now you can hack the driver to enable DCC for displayable textures, and Glamor, which doesn't enable that by default, won't crash anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Marek Olšák	992b6e18d0	radeonsi: fix a typo in creating monolithic ES-GS This has no effect because both occupy the same memory in a union. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Marek Olšák	f783677a82	radeonsi: don't write undefined output channels to LDS in LS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Marek Olšák	b63e7d4c6f	radeonsi: use ac.lds for shared memory Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Marek Olšák	39b098dafb	radeonsi: do 64-bit LDS loads recursively Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-25 17:16:56 +01:00
Wladimir J. van der Laan	35548cae93	etnaviv: Emit vertex buffers consecutively Vertex buffer legacy state is no longer picked up with new drawing commands. Change to use different cases depending on the number of vertex streams in the GPU specs. This results in slightly more compact state emission as well, on all vivantes. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-11-23 22:24:51 +01:00
Roland Scheidegger	71e630753e	r600: set DX10_CLAMP for compute shader too I really intended to set this for all shader stages by `3835009796` but missed it for compute shaders (because it's in a different source file...). Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-23 02:28:38 +01:00
Gert Wollny	799d350870	r600/shader: Fix all warnings issued with "-Wall -Wextra" - fix a number of -Wsign-compare warnings - fix two warnings for -Woverride-init because TGSI_OPCODE_CEIL == 83, and the corresponding field was defined two times. [airlied: don't use -1 with unsigned type, fix whitespace] Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-22 22:50:18 +00:00
Gert Wollny	1d076aafbc	r600: Emit EOP for more CF instruction types So far on pre-cayman chipsets, for the CF instructions CF_OP_LOOP_END, CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction was added to add the EOP flag, even though this is not actually needed, because all these instructions support the EOP flag. This patch removes the fixup code, adds setting the EOP flag for the corresponding instructions as well as others like CF_OP_TEX and CF_OP_VTX, and adds writing out EOP for this type of instruction in the disassembler. This also fixes a bug where shaders were created that didn't actually have the EOP flag set in the last CF instruction, which might have resulted in GPU lockups. [airlied: cleaned up a little] Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-22 22:39:42 +00:00
Eric Anholt	6a78416dab	broadcom/vc5: Fix BASE_LEVEL handling with txl. The HW doesn't add the base level anywhere (the min/max lod clamping is what does base level), so we need to add it manually in this case. Fixes piglit tex-miplevel-selection *Lod 2D.	2017-11-22 10:56:31 -08:00
Eric Anholt	c55813c22e	broadcom/vc5: Fix array texture layer count setup. Fixes piglit array-texture.	2017-11-22 10:56:31 -08:00
Eric Anholt	ad1521d708	broadcom/vc5: Don't increment primitive queries while they're paused. Fixes ext_transform_feedback-generatemipmap prims_generated	2017-11-22 10:56:31 -08:00
Eric Anholt	1214c2ea2a	broadcom/vc5: Fix incorrect padding of TF outputs. After the first output, we were padding by an extra size of the previous output. Fixes piglit ext_transform_feedback-output-type mat4x3[2] and friends.	2017-11-22 10:56:31 -08:00
Eric Anholt	b18840ac6e	broadcom/vc5: Fix UIF surface size setup for ARB_fbo's mismatched sizes. The HW was computing an implicit height for the surface based on the image size, but that may be smaller than the surface with ARB_fbo mismatched sizes. In that case, we need to tell it about the pad, either with the little 4-bit field in the RT config, or the extended field in CLEAR_COLORS_PART3. Fixes piglit arb_framebuffer_object-mixed-buffer-sizes.	2017-11-22 10:56:31 -08:00
Wladimir J. van der Laan	9f162fa107	etnaviv: Put HALTI level in specs The HALTI level is an indication of the gross architecture of the GPU. It determines for significant part what feature level the GPU has, what state (especially frontend state) is there, and where it is located. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2017-11-22 14:42:06 +01:00
Wladimir J. van der Laan	391c958f08	etnaviv: Const-correctness etnaviv_emit.h The relocation structure is never changed by submitting it. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2017-11-22 14:42:00 +01:00
Roland Scheidegger	b5957cee92	llvmpipe: fix snorm blending The blend math gets a bit funky due to inverse blend factors being in range [0,2] rather than [-1,1], our normalized math can't really cover this. src_alpha_saturate blend factor has a similar problem too. (Note that piglit fbo-blending-formats test is mostly useless for anything but unorm formats, since not just all src/dst values are between [0,1], but the tests are crafted in a way that the results are between [0,1] too.) v2: some formatting fixes, and fix a fairly obscure (to debug) issue with alpha-only formats (not related to snorm at all), where blend optimization would think it could simplify the blend equation if the blend factors were complementary, however was using the completely unrelated rgb blend factors instead of the alpha ones... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-11-21 04:06:29 +01:00
Dave Airlie	464c2d8083	r600: add cull distance support This passes all the tests in piglit. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-21 09:00:52 +10:00
Eric Anholt	494effd242	broadcom/vc5: Align 1D texture miplevels to 64b. Fixes tex-miplevel-selection GL2:texture() 1D	2017-11-20 13:54:45 -08:00
Eric Anholt	9d5972da80	broadcom/vc5: Clamp min lod to the last level. Otherwise, the simulator would complain in tex-miplevel-selection that the min/max clamp was out of order. The actual HW seems to have clamped to the max anyway.	2017-11-20 13:52:33 -08:00
Eric Anholt	2c8913e224	broadcom/vc5: Increase simulator memory for tex-miplevel-selection. We were overflowing, because of all the little 4k allocations for CLs that were getting expanded to 128kb in the simulator due to the GMP alignment.	2017-11-20 13:52:33 -08:00
Tim Rowley	34838c2212	swr/rast: Repair simd8 frontend code rot Keep non-default simd8 frontend code running for comparison purposes. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:51:10 -06:00
Tim Rowley	005d937e15	swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader Disabled for now. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:51:06 -06:00
Tim Rowley	2e244c7168	swr/rast: Simplify GATHER* jit builder api General cleanup, and prep work for possibly moving to llvm masked gather intrinsic. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:51:01 -06:00
Tim Rowley	44025def06	swr/rast: Add alignment to transpose targets Needed to ensure alignment for avx512. Fixes address sanitizer crash. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:56 -06:00
Tim Rowley	bc356b0fc0	swr/rast: Cache eventmanager Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:51 -06:00
Tim Rowley	395a298fa5	swr/rast: Enable AVX-512 targets in the jitter Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:45 -06:00
Tim Rowley	37bb69fb88	swr/rast: Points with clipdistance can't go through simplepoints path Fixes piglit glsl-1.20:vs-clip-vertex-primitives and glsl-1.30:vs-clip-distance-primitives. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:38 -06:00
Tim Rowley	d9de8f3122	swr/rast: Code style change (NFC) Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:29 -06:00
Tim Rowley	08512c52de	swr/rast: Widen fetch shader to SIMD16 Widen fetch shader to SIMD16, enable SIMD16 types in the jitter, and provide EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:50:23 -06:00
Tim Rowley	e612231f20	swr/rast: Support flexible vertex layout for DS output Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-11-20 13:49:59 -06:00
Nicolai Hähnle	3f17d3c017	gallium/u_threaded: avoid syncing in threaded_context_flush We could always do the flush asynchronously, but if we're going to wait for a fence anyway and the driver thread is currently idle, the additional communication overhead isn't worth it. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 18:16:15 +01:00
Nicolai Hähnle	bc65dcab3b	radeonsi: avoid syncing the driver thread in si_fence_finish It is really only required when we need to flush for deferred fences. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 18:16:11 +01:00
Nicolai Hähnle	3db1ce01b1	radeonsi: recompute the relative timeout after waiting for ready fence Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 18:16:06 +01:00
Nicolai Hähnle	f5ea8d18ff	ddebug: fix the hang detection timeout calculation Fixes: `c9fefa062b` ("ddebug: rewrite to always use a threaded approach") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 18:16:03 +01:00
Nicolai Hähnle	16f8da2997	ddebug: fix use-after-free of streamout targets Fixes: `b47727a83a` ("ddebug: implement pipelined hang detection mode") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 18:16:00 +01:00
Nicolai Hähnle	df5ebe0c26	radeonsi/gfx9: fix VM fault with fetched instance divisors We need to account for SGPR locations in merged shaders. This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations Fixes: `79c2e7388c` ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 16:26:10 +01:00


20513 commits