fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-17 13:30:20 +01:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	40abf11708	panfrost/midgard: Dump MIR of RA failure Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	a08e9511e3	pan/midgard; Dump successor graph when printing MIR We just use the pointers of the midgard_block*, which is crude, but it gets the point across and will help debug successor related issues. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	1aa556de2e	pan/midgard: Remove debug statement Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	21510c253c	panfrost/midgard: Implement register spilling Now that we run RA in a loop, before each iteration after a failed allocation we choose a spill node and spill it to Thread Local Storage using st_int4/ld_int4 instructions (for spills and fills respectively). This allows us to compile complex shaders that normally would not fit within the 16 work register limits, although it comes at a fairly steep performance penalty. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	533d65786f	panfrost/midgard: Add mir_has_arg helper Helps scan the MIR for uses of an index. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	076838ef0c	panfrost/midgard: Check write-before-read in liveness analysis If we write to an index before reading it, the old copy we're checking liveness for isn't live in this block, even if it does get read later. Fixes abnormally high register pressure in shaders with loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	997f85c136	panfrost/midgard/disasm: Check for certain tag errors Midgard bundles contain a tag, as well as a copy of the tag of the next bundle to facilitate prefetch. Do some simple static analysis to detect certain tag errors (particularly on shaders without branching). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	d168b08d62	pan/midgard: Add OP_IS_CSEL helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	1f297471a0	pan/midgard: Add mir_rewrite_index_src_single helper Rather than rewriting an index away across the whole block, we expose finer (per-instruction) granularity for rewrites. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	16c8c354d0	pan/midgard: Ignore inline_constant in liveness It doesn't make any sense to look at it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	d155168e6c	panfrost/midgard: Implement load/store scratch opcodes These are used to load/store from Thread Local Storage, which is memory allocated per-thread (corresponding to ctx->scratchpad in the command stream) and used for register spilling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	3bb780ecb9	pan/midg/disasm: Check for int varying ops Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	7e052d9332	pan/midgard: Remove "aliasing" It was a crazy idea that didn't pan out. We're better served by a good copyprop pass. It's also unused now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	3174bc9972	panfrost: Promote uniform registers late Rather than creating either a load or a uniform register read with a fixed beginning offset, we always create a load and then promote to a uniform register later. This will allow us to promote in a register pressure aware manner. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	aa03159120	pan/midgard: Call scheduler/RA in a loop This will allow us to insert instructions as a result of register allocation, permitting spilling to be implemented. As a side effect, with the assert commented out this would fix a bunch of glamor crashes (due to RA failures) so MATE becomes useable. Ideally we'll have scheduling or RA actually sorted out before the branch point but if not this gives us a one-line out to get X working... Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:33 -07:00
Alyssa Rosenzweig	1cabb8a706	pan/midgard: Remove custom register selection callback What we have is equivalent to the default callback; let's use that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:33 -07:00
Samuel Pitoiset	b5116d3cb7	radv: fix crash in vkCmdClearAttachments with unused attachment depth_stencil_attachment and/or ds_resolve attachment can be NULL. This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.* Cc: 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 14:25:54 +02:00
Sergii Romantsov	253be49402	i965: free object labels when deleting Some leaks detected with GL_KHR_debug on i965. CC: Timothy Arceri <t_arceri@yahoo.com.au> Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-22 12:39:32 +03:00
Samuel Pitoiset	915abbe932	radv/gfx10: update descriptors for inline uniform blocks Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:42 +02:00
Samuel Pitoiset	d76746c1ff	radv/gfx10: emit the GS NGG prologue before the nested barrier Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:39 +02:00
Samuel Pitoiset	8c97a07967	radv/gfx10: do not allocate space for the ZPASS_DONE bug GFX10 isn't affected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:35 +02:00
Samuel Pitoiset	1fb7bd046b	radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors This field doesn't exist. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:31 +02:00
Samuel Pitoiset	1878090b68	radv: clean up fill_geom_tess_rings() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:28 +02:00
Samuel Pitoiset	e7c356866e	radv: change a bunch of >= GFX9 to == GFX9 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:26 +02:00
Samuel Pitoiset	6049745b13	ac/nir: do not clamp shadow reference on GFX10 RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10 and RADV doesn't promotes Z16 or Z24. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:22 +02:00
Daniel Schürmann	64b7386ee8	radv: move nir_opt_conditional_discard out of optimization loop This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop. No changes on vkpipeline-db. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 08:12:18 +02:00
Iago Toral Quiroga	dacaf7ec06	v3d: fill logicop_func in the fragment shader key when precompiling shaders Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng of logic ops on precompiled shaders, which we don't want to do. Also, this had the side effect of making shader-db crash, as during this lowering we would try to read the color format swizzle information from the fragment shader key that we don't populate in precompiled shaders because right now we only need it when logic operations are enabled. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 08:05:59 +02:00
Jose Maria Casanova Crespo	9bf0bdf776	v3d: Avoid scheduling an instruction that stalls waiting for SFU retval If we detect that a scheduling candidate will stall because having a register source that is the written by the SFU unit in the previous instruction we reduce its priority so any non stalling operation would be chosen. The latency of SFU operations is defined as 2. So they would be scheduled earlier if other candidates have the same priority. Finally we won't merge instructions that stall to a previously chosen one. As the result of the previous one would be waiting for an extra cycle. Although shader-db result show that instruction are hurt with an increase of 0.35% the sum of instructions + stalls is reduced a 0.52%. And the total of sfu-stalls is reduced a 63.51%. It implies also a small increase in the max-temps metric because of scheduling earlier SFU operations. total instructions in shared programs: 9102719 -> 9117851 (0.17%) instructions in affected programs: 4324628 -> 4339760 (0.35%) helped: 4162 HURT: 12128 helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1 helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51% HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1 HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68% 95% mean confidence interval for instructions value: 0.90 0.96 95% mean confidence interval for instructions %-change: 0.47% 0.50% Instructions are HURT. total max-temps in shared programs: 1327728 -> 1327812 (<.01%) max-temps in affected programs: 4730 -> 4814 (1.78%) helped: 61 HURT: 134 helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1 helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26% 95% mean confidence interval for max-temps value: 0.28 0.58 95% mean confidence interval for max-temps %-change: 1.80% 3.52% Max-temps are HURT. total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%) sfu-stalls in affected programs: 95029 -> 31802 (-66.53%) helped: 25882 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2 helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00% 95% mean confidence interval for sfu-stalls value: -2.47 -2.42 95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54% Sfu-stalls are helped. total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%) inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%) helped: 22728 HURT: 855 helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92% HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86% 95% mean confidence interval for inst-and-stalls value: -2.07 -2.01 95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05% Inst-and-stalls are helped. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 03:00:50 +02:00
Jose Maria Casanova Crespo	c341ab7ffb	v3d: add shader-db stat to count SFU stalls SFU operations have a latency of 2 cicles, so if their results are used in the following cycle to a SFU instruction, the GPU stalls for an extra cycle until the result is available. This adds the number of stalls to the shader-db debug mode and sum of instruction + stalls to evaluate optimizations to schedule instructions that avoid generating sfu-stalls. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 03:00:50 +02:00
Eric Engestrom	f7224014df	radv: replace memset()+strcpy() with snprintf() Just like the next line :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-21 10:38:17 +01:00
Eric Engestrom	29e8f15bdc	radv: drop unnecessary memset() before snprintf() snprintf() always terminates the string. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-21 10:38:17 +01:00
Bas Nieuwenhuizen	451f030c06	radv: Fix uninitialized warning. For es_vgpr_comp_cnt. Fixes: `795adbbadd` "radv/gfx10: Add pipeline state support for tess." Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-21 01:39:08 +02:00
Chia-I Wu	d31d25f634	virgl: fix a sync issue in virgl_buffer_transfer_extend In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend a previously queued transfer instead if it can find one. Comparing to virgl_resource_transfer_prepare, it fails to check if the resource is busy. The existence of a previously queued transfer normally implies that the resource is not busy, maybe except for when the transfer is PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy comment, and potential concerns over breaking it as the transfer code evolves, this commit makes the valid_buffer_range check the only condition to take the fast path. In real world, we hit the fast path almost only because of the valid_buffer_range check. In micro benchmarks, the condition should always be true, otherwise the benchmarks are not very representative of meaningful workloads. I think this fix is justified. The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the fast path. This commit re-enables it as well. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:42 -07:00
Chia-I Wu	324c20304e	virgl: rework virgl_transfer_queue_extend Do not take a transfer and do the memcpy. Add a _buffer suffix to the function name to make it clear that it is only for buffers. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:37 -07:00
Chia-I Wu	2b8ad88078	virgl: fix virgl_buffer_transfer_extend Without setting hw_res, virgl_transfer_queue_extend never finds a match and always returns NULL. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:34 -07:00
Marek Olšák	bcabf75ab7	radeonsi: initialize scissor registers etc. without clear state Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:56 -04:00
Marek Olšák	47f41af06c	radeonsi: return success from vi_dcc_clear_level to simplify callers Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:54 -04:00
Marek Olšák	7a764b963a	radeonsi: fix compute-based culling regression in `1ce52c1e37` Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:50 -04:00
Marek Olšák	c741bed6e8	radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	a0d330bedb	radeonsi/gfx10: enable Wave32 for vertex, geometry, and tessellation shaders Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	1d82240f55	radeonsi/gfx10: add debug options to enable/disable Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	8f72f137ad	radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64 Legacy GS has to use Wave64, so TES before GS has to use Wave64 too. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	88efb63caf	radeonsi/gfx10: implement Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	54e6900ede	radeonsi/gfx10: use 32-bit wavemasks for Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	81091a5183	ac: create the LLVM builder in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	eb54b8c222	ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	921c1d24d5	ac/rtld: add support for Wave32 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	73aa04e40d	ac: add Wave32 LLVM target machine Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	9e467d111b	ac: initial Wave32 support in LLVM build helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	c35e926a81	radeonsi: assume that selector != NULL for compute shaders Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:48 -04:00

... 40 41 42 43 44 ...

115447 commits