fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-25 08:40:11 +01:00

Author	SHA1	Message	Date
Rafael Antognolli	a1a499e7fe	iris/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced. If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Don't need to set the mask - it's mbo (Ken). v3: Don't keep a reference to the resource used for emitting the table (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	ad513fd386	intel: Get information about pixel pipes subslices. v2: Use 1 instead of 1UL (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	32344dc581	intel/gen_decoder: Decode SLICE_HASH_TABLE.	2019-08-12 16:19:08 -07:00
Rafael Antognolli	e1cb71c182	intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE. Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction so we can properly configure the slice and subslice hashing on ICL+ v2: Make 'Mask' field a mbo (Ken).	2019-08-12 16:19:08 -07:00
Jason Ekstrand	d787a2d05e	anv: Implement VK_KHR_pipeline_executable_properties Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	67cb55ad11	anv: Add a ralloc context to anv_pipeline Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	fec4bdff40	anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	651fbbf9b8	anv/pipeline: Split setting up per-stage keys into its own loop Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	78f3dfb4a2	anv: Record shader compile stats in the pipeline cache We're going to want these to be available regardless of caching. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	2af380d20f	anv/pipeline: Stash generated code in the pipeline stage Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	8d3cbd0393	intel/fs: Add SLM size to brw_cs_prog_data We don't need it for state setup but it's a useful statistic we want to pass on to developers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	134607760a	intel/compiler: Fill a compiler statistics struct This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when we INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Khaled Emara	2720ad5fd9	freedreno: disable tiling for cubemaps Tiling doesn't work quite well with cubemaps. Revert to linear textures, until it's fixed.	2019-08-12 22:30:54 +00:00
Khaled Emara	0ae16fb565	freedreno: add tiling parameters for 2D/2DArray/3D	2019-08-12 22:30:54 +00:00
Khaled Emara	aeaba3e4a6	freedreno: simplified slices setup for a3xx a3xx doesn't support ASTC and layout_first always returns false	2019-08-12 22:30:54 +00:00
Khaled Emara	e11a239e8c	freedreno: enable tiled textures for debug builds	2019-08-12 22:30:54 +00:00
Paulo Zanoni	866bb775de	intel/fs: add 64 bit integer multiplication lowering While NIR's lower_imul64() solves the case of 64 bit integer multiplications generated early, we don't have a way to lower such instructions when they are generated by our own backend, such as the scan/reduce intrinsics. We'll need this soon, so implement it now. An easy way to test this is to simply disable nir_lower_imul64 to let those operations reach the backend. v2: - Fix Q/UQ copy/paste errors (Caio). - Transform an 'if' into 'else if' (Caio). - Add an extra comment to clarify the need for 64b = 32b * 32b (Caio). - Make private functions private (Caio). v3: - Remove ambiguity with 'b' and 'd' variables (Caio). - Allocate potentially less regs for the dwords (Caio). Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Matt Turner <matt.turner@intel.com> Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	9217cf3b5e	intel/compiler: invert the logic of lower_integer_multiplication() Invert the logic of how progress is handled: remove the continue statements and mark progress inside the places where it actually happens. We're going to add a new lowering that also looks for BRW_OPCODE_MUL, so inverting the logic here makes the resulting code much easier to follow. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	6ba4717924	intel/compiler: don't instantiate a builder for each instruction Don't instantiate a builder for each instruction during lower_integer_multiplication(). Instantiate one only when needed. On the other hand, these unneeded builders don't seem to cost much to init, so I don't expect any significant difference in performance: this is mostly about code organization. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	75b3868dcc	intel/compiler: extract subfunctions of lower_integer_multiplication() The lower_integer_multiplication() function is already a little too big. I want to add more to it, so let's reorganize the existing code first. Let's start with just extracting the current code to subfunctions. Later we'll change them a little more. v2: Make private functions private (Caio). v3: Fix typo (Caio). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Rhys Perry	7740149852	nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Rhys Perry	da8ed68aca	nir: replace nir_move_load_const() with nir_opt_sink() This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops). v2: actually delete nir_move_load_const.c v3: fix nir_opt_sink() usage in freedreno v3: update Makefile.sources v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def v4: handle if uses v4: fix handling of nested loops v5: re-write adjust_block_for_loops v5: re-write setting of use_block for if uses Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Francisco Jerez	c2fe7a0fb8	anv/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. According to Jason, improves Aztec Ruins performance by 2.7%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Undo CPU performance micro-optimization done in i965 and iris due to lack of data justifying it on anv. Use cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe control command directly. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-12 14:40:21 -07:00
Andreas Baierl	1c45541c7f	lima/ppir: Add fddx and fddy Lower fddx and fddy and set the right bits in codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com>	2019-08-12 23:20:04 +02:00
Bas Nieuwenhuizen	f1da129220	radv: Enable VK_KHR_pipeline_executable_properties. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	afad67cd7a	radv: Implement radv_GetPipelineExecutableStatisticsKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	35302f0189	radv: Implement radv_GetPipelineExecutableInternalRepresentationsKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	86864eedd2	radv: Implement radv_GetPipelineExecutablePropertiesKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	8874af8ef4	radv: Keep shader info when needed. This allows enabling the shader info keeping on a per shader basis. Also disables the cache on a per shader basis. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	e8a256eb54	radv: Add VK_KHR_pipeline_executable_properties in disabled state. So we can add the functions. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	5444d3e0c2	radv: Use string for nir dumping. Reviewed-by: Dave Airlie <airlied@redhat.com> Allows us to easily dump all nir shaders for combined variants in vega and simplifies ownership.	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	739a2880f5	radv: Get max workgroup size without nir. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	290ca0c4dd	radv: Add utility function to calculate max waves. Not AC because a lot of it is data extraction out of radv structs. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Francisco Jerez	026773397b	iris/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Francisco Jerez	03cba9f5d9	intel/genxml: Add GT_MODE hashing defs for Gen9. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Francisco Jerez	9406b3a5c1	i965/gen9: Optimize slice and subslice load balancing behavior. The default pixel hashing mode settings used for slice and subslice load balancing are far from optimal under certain conditions (see the comments below for the gory details). The top-of-the-line GT4 parts suffer from a particularly severe performance problem currently due to a subslice load balancing issue. Fixing this seems to improve graphics performance across the board for most of the benchmarks in my test set, up to ~20% in some cases, e.g. from SKL GT4: unigine/valley: 3.44% ±0.11% gfxbench/gl_manhattan31: 3.99% ±0.13% gputest/pixmark_piano: 7.95% ±0.33% synmark/OglTexFilterAniso: 15.22% ±0.07% synmark/OglTexMem128: 22.26% ±0.06% Lower-end platforms are also affected by some subslice load imbalance to a lesser degree, especially during CCS resolve and fast clear operations, which are handled specially here due to rasterization ocurring in reduced CCS coordinates, which changes the semantics of the pixel hashing mode settings. No regressions seen during my tests on some SKL, KBL and BXT configurations. Additional benchmark reports welcome on any Gen9 platforms (that includes anything with Skylake, Broxton, Kabylake, Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your renderer string). P.S.: A similar problem is likely to be present on other non-Gen9 platforms, especially for CCS resolve and fast clear operations. Will follow-up with additional patches fixing the hashing mode for those once I have enough performance data to justify it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Alyssa Rosenzweig	b1965831e4	pan/midgard: Handle 64-bit address in mir_mask_of_read_components This is a bit of a hack, but it'll hold us over until we have 64-bit support wired through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	41e68094f8	pan/midgard: Allocate separate spill indices for lowered moves This helps RA be slightly more reasonable. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	14b5b9ac38	pan/midgard: Extend liveness analysis to trinary ops Fixes RA fails with multiple indirect SSBO writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	c690b37d76	pan/midgard: Fix load/store pairing This used a delicate hack to try to find indirect inputs and skip them as candidates for pairing. Let's use a better criterion -- no sources -- and pair based on that. We could do better, but that would require more complex data flow analysis than we're interested in doing here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	15954ab6ca	pan/midgard: Implement nir_intrinsic_load_num_work_groups Just a sysval to route through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	7229af794b	pan/midgard: Implement some compute builtins We implement gl_WorkGroupID and gl_LocalInvocationID, which map to ld_compute_id with special sources. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	2b4e579585	pan/midgard: Rename ld_global_id -> ld_compute_id It's used for more general loads within a compute shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	a5059f2cba	pan/midgard: Handle partial writes in liveness analysis This allows liveness analysis within a loop to be more fine grained, fixing RA failures with partial spilled movs within a loop, as well as enabling a slight reduction of register pressure more generally: total registers in shared programs: 350 -> 347 (-0.86%) registers in affected programs: 12 -> 9 (-25.00%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	e333bf606f	pan/midgard: Dump "no spill"? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	cc3df917d3	pan/midgard: Absorb nonexistance sources Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	0a7cc239bd	pan/midgard: Pretty-print destinations They're not "sources" but they follow the same conventions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	ba8ec19a64	pan/midgard: Pretty-print units Since we are seeing some use of MIR post-scheduling, let's get this printed right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	73f54f286a	pan/midgard: Print mask in dumped MIR Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	2ec4f9a74b	pan/midgard: Add no_spill flag Hint for the RA to avoid infinite spilling loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00

1 2 3 4 5 ...

114334 commits