Commit graph

114334 commits

Author SHA1 Message Date
Rafael Antognolli
a1a499e7fe iris/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Don't need to set the mask - it's mbo (Ken).
v3: Don't keep a reference to the resource used for emitting the table
(Ken).
2019-08-12 16:19:08 -07:00
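For context, here is a minimal conceptual sketch of the kind of table this commit describes: it assumes a hypothetical 16-entry one-dimensional table and an invented fill_pipe_hash_table() helper purely for illustration; the real SLICE_HASH_TABLE layout and size are whatever the hardware and genxml define.

    #include <stdint.h>

    /* Illustration only: hand out 16 hypothetical table slots across the
     * pixel pipes roughly in proportion to each pipe's subslice count, so
     * a pipe with more subslices receives a correspondingly larger share
     * of the work.  Supports up to 8 pipes. */
    static void
    fill_pipe_hash_table(uint8_t table[16],
                         const unsigned *subslices_per_pipe,
                         unsigned num_pipes)
    {
       unsigned assigned[8] = { 0 };

       for (unsigned i = 0; i < 16; i++) {
          /* Pick the pipe that is most under-served relative to its
           * subslice count, i.e. minimize assigned[p] / subslices[p]
           * (compared via cross-multiplication to avoid division). */
          unsigned best = 0;
          for (unsigned p = 1; p < num_pipes; p++) {
             if (assigned[p] * subslices_per_pipe[best] <
                 assigned[best] * subslices_per_pipe[p])
                best = p;
          }
          table[i] = best;
          assigned[best]++;
       }
    }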
Rafael Antognolli
ad513fd386 intel: Get information about pixel pipes subslices.
v2: Use 1 instead of 1UL (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
32344dc581 intel/gen_decoder: Decode SLICE_HASH_TABLE. 2019-08-12 16:19:08 -07:00
Rafael Antognolli
e1cb71c182 intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE.
Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction
so we can properly configure the slice and subslice hashing on ICL+

v2: Make 'Mask' field a mbo (Ken).
2019-08-12 16:19:08 -07:00
Jason Ekstrand
d787a2d05e anv: Implement VK_KHR_pipeline_executable_properties
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
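For context, this is roughly how an application consumes VK_KHR_pipeline_executable_properties: a sketch in C, with error handling omitted, the KHR entry point assumed to be callable directly (in practice it is loaded via vkGetDeviceProcAddr), and an arbitrary 16-executable cap.

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    /* Enumerate the executables (shaders) behind a pipeline and print their
     * names and descriptions.  The pipeline should have been created with
     * VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR and/or
     * VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR if the
     * application also wants statistics or internal representations back. */
    static void
    dump_pipeline_executables(VkDevice device, VkPipeline pipeline)
    {
       VkPipelineInfoKHR info = {
          .sType = VK_STRUCTURE_TYPE_PIPELINE_INFO_KHR,
          .pipeline = pipeline,
       };

       uint32_t count = 0;
       vkGetPipelineExecutablePropertiesKHR(device, &info, &count, NULL);
       if (count > 16)
          count = 16;

       VkPipelineExecutablePropertiesKHR props[16];
       for (uint32_t i = 0; i < 16; i++)
          props[i] = (VkPipelineExecutablePropertiesKHR) {
             .sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_PROPERTIES_KHR,
          };

       vkGetPipelineExecutablePropertiesKHR(device, &info, &count, props);

       for (uint32_t i = 0; i < count; i++)
          printf("%u: %s - %s (subgroup size %u)\n",
                 i, props[i].name, props[i].description, props[i].subgroupSize);
    }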
Jason Ekstrand
67cb55ad11 anv: Add a ralloc context to anv_pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
fec4bdff40 anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
651fbbf9b8 anv/pipeline: Split setting up per-stage keys into its own loop
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
78f3dfb4a2 anv: Record shader compile stats in the pipeline cache
We're going to want these to be available regardless of caching.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
2af380d20f anv/pipeline: Stash generated code in the pipeline stage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
8d3cbd0393 intel/fs: Add SLM size to brw_cs_prog_data
We don't need it for state setup but it's a useful statistic we want to
pass on to developers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
134607760a intel/compiler: Fill a compiler statistics struct
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct.  This struct provides a binary, driver-readable
form of the same statistics we dump out to stderr when INTEL_DEBUG is set
with a shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
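As a rough illustration of the record being plumbed through: the field names below are hypothetical, not the actual brw_compile_stats definition; they simply mirror the kind of numbers the INTEL_DEBUG stderr dump reports.

    #include <stdint.h>

    /* Hypothetical sketch of a per-shader statistics record: the same
     * figures the stderr dump reports, but in a form the driver can read
     * back and forward to developers (e.g. via the pipeline-executable
     * extension above). */
    struct example_compile_stats {
       uint32_t dispatch_width;   /* SIMD8 / SIMD16 / SIMD32 */
       uint32_t instructions;     /* emitted hardware instructions */
       uint32_t loops;            /* loops remaining after optimization */
       uint32_t cycles;           /* estimated cycle count */
       uint32_t spills;           /* register spill instructions */
       uint32_t fills;            /* register fill instructions */
    };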
Khaled Emara
2720ad5fd9 freedreno: disable tiling for cubemaps
Tiling doesn't yet work well with cubemaps.
Revert to linear textures until it's fixed.
2019-08-12 22:30:54 +00:00
Khaled Emara
0ae16fb565 freedreno: add tiling parameters for 2D/2DArray/3D 2019-08-12 22:30:54 +00:00
Khaled Emara
aeaba3e4a6 freedreno: simplified slices setup for a3xx
a3xx doesn't support ASTC and layout_first always returns false
2019-08-12 22:30:54 +00:00
Khaled Emara
e11a239e8c freedreno: enable tiled textures for debug builds 2019-08-12 22:30:54 +00:00
Paulo Zanoni
866bb775de intel/fs: add 64 bit integer multiplication lowering
While NIR's lower_imul64() solves the case of 64-bit integer multiplications
generated early, we don't have a way to lower such instructions when they are
generated by our own backend, such as for the scan/reduce intrinsics. We'll
need this soon, so implement it now.

An easy way to test this is to simply disable nir_lower_imul64 to let
those operations reach the backend.

v2:
  - Fix Q/UQ copy/paste errors (Caio).
  - Transform an 'if' into 'else if' (Caio).
  - Add an extra comment to clarify the need for 64b = 32b * 32b
    (Caio).
  - Make private functions private (Caio).
v3:
  - Remove ambiguity with 'b' and 'd' variables (Caio).
  - Allocate potentially less regs for the dwords (Caio).

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Matt Turner <matt.turner@intel.com>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
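The arithmetic identity behind the lowering, as a standalone C reference (a sketch of the math only, not the backend code):

    #include <stdint.h>

    /* 64x64 -> 64 multiply built from 32-bit pieces.  With
     * a = a_hi * 2^32 + a_lo and b = b_hi * 2^32 + b_lo,
     * a * b mod 2^64 = a_lo * b_lo + ((a_lo * b_hi + a_hi * b_lo) << 32),
     * where only the low product needs the full 64-bit (32b * 32b) result
     * and the cross terms only need their low 32 bits. */
    static uint64_t
    mul64_from_32(uint64_t a, uint64_t b)
    {
       uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
       uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

       uint64_t lo_lo = (uint64_t)a_lo * b_lo;        /* full 64-bit product */
       uint32_t cross = a_lo * b_hi + a_hi * b_lo;    /* low 32 bits suffice */

       return lo_lo + ((uint64_t)cross << 32);
    }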
Paulo Zanoni
9217cf3b5e intel/compiler: invert the logic of lower_integer_multiplication()
Invert the logic of how progress is handled: remove the continue
statements and mark progress inside the places where it actually
happens.

We're going to add a new lowering that also looks for BRW_OPCODE_MUL,
so inverting the logic here makes the resulting code much easier to
follow.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
6ba4717924 intel/compiler: don't instantiate a builder for each instruction
Don't instantiate a builder for each instruction during
lower_integer_multiplication(). Instantiate one only when needed.

On the other hand, these unneeded builders don't seem to cost much to
init, so I don't expect any significant difference in performance:
this is mostly about code organization.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
75b3868dcc intel/compiler: extract subfunctions of lower_integer_multiplication()
The lower_integer_multiplication() function is already a little too
big. I want to add more to it, so let's reorganize the existing code
first. Let's start with just extracting the current code to
subfunctions. Later we'll change them a little more.

v2: Make private functions private (Caio).
v3: Fix typo (Caio).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Rhys Perry
7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry
da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
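A hedged usage sketch of the combined pass: the nir_move_options flag names below are recalled from memory and should be checked against nir.h.

    /* Sink rematerializable instructions (constants/undefs, UBO loads,
     * comparisons) toward their uses to shorten live ranges.  Flag names
     * are assumptions for illustration. */
    bool progress = nir_opt_sink(shader, nir_move_const_undef |
                                         nir_move_load_ubo |
                                         nir_move_comparisons);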
Francisco Jerez
c2fe7a0fb8 anv/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  According to Jason, this improves Aztec Ruins
performance by 2.7%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Undo CPU performance micro-optimization done in i965 and iris due
    to lack of data justifying it on anv.  Use
    cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
    control command directly.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-12 14:40:21 -07:00
Andreas Baierl
1c45541c7f lima/ppir: Add fddx and fddy
Lower fddx and fddy and set the right bits in codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
2019-08-12 23:20:04 +02:00
Bas Nieuwenhuizen
f1da129220 radv: Enable VK_KHR_pipeline_executable_properties.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
afad67cd7a radv: Implement radv_GetPipelineExecutableStatisticsKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
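A companion sketch to the query shown earlier, illustrating how the statistics side of the extension is consumed (error handling omitted, the KHR entry point assumed to be loaded already, and an arbitrary 32-statistic cap):

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    /* Print the integer statistics reported for one executable of a
     * pipeline created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR. */
    static void
    dump_executable_stats(VkDevice device, VkPipeline pipeline, uint32_t index)
    {
       VkPipelineExecutableInfoKHR exec_info = {
          .sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR,
          .pipeline = pipeline,
          .executableIndex = index,
       };

       uint32_t count = 0;
       vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &count, NULL);
       if (count > 32)
          count = 32;

       VkPipelineExecutableStatisticKHR stats[32];
       for (uint32_t i = 0; i < 32; i++)
          stats[i] = (VkPipelineExecutableStatisticKHR) {
             .sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR,
          };

       vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &count, stats);

       for (uint32_t i = 0; i < count; i++) {
          if (stats[i].format == VK_PIPELINE_EXECUTABLE_STATISTIC_FORMAT_UINT64_KHR)
             printf("%s: %llu\n", stats[i].name,
                    (unsigned long long)stats[i].value.u64);
          else
             printf("%s: (non-uint64 statistic)\n", stats[i].name);
       }
    }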
Bas Nieuwenhuizen
35302f0189 radv: Implement radv_GetPipelineExecutableInternalRepresentationsKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
86864eedd2 radv: Implement radv_GetPipelineExecutablePropertiesKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
8874af8ef4 radv: Keep shader info when needed.
This allows keeping the shader info on a per-shader basis.
It also disables the cache on a per-shader basis.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
e8a256eb54 radv: Add VK_KHR_pipeline_executable_properties in disabled state.
So we can add the functions.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
5444d3e0c2 radv: Use string for nir dumping.
Allows us to easily dump all nir shaders for combined variants on Vega
and simplifies ownership.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
739a2880f5 radv: Get max workgroup size without nir.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
290ca0c4dd radv: Add utility function to calculate max waves.
Not in AC because a lot of it is data extraction out of radv structs.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Francisco Jerez
026773397b iris/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Francisco Jerez
03cba9f5d9 intel/genxml: Add GT_MODE hashing defs for Gen9.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Francisco Jerez
9406b3a5c1 i965/gen9: Optimize slice and subslice load balancing behavior.
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details).  The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load balancing issue.  Fixing this seems to improve
graphics performance across the board for most of the benchmarks in my
test set, up to ~20% in some cases, e.g. from SKL GT4:

unigine/valley:                3.44% ±0.11%
gfxbench/gl_manhattan31:       3.99% ±0.13%
gputest/pixmark_piano:         7.95% ±0.33%
synmark/OglTexFilterAniso:    15.22% ±0.07%
synmark/OglTexMem128:         22.26% ±0.06%

Lower-end platforms are also affected by some subslice load imbalance to
a lesser degree, especially during CCS resolve and fast clear operations.
Those are handled specially here because rasterization occurs in reduced
CCS coordinates, which changes the semantics of the pixel hashing mode
settings.

No regressions seen during my tests on some SKL, KBL and BXT
configurations.  Additional benchmark reports welcome on any Gen9
platforms (that includes anything with Skylake, Broxton, Kabylake,
Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your
renderer string).

P.S.: A similar problem is likely to be present on other non-Gen9
      platforms, especially for CCS resolve and fast clear operations.
      Will follow up with additional patches fixing the hashing mode
      for those once I have enough performance data to justify it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Alyssa Rosenzweig
b1965831e4 pan/midgard: Handle 64-bit address in mir_mask_of_read_components
This is a bit of a hack, but it'll hold us over until we have 64-bit
support wired through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
41e68094f8 pan/midgard: Allocate separate spill indices for lowered moves
This helps RA be slightly more reasonable.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
14b5b9ac38 pan/midgard: Extend liveness analysis to trinary ops
Fixes RA fails with multiple indirect SSBO writes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
c690b37d76 pan/midgard: Fix load/store pairing
This used a delicate hack to try to find indirect inputs and skip them
as candidates for pairing. Let's use a better criterion -- no sources --
and pair based on that.

We could do better, but that would require more complex data flow
analysis than we're interested in doing here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
15954ab6ca pan/midgard: Implement nir_intrinsic_load_num_work_groups
Just a sysval to route through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
7229af794b pan/midgard: Implement some compute builtins
We implement gl_WorkGroupID and gl_LocalInvocationID, which map to
ld_compute_id with special sources.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
2b4e579585 pan/midgard: Rename ld_global_id -> ld_compute_id
It's used for more general loads within a compute shader.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
a5059f2cba pan/midgard: Handle partial writes in liveness analysis
This allows liveness analysis within a loop to be more fine-grained,
fixing RA failures with partially spilled movs within a loop, as well
as enabling a slight reduction of register pressure more generally:

total registers in shared programs: 350 -> 347 (-0.86%)
registers in affected programs: 12 -> 9 (-25.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
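The general shape of component-wise (per-mask) liveness, as a standalone illustration rather than the Midgard implementation:

    #include <stdint.h>

    /* One backward dataflow step for a single value: a write only kills the
     * components it actually writes, so a partial write leaves the other
     * components live across it. */
    static uint8_t
    liveness_backward_step(uint8_t live_mask,   /* live after the instruction */
                           uint8_t write_mask,  /* components written here    */
                           uint8_t read_mask)   /* components read here       */
    {
       live_mask &= ~write_mask;   /* only the written components are killed */
       live_mask |= read_mask;     /* anything read becomes live             */
       return live_mask;
    }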
Alyssa Rosenzweig
e333bf606f pan/midgard: Dump "no spill"?
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
cc3df917d3 pan/midgard: Absorb nonexistent sources
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
0a7cc239bd pan/midgard: Pretty-print destinations
They're not "sources" but they follow the same conventions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
ba8ec19a64 pan/midgard: Pretty-print units
Since we are seeing some use of MIR post-scheduling, let's get this
printed right.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
73f54f286a pan/midgard: Print mask in dumped MIR
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
2ec4f9a74b pan/midgard: Add no_spill flag
Hint for the RA to avoid infinite spilling loops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00