mesa/src/panfrost/bifrost
Alyssa Rosenzweig 3fedf22b60 pan/bi: Tune lower_vars_to_scratch
Increase the threshold to lower indirect indexing of arrays to scratch memory
all the way up to 256 bytes, which was the lowest power-of-two threshold for
which enabling the pass on Mali-G57 was a win in shader-db.
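
As a rough illustration of the size-threshold decision described above, here is a minimal sketch. The struct, the helper name, and the strict "larger than" comparison are assumptions made for illustration; they are not taken from the Mesa sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold quoted in the commit message, in bytes. */
#define SCRATCH_THRESHOLD_BYTES 256

/* Hypothetical description of a shader-local array. */
struct local_array {
   size_t size_bytes;        /* total size of the array in bytes */
   bool indirectly_indexed;  /* true if indexed by a non-constant value */
};

/* Assumption: only indirectly indexed arrays strictly larger than the
 * threshold are moved to scratch memory. How smaller arrays are handled
 * is outside the scope of this sketch. */
static bool
should_lower_to_scratch(const struct local_array *arr, size_t threshold_bytes)
{
   assert(arr != NULL);
   return arr->indirectly_indexed && arr->size_bytes > threshold_bytes;
}
```

Under these assumptions, a 64-byte indirectly indexed array stays out of scratch, while a 512-byte one is lowered.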

It's difficult to tell what threshold is optimal here. The shader-db stats are
based on a rough cycle model that assumes a 16:1 ratio between CVT and
load/store on Valhall, and a 24:1 ratio between arithmetic and load/store on
Bifrost. Those ratios are at most rules of thumb, as the number of cycles
required by a load/store instruction will vary tremendously based on caching and
the memory controller. However, they may well be lower bounds (if those are the
upper bounds on instruction issuing in the Mali shader cores). As such, a large
threshold seems well motivated.
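
To make the cycle model concrete, the following sketch shows one way such an estimate could be computed. The two ratios come from the text above; everything else is an assumption for illustration: the type and function names are hypothetical, each load/store is charged one cycle, and the overall estimate is taken as the slower of the two units.

```c
#include <assert.h>

/* Ratios quoted in the commit message. */
#define VALHALL_CVT_PER_LDST   16.0
#define BIFROST_ARITH_PER_LDST 24.0

/* Hypothetical per-unit cycle estimate. */
struct unit_cycles {
   double arith;  /* arithmetic (or CVT) instructions divided by the ratio */
   double ldst;   /* assumption: one cycle per load/store instruction */
};

static struct unit_cycles
estimate_units(unsigned arith_instrs, unsigned ldst_instrs, double ratio)
{
   assert(ratio > 0.0);
   struct unit_cycles c = { arith_instrs / ratio, (double)ldst_instrs };
   return c;
}

/* Assumption: the shader is bound by whichever unit takes longer. */
static double
estimate_cycles(struct unit_cycles c)
{
   return c.arith > c.ldst ? c.arith : c.ldst;
}
```

With the Bifrost ratio, a shader with 24 arithmetic instructions and 98 load/stores would be estimated at 98 cycles in this model, which is why removing load/stores dominates the result.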

shader-db results on Mali-G52 follow; results on Mali-G57 were similar. Note that
the shader that is hurt for spills/fills is *helped* for load/store overall.

cycles helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))
ldst helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))
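
The percentages in these reports are relative changes against the "before" value. A small sketch of that arithmetic (the helper name is hypothetical):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the metric went down. */
static double
percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For 129 -> 98 this gives roughly -24.03, and for 17 -> 20 roughly 17.65, matching the figures above.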

total instructions in shared programs: 2415410 -> 2415372 (<.01%)
instructions in affected programs: 1041 -> 1003 (-3.65%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 31.0 x̄: 12.67 x̃: 5
helped stats (rel) min: 2.08% max: 6.02% x̄: 3.90% x̃: 3.60%

total tuples in shared programs: 1928558 -> 1928527 (<.01%)
tuples in affected programs: 826 -> 795 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 6.0 max: 26.0 x̄: 16.00 x̃: 16
helped stats (rel) min: 3.72% max: 9.68% x̄: 6.70% x̃: 6.70%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.54% max: 1.54% x̄: 1.54% x̃: 1.54%

total clauses in shared programs: 355013 -> 354981 (<.01%)
clauses in affected programs: 220 -> 188 (-14.55%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 27.0 x̄: 10.67 x̃: 3
helped stats (rel) min: 13.99% max: 21.43% x̄: 16.93% x̃: 15.38%

total cycles in shared programs: 166610.27 -> 166574.90 (-0.02%)
cycles in affected programs: 138 -> 102.62 (-25.63%)
helped: 3
HURT: 0
helped stats (abs) min: 0.4583330000000001 max: 31.0 x̄: 11.79 x̃: 3
helped stats (rel) min: 15.28% max: 65.28% x̄: 34.86% x̃: 24.03%

total arith in shared programs: 73690.13 -> 73690.58 (<.01%)
arith in affected programs: 29.71 -> 30.17 (1.54%)
helped: 1
HURT: 2
helped stats (abs) min: 0.0833339999999998 max: 0.0833339999999998 x̄: 0.08 x̃: 0
helped stats (rel) min: 3.85% max: 3.85% x̄: 3.85% x̃: 3.85%
HURT stats (abs)   min: 0.125 max: 0.4166659999999993 x̄: 0.27 x̃: 0
HURT stats (rel)   min: 1.66% max: 5.17% x̄: 3.42% x̃: 3.42%

total ldst in shared programs: 135611 -> 135571 (-0.03%)
ldst in affected programs: 138 -> 98 (-28.99%)
helped: 3
HURT: 0
helped stats (abs) min: 3.0 max: 31.0 x̄: 13.33 x̃: 6
helped stats (rel) min: 24.03% max: 100.00% x̄: 74.68% x̃: 100.00%

total quadwords in shared programs: 1674599 -> 1674523 (<.01%)
quadwords in affected programs: 838 -> 762 (-9.07%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 65.0 x̄: 25.33 x̃: 9
helped stats (rel) min: 3.39% max: 15.00% x̄: 9.14% x̃: 9.04%

total spills in shared programs: 37 -> 40 (8.11%)
spills in affected programs: 17 -> 20 (17.65%)
helped: 0
HURT: 1

total fills in shared programs: 190 -> 196 (3.16%)
fills in affected programs: 34 -> 40 (17.65%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
| Name | Last commit | Date |
|---|---|---|
| test | pan/bi: Constant fold MKVEC.v2i8 | 2022-06-21 22:42:34 +00:00 |
| valhall | pan/va: Replace MKVEC.v4i8 with MKVEC.v2i8 | 2022-06-21 22:42:34 +00:00 |
| bi_builder.h.py | pan/bi: Implement fquantize2f16 | 2022-04-25 16:29:31 +00:00 |
| bi_helper_invocations.c | pan/bi: Export helper termination analysis | 2022-06-01 16:14:38 +00:00 |
| bi_layout.c | pan/bi: Rename bi_block->name to bi_block->index | 2022-05-03 17:56:16 +00:00 |
| bi_liveness.c | pan/bi: Remove liveness metadata tracking | 2022-05-19 16:08:26 +00:00 |
| bi_lower_divergent_indirects.c | pan/bi: Add divergent intrinsic lowering pass | 2021-05-07 18:20:30 +00:00 |
| bi_lower_swizzle.c | pan/bi: Scalarize bi_lower_swizzle | 2022-05-19 16:08:26 +00:00 |
| bi_lower_xfb.c | pan/bi: Add transform feedback lowering pass | 2022-06-04 14:35:56 +00:00 |
| bi_opcodes.c.py | pan/bi: Track instruction size in opcode table | 2021-06-15 20:27:22 +00:00 |
| bi_opcodes.h.py | pan/bi: Make some headers compilable with C++ | 2021-11-08 19:02:01 +00:00 |
| bi_opt_constant_fold.c | pan/bi: Constant fold MKVEC.v2i8 | 2022-06-21 22:42:34 +00:00 |
| bi_opt_copy_prop.c | pan/bi: Optimize split of collect | 2022-05-19 16:08:26 +00:00 |
| bi_opt_cse.c | pan/bi: Allow CSEing LEA_BUF_IMM | 2022-05-25 15:51:15 +00:00 |
| bi_opt_dce.c | pan/bi: Mark bi_postra_liveness_ins as MUST_CHECK | 2022-06-21 22:19:59 +00:00 |
| bi_opt_dual_tex.c | pan/bi: Add dual texture fusing pass | 2021-11-12 16:30:02 +00:00 |
| bi_opt_message_preload.c | pan/bi: Simplify register precolouring in the IR | 2022-05-19 16:08:26 +00:00 |
| bi_opt_mod_props.c | pan/bi: Fuse result types | 2022-05-27 12:14:22 +00:00 |
| bi_opt_push_ubo.c | pan/bi: Create COLLECT during isel | 2022-05-19 16:08:26 +00:00 |
| bi_pack.c | pan/bi: Implement fquantize2f16 | 2022-04-25 16:29:31 +00:00 |
| bi_packer.c.py | pan/bi: Use consistent modifier lists in packing | 2022-03-25 19:00:13 +00:00 |
| bi_pressure_schedule.c | pan/bi: Schedule for pressure pre-RA | 2022-05-25 14:40:12 +00:00 |
| bi_print.c | pan/bi: Use a dynarray for predecessors | 2022-05-03 17:56:16 +00:00 |
| bi_print_common.c | pan/bi: Move modifier prints out of common code | 2020-12-23 12:48:06 -05:00 |
| bi_print_common.h | pan/bi: Move modifier prints out of common code | 2020-12-23 12:48:06 -05:00 |
| bi_printer.c.py | pan/bi: Print flow control on instructions | 2022-06-01 16:14:38 +00:00 |
| bi_quirks.h | pan/bi: Assume future Valhall is 16-wide warps | 2022-01-28 17:47:46 +00:00 |
| bi_ra.c | pan/bi: Rework Valhall register alignment | 2022-06-02 17:13:16 +00:00 |
| bi_schedule.c | pan/bi: Extract MUX to CSEL optimization | 2022-06-06 16:08:25 +00:00 |
| bi_scoreboard.c | pan/bi: Use a dynarray for predecessors | 2022-05-03 17:56:16 +00:00 |
| bi_test.h | pan/bi: Extract bit_block helper | 2022-06-01 16:14:38 +00:00 |
| bi_validate.c | pan/bi: Validate vector widths | 2022-05-19 16:08:26 +00:00 |
| bifrost.h | pan/bi: Schedule for pressure pre-RA | 2022-05-25 14:40:12 +00:00 |
| bifrost_compile.c | pan/bi: Tune lower_vars_to_scratch | 2022-06-21 22:42:34 +00:00 |
| bifrost_compile.h | gallium/drivers: set force_indirect_unrolling_sampler for all required drivers | 2022-05-17 02:12:21 +00:00 |
| bifrost_isa.py | pan/bi: Imply round mode most of the time | 2022-04-07 18:03:57 +00:00 |
| bifrost_nir.h | pan/bi: Add transform feedback lowering pass | 2022-06-04 14:35:56 +00:00 |
| bifrost_nir_algebraic.py | pan/bi: Switch to lower_bool_to_bitsize | 2022-02-19 03:02:10 +00:00 |
| bir.c | pan/bi: Extract MUX to CSEL optimization | 2022-06-06 16:08:25 +00:00 |
| cmdline.c | panfrost,panvk: Make fixed_sysval_ubo < 0 mean compiler-assigned | 2022-05-12 10:53:15 +00:00 |
| compiler.h | pan/bi: Constify bi_is_staging_src argument | 2022-06-21 22:19:59 +00:00 |
| disassemble.c | pan/bi: Fix format specifiers in disassembler | 2021-08-25 20:03:08 +00:00 |
| disassemble.h | pan/bi: Print the clause of branch targets | 2021-08-01 13:04:20 +00:00 |
| gen_disasm.py | pan/bi: Make disassembler build reproducibly | 2022-03-05 14:55:00 -05:00 |
| ISA.xml | pan/bi: Model MKVEC.v2i8 | 2022-06-21 22:42:34 +00:00 |
| meson.build | pan/va: Unit test va_mark_last | 2022-06-21 22:19:59 +00:00 |
| nodearray.h | pan/bi: Add nodearray datastructure | 2022-06-02 17:13:16 +00:00 |
| Notes.txt | pan/bi: Move notes on ADD ops to notes file | 2020-03-03 00:03:50 +00:00 |
| README.md | pan/bi: Document register conventions | 2021-04-03 12:47:29 -04:00 |

Bifrost compiler

Register file

Defined partially in software, partially in hardware.

Blend shaders

R0 - R3: input (color #0)
R4 - R7: input (color #1)
R8 - R15: general purpose
R48: return address
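
The blend shader convention above can be read as a simple lookup from register number to role. The sketch below encodes only the ranges listed; the enum and function names are hypothetical, and the 64-entry register file (R0 to R63) is an assumption based on the highest register named in this README.

```c
#include <assert.h>

/* Hypothetical roles for registers in a blend shader. */
enum blend_reg_role {
   BLEND_REG_COLOR0_INPUT,    /* R0 - R3 */
   BLEND_REG_COLOR1_INPUT,    /* R4 - R7 */
   BLEND_REG_GENERAL,         /* R8 - R15 */
   BLEND_REG_RETURN_ADDRESS,  /* R48 */
   BLEND_REG_UNLISTED,        /* not described by the convention */
};

/* Map a register number to its role under the listed convention. */
static enum blend_reg_role
blend_reg_role(unsigned reg)
{
   assert(reg < 64); /* assumption: registers are numbered R0..R63 */

   if (reg <= 3)
      return BLEND_REG_COLOR0_INPUT;
   if (reg <= 7)
      return BLEND_REG_COLOR1_INPUT;
   if (reg <= 15)
      return BLEND_REG_GENERAL;
   if (reg == 48)
      return BLEND_REG_RETURN_ADDRESS;
   return BLEND_REG_UNLISTED;
}
```

Registers the text does not mention map to `BLEND_REG_UNLISTED` rather than being guessed at.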

Fragment

Anything live during BLEND must respect blend shader registers.

R0 - R3: preloaded (message #0)
R4 - R7: preloaded (message #1)
R57 - R63: preloaded (various)

R0 - R15: general purpose (full threads)
R48 - R63: general purpose (full threads)

R32 - R47: general purpose (half threads, or v6)
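
One way to picture the fragment shader ranges is as a 64-bit mask with one bit per register. The sketch below builds such a mask from the listed ranges only. The function names are hypothetical, and it assumes that the half-thread (or v6) set is the full-thread set plus R32 - R47. R16 - R31 are not described here and are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bitmask with bits lo..hi (inclusive) set; requires lo <= hi <= 63. */
static uint64_t
reg_range(unsigned lo, unsigned hi)
{
   assert(lo <= hi && hi <= 63);
   unsigned width = hi - lo + 1;
   uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                 : ((UINT64_C(1) << width) - 1);
   return mask << lo;
}

/* General-purpose registers for a fragment shader, per the listed ranges.
 * Assumption: half threads (or v6) add R32 - R47 on top of the
 * full-thread set. */
static uint64_t
fragment_gpr_mask(bool half_threads_or_v6)
{
   uint64_t mask = reg_range(0, 15) | reg_range(48, 63);

   if (half_threads_or_v6)
      mask |= reg_range(32, 47);

   return mask;
}
```

Note that the preloaded registers (R0 - R7, R57 - R63) overlap the general-purpose ranges, so a real allocator would also need to track when the preloaded values are still live.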