mesa/src
Christian Gmeiner cb3ac95d03 etnaviv: nir: improve uniform usage for ALU opc
The current code in lower_alu(..) counts how many const values
are used by one ALU opc. If more than one is used, the compiler
tries to fix this, e.g. by replacing them with a single combined
const src.

We do this because some GPUs allow only one const src per
ISA instruction. Using the same const for multiple srcs is
allowed, though.
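
The constraint above can be sketched in a few lines of Python. This is only an illustration, not the driver code: the `Src` type and both helper names are hypothetical, and it assumes that uniforms and load_const values both count as const srcs.

```python
from typing import NamedTuple


class Src(NamedTuple):
    """Hypothetical model of one ALU source operand."""
    ssa: str        # name of the SSA value being read
    swizzle: str    # component selected from that value
    is_const: bool  # assumption: uniforms and immediates both count as const


def distinct_const_srcs(srcs):
    """Return the set of distinct const values referenced by one ALU op."""
    return {s.ssa for s in srcs if s.is_const}


def needs_lowering(srcs):
    # Only more than one *distinct* const is a problem;
    # reading the same const through several srcs is fine.
    return len(distinct_const_srcs(srcs)) > 1


# fmul ssa_2.x, ssa_2.y: two srcs, but only one const value
same_const = [Src("ssa_2", "x", True), Src("ssa_2", "y", True)]
# seq ssa_2.w, ssa_11: two srcs reading two different const values
two_consts = [Src("ssa_2", "w", True), Src("ssa_11", "x", True)]
```

With these inputs, `needs_lowering(same_const)` is false and `needs_lowering(two_consts)` is true.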

Let's have a closer look at a real world shader:

impl main {
        /* preds: */
        vec1 32 ssa_0 = load_const (0x3f800000 = 1.000000)
        vec1 32 ssa_1 = load_const (0x00000000 = 0.000000)
        vec4 32 ssa_2 = intrinsic load_uniform (ssa_1) (base=0, range=1, dest_type=bool32 /*38*/)       /* u_var */
        vec1 32 ssa_4 = fmul ssa_2.x, ssa_2.y
        vec1 32 ssa_11 = load_const (0x00000000 = 0.000000)
        vec1 32 ssa_13 = seq ssa_2.w, ssa_11
        vec1 32 ssa_6 = fmul ssa_2.z, ssa_13
        vec1 32 ssa_7 = fmul ssa_4, ssa_6
        vec1 32 ssa_9 = deref_var &gl_FragColor (shader_out vec4)
        vec4 32 ssa_10 = vec4 ssa_7, ssa_7, ssa_7, ssa_0
        intrinsic store_deref (ssa_9, ssa_10) (wrmask=xyzw /*15*/, access=0)
        /* succs: block_1 */
        block block_1:
}

The current compiler transforms it to:

impl main {
        block block_0:
        /* preds: */
        vec1 32 ssa_0 = load_const (0x3f800000 = 1.000000)
        vec4 32 ssa_14 = load_const (0x00000000, 0x00000001, 0x00000002, 0x00000003) = (0.000000, 0.000000, 0.000000, 0.000000)
        vec2 32 ssa_15 = load_const (0x00000000, 0x00000001) = (0.000000, 0.000000)
        vec1 32 ssa_4 = fmul ssa_15.x, ssa_15.y
        vec2 32 ssa_16 = load_const (0x00000003, 0x00000000) = (0.000000, 0.000000)
        vec1 32 ssa_13 = seq ssa_16.x, ssa_16.y
        vec1 32 ssa_6 = fmul ssa_14.z, ssa_13
        vec1 32 ssa_7 = fmul ssa_4, ssa_6
        vec1 32 ssa_9 = deref_var &gl_FragColor (shader_out vec4)
        vec1 32 ssa_17 = mov ssa_0
        vec4 32 ssa_10 = vec4 ssa_7, ssa_7, ssa_7, ssa_17
        intrinsic store_deref (ssa_9, ssa_10) (wrmask=xyzw /*15*/, access=0)
        /* succs: block_1 */
        block block_1:
}

There is no need to create ssa_15 as we can use ssa_14 for the first fmul.
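
A rough sketch of the intended behaviour, assuming each src is modelled as a (const name, component) pair. The function name `combine_consts_if_needed` is made up for illustration and is not the code in lower_alu(..):

```python
def combine_consts_if_needed(srcs, new_name):
    """Build a combined const only if the srcs read more than one const.

    srcs:     list of (const_name, component) pairs, all assumed to be const.
    new_name: hypothetical SSA name to use for a newly created combined const.
    Returns (combined, rewritten); combined is None when nothing was created.
    """
    if len({name for name, _ in srcs}) <= 1:
        # All srcs already read the same const, so keep them untouched.
        return None, list(srcs)

    combined = []   # entries of the new const, in first-use order
    rewritten = []  # srcs redirected to the new const
    for src in srcs:
        if src not in combined:
            combined.append(src)
        rewritten.append((new_name, combined.index(src)))
    return combined, rewritten


# fmul ssa_14.x, ssa_14.y: same const twice, nothing to create
fmul = combine_consts_if_needed([("ssa_14", 0), ("ssa_14", 1)], "ssa_15")
# seq ssa_14.w, ssa_11.x: two consts, so a combined vec2 is built
seq = combine_consts_if_needed([("ssa_14", 3), ("ssa_11", 0)], "ssa_15")
```

In this model the fmul keeps its original srcs, while the seq is rewritten to read components 0 and 1 of a new two-entry const.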

With this change the compiler creates the following shader:

impl main {
        block block_0:
        /* preds: */
        vec1 32 ssa_0 = load_const (0x3f800000 = 1.000000)
        vec4 32 ssa_14 = load_const (0x00000000, 0x00000001, 0x00000002, 0x00000003) = (0.000000, 0.000000, 0.000000, 0.000000)
        vec1 32 ssa_4 = fmul ssa_14.x, ssa_14.y
        vec2 32 ssa_15 = load_const (0x00000003, 0x00000000) = (0.000000, 0.000000)
        vec1 32 ssa_13 = seq ssa_15.x, ssa_15.y
        vec1 32 ssa_6 = fmul ssa_14.z, ssa_13
        vec1 32 ssa_7 = fmul ssa_4, ssa_6
        vec1 32 ssa_9 = deref_var &gl_FragColor (shader_out vec4)
        vec1 32 ssa_16 = mov ssa_0
        vec4 32 ssa_10 = vec4 ssa_7, ssa_7, ssa_7, ssa_16
        intrinsic store_deref (ssa_9, ssa_10) (wrmask=xyzw /*15*/, access=0)
        /* succs: block_1 */
        block block_1:
}

This change reduces immediate pressure and the CPU time spent.

No piglit or deqp regression seen.

shader-db results for GC2000:

total instructions in shared programs: 955128 -> 955128 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0

total temps in shared programs: 85689 -> 85689 (0.00%)
temps in affected programs: 0 -> 0
helped: 0
HURT: 0

total immediates in shared programs: 155428 -> 155240 (-0.12%)
immediates in affected programs: 1840 -> 1652 (-10.22%)
helped: 34
HURT: 1
helped stats (abs) min: 4 max: 16 x̄: 5.65 x̃: 4
helped stats (rel) min: 2.94% max: 33.33% x̄: 16.92% x̃: 16.67%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 14.29% max: 14.29% x̄: 14.29% x̃: 14.29%
95% mean confidence interval for immediates value: -6.57 -4.17
95% mean confidence interval for immediates %-change: -19.83% -12.23%
Immediates are helped.

total loops in shared programs: 0 -> 0
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 102.55 -> 96.35 (-6.05%)
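
The percentages reported above can be re-derived from the before/after values. A small check, where `rel_change` is just a helper name chosen for this sketch:

```python
def rel_change(before, after):
    """Relative change in percent, rounded to two decimals."""
    return round((after - before) / before * 100.0, 2)


immediates_total = rel_change(155428, 155240)   # total immediates
immediates_affected = rel_change(1840, 1652)    # immediates in affected programs
cpu_time = rel_change(102.55, 96.35)            # total CPU time in seconds
```

The three results are -0.12, -10.22 and -6.05, matching the figures in the report.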

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23323>
2023-05-31 09:19:29 +00:00
amd Fix DGC bug where indirect count > maxSequencesCount. 2023-05-31 07:49:54 +00:00
android_stub util/log: improve logger_android 2023-02-22 17:55:40 +00:00
asahi asahi: Reformat using the new style 2023-05-29 21:06:12 +00:00
broadcom v3dv: Update texture padding logic to match v3d changes 2023-05-31 05:27:08 +00:00
c11
compiler nir/print: Print locations for geometry shader inputs 2023-05-30 16:25:07 -04:00
drm-shim drm-shim: Use anonymous file for file override 2023-05-16 04:31:22 +00:00
egl meson: remove needless c++17-overrides 2023-05-19 12:45:31 +00:00
etnaviv mesa/main: drop use_legacy_math_rules 2023-05-04 06:11:44 +00:00
freedreno freedreno/drm: Don't try to export suballoc bo 2023-05-30 21:37:12 +00:00
gallium etnaviv: nir: improve uniform usage for ALU opc 2023-05-31 09:19:29 +00:00
gbm gbm: drop unnecessary vulkan dependency 2023-02-23 18:31:22 +00:00
getopt
glx glx: fix build with APPLEGL 2023-05-15 03:50:30 +00:00
gtest gtest: Update to 1.13.0 2023-05-14 11:09:02 +00:00
imagination pvr: Fix page faults in occlusion query tests 2023-05-30 10:53:41 +00:00
imgui
intel intel/dev: switch defect identifiers to use lineage numbers 2023-05-30 22:13:41 +00:00
loader loader/dri3: temporarily work around a crash when front is NULL 2023-05-18 06:25:46 +00:00
mapi mesa: Add EXT_instanced_arrays support 2023-04-11 10:22:35 +00:00
mesa treewide: Use nir_replicate 2023-05-30 16:24:21 -04:00
microsoft treewide: Use nir_replicate 2023-05-30 16:24:21 -04:00
nouveau treewide: Avoid nir_lower_regs_to_ssa calls 2023-05-24 17:30:03 +00:00
panfrost pan/lower_framebuffer: Use nir_replicate 2023-05-30 16:24:21 -04:00
tool meson: remove needless c++17-overrides 2023-05-19 12:45:31 +00:00
util anv: override vendorID for Cyberpunk 2077 2023-05-30 01:05:36 -07:00
virtio venus: enable VK_EXT_image_2d_view_of_3d 2023-05-30 22:52:12 +00:00
vulkan vulkan: use cmd size array for queued cmd allocations 2023-05-31 03:13:22 +00:00
.clang-format treewide: Add a .clang-format file 2023-05-29 21:06:12 +00:00
meson.build hgl: remove 2023-02-18 00:44:43 +00:00