intel/rt: Don't directly generate umul_32x16

The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
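
As a rough illustration of why a plain imul is safe here, the sketch
below models both multiplies on scalars. It assumes umul_32x16 multiplies
a 32-bit value by the low 16 bits of its second operand, zero-extended,
and that the SBT stride is a 16-bit value widened with nir_u2u32 (as the
diff suggests). The model_* and check_* names are hypothetical helpers
for this note, not Mesa functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 32x16 unsigned multiply:
 * only the low 16 bits of b take part (assumed semantics). */
static uint32_t model_umul_32x16(uint32_t a, uint32_t b)
{
    return a * (uint32_t)(uint16_t)b;
}

/* Hypothetical scalar model of a full 32-bit multiply (wraps mod 2^32). */
static uint32_t model_imul(uint32_t a, uint32_t b)
{
    return a * b;
}

/* When b was zero-extended from 16 bits, both models agree, so a later
 * pass can pick whichever form is cheaper. */
static void check_zero_extended_stride(void)
{
    const uint16_t strides[] = { 0, 1, 32, 64, 0xffff };
    const uint32_t offsets[] = { 0, 1, 7, 0x12345678u, 0xffffffffu };

    for (unsigned i = 0; i < sizeof(strides) / sizeof(strides[0]); i++) {
        for (unsigned j = 0; j < sizeof(offsets) / sizeof(offsets[0]); j++) {
            uint32_t stride32 = (uint32_t)strides[i]; /* like nir_u2u32 */
            assert(model_imul(offsets[j], stride32) ==
                   model_umul_32x16(offsets[j], stride32));
        }
    }
}
```

The two forms only differ when the second operand has bits set above
bit 15, which a zero-extended 16-bit stride never does.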

I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.

All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.

DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)

Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
Ian Romanick 2024-01-16 20:00:40 -08:00 committed by Marge Bot
parent bc0178af57
commit 118e0bdc1f

@@ -204,7 +204,7 @@ lower_shader_trace_ray_instr(struct nir_builder *b, nir_instr *instr, void *data
nir_def *hit_sbt_stride_B =
nir_load_ray_hit_sbt_stride_intel(b);
nir_def *hit_sbt_offset_B =
-nir_umul_32x16(b, sbt_offset, nir_u2u32(b, hit_sbt_stride_B));
+nir_imul(b, sbt_offset, nir_u2u32(b, hit_sbt_stride_B));
nir_def *hit_sbt_addr =
nir_iadd(b, nir_load_ray_hit_sbt_addr_intel(b),
nir_u2u64(b, hit_sbt_offset_B));
@@ -213,7 +213,7 @@ lower_shader_trace_ray_instr(struct nir_builder *b, nir_instr *instr, void *data
nir_def *miss_sbt_stride_B =
nir_load_ray_miss_sbt_stride_intel(b);
nir_def *miss_sbt_offset_B =
-nir_umul_32x16(b, miss_index, nir_u2u32(b, miss_sbt_stride_B));
+nir_imul(b, miss_index, nir_u2u32(b, miss_sbt_stride_B));
nir_def *miss_sbt_addr =
nir_iadd(b, nir_load_ray_miss_sbt_addr_intel(b),
nir_u2u64(b, miss_sbt_offset_B));