iris: Perform load_constant address math in 32-bit rather than 64-bit

We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages.  As such, we do the following math to produce the
address for each load:

   base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
   base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
   base@64 <- pack_64_2x32_split(base_lo, base_hi)
   addr@64 <- iadd(base@64, u2u64(offset@32))

On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
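
Roughly speaking, that emulated 64-bit iadd expands to something along
these lines (a sketch of the lowering, not the exact code any
particular backend emits):

   lo@32    <- iadd(base_lo, offset)
   carry@32 <- b2i32(ult(lo, offset))     (did the low add wrap?)
   hi@32    <- iadd(base_hi, carry)
   addr@64  <- pack_64_2x32_split(lo, hi)

Every constant load pays for that extra compare-and-add just to
propagate a carry which, as the next paragraph explains, can never
actually happen here.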

However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer.  These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address.  We
always place that base address on a 4GB-aligned boundary.  So the
constant data always lives in a buffer entirely contained within a
single 4GB-aligned region, which means that adding any offset from the
start of the buffer can never carry into the high 32 bits of the
address.
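
For example (with purely illustrative numbers): if the 4GB-aligned
region were [0x2_0000_0000, 0x3_0000_0000) and a shader's buffer
started at 0x2_4000_0000, then base_lo = 0x4000_0000 and base_hi = 0x2.
Any in-bounds offset keeps base_lo + offset below 0x1_0000_0000,
because the buffer ends before 0x3_0000_0000, so the low 32-bit add can
never carry into base_hi.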

So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
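
In the same notation as above, that general form would be:

   lo@32   <- iadd(base_lo, offset)
   addr@64 <- pack_64_2x32_split(lo, base_hi)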

On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero.  We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.
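
In the same notation, the lowering this patch emits on iris is simply:

   lo@32   <- iadd(base_lo, offset)
   addr@64 <- u2u64(lo)

which corresponds to the nir_iadd + nir_u2u64 sequence in the diff
below.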

shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
Author:    Kenneth Graunke (2023-01-23 11:57:18 -08:00)
Committer: Marge Bot
Parent:    95d06343c6
Commit:    a0e7e7ff41


@@ -532,13 +532,18 @@ iris_setup_uniforms(ASSERTED const struct intel_device_info *devinfo,
             unsigned max_offset = b.shader->constant_data_size - load_size;
             offset = nir_umin(&b, offset, nir_imm_int(&b, max_offset));
 
-            nir_ssa_def *const_data_base_addr = nir_pack_64_2x32_split(&b,
-               nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW),
-               nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH));
+            /* Constant data lives in buffers within IRIS_MEMZONE_SHADER
+             * and cannot cross that 4GB boundary, so we can do the address
+             * calculation with 32-bit adds.  Also, we can ignore the high
+             * bits because IRIS_MEMZONE_SHADER is in the [0, 4GB) range.
+             */
+            assert(IRIS_MEMZONE_SHADER_START >> 32 == 0ull);
+
+            nir_ssa_def *const_data_addr =
+               nir_iadd(&b, nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW), offset);
 
             nir_ssa_def *data =
-               nir_load_global_constant(&b, nir_iadd(&b, const_data_base_addr,
-                                                         nir_u2u64(&b, offset)),
+               nir_load_global_constant(&b, nir_u2u64(&b, const_data_addr),
                                         load_align,
                                         intrin->dest.ssa.num_components,
                                         intrin->dest.ssa.bit_size);