mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2025-12-24 19:40:10 +01:00
iris: Perform load_constant address math in 32-bit rather than 64-bit
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:

   base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
   base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
   base@64 <- pack_64_2x32_split(base_lo, base_hi)
   addr@64 <- iadd(base@64, u2u64(offset@32))

On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.

However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address.

So the constant data always lives in a buffer entirely contained within
a 4GB region, which means any offsets from the start of the buffer
cannot possibly affect the high bits.

So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.

On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero. We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.

shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
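The carry argument in the message can be illustrated in plain C. This is only a sketch, not driver code: the helper names add64_emulated and add64_no_carry are made up for illustration, and the first function is a rough model of what a 64-bit add costs when it has to be built from 32-bit operations.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit add emulated with 32-bit operations:
 * the low halves are added, the carry is detected, and the carry is
 * propagated into the high half. */
static uint64_t add64_emulated(uint32_t base_lo, uint32_t base_hi,
                               uint32_t offset)
{
   uint32_t lo = base_lo + offset;   /* wraps modulo 2^32 */
   uint32_t carry = lo < base_lo;    /* 1 if the low add overflowed */
   uint32_t hi = base_hi + carry;    /* the offset's high half is zero */
   return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical model of the cheaper form: a single 32-bit add on the
 * low half, with the high half passed through unchanged. */
static uint64_t add64_no_carry(uint32_t base_lo, uint32_t base_hi,
                               uint32_t offset)
{
   uint32_t lo = base_lo + offset;
   return ((uint64_t)base_hi << 32) | lo;
}
```

The two functions agree whenever base_lo + offset does not wrap, which is the case when the buffer sits entirely inside one 4GB region. They differ only when the add crosses a 4GB boundary, which the placement rules described above rule out.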
parent 95d06343c6
commit a0e7e7ff41
1 changed file with 10 additions and 5 deletions
@@ -532,13 +532,18 @@ iris_setup_uniforms(ASSERTED const struct intel_device_info *devinfo,
          unsigned max_offset = b.shader->constant_data_size - load_size;
          offset = nir_umin(&b, offset, nir_imm_int(&b, max_offset));
 
-         nir_ssa_def *const_data_base_addr = nir_pack_64_2x32_split(&b,
-            nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW),
-            nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH));
+         /* Constant data lives in buffers within IRIS_MEMZONE_SHADER
+          * and cannot cross that 4GB boundary, so we can do the address
+          * calculation with 32-bit adds. Also, we can ignore the high
+          * bits because IRIS_MEMZONE_SHADER is in the [0, 4GB) range.
+          */
+         assert(IRIS_MEMZONE_SHADER_START >> 32 == 0ull);
+
+         nir_ssa_def *const_data_addr =
+            nir_iadd(&b, nir_load_reloc_const_intel(&b, BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW), offset);
 
          nir_ssa_def *data =
-            nir_load_global_constant(&b, nir_iadd(&b, const_data_base_addr,
-                                                  nir_u2u64(&b, offset)),
+            nir_load_global_constant(&b, nir_u2u64(&b, const_data_addr),
                                      load_align,
                                      intrin->dest.ssa.num_components,
                                      intrin->dest.ssa.bit_size);
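For readers less familiar with NIR builders, the address computation emitted by the new code can be modelled on the CPU roughly as follows. This is an illustrative sketch only: MEMZONE_SHADER_START, umin32 and const_data_address are hypothetical names, and the value 0 for the zone start is an assumption taken from the "[0, 4GB)" statement in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IRIS_MEMZONE_SHADER_START; assumed to be 0
 * because the commit message places the zone at [0, 4GB). */
#define MEMZONE_SHADER_START 0ull

/* Unsigned minimum, playing the role of nir_umin in this model. */
static uint32_t umin32(uint32_t a, uint32_t b)
{
   return a < b ? a : b;
}

/* Model of the emitted math: clamp the offset so the load stays inside
 * the constant data, add it to the low 32 bits of the base address,
 * then zero-extend the result to 64 bits (the u2u64 step). */
static uint64_t const_data_address(uint32_t base_lo, uint32_t offset,
                                   uint32_t constant_data_size,
                                   uint32_t load_size)
{
   /* The high 32 bits can be dropped only if the zone starts below 4GB. */
   assert((MEMZONE_SHADER_START >> 32) == 0ull);

   uint32_t max_offset = constant_data_size - load_size;
   offset = umin32(offset, max_offset);

   uint32_t addr = base_lo + offset;   /* 32-bit add, no carry handling */
   return (uint64_t)addr;              /* top 32 bits are zero */
}
```

With a 64-byte constant buffer at 0x10000 and a 16-byte load, an in-range offset of 8 gives 0x10008, while an out-of-range offset is clamped to 48 and gives 0x10030.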