mesa/src/gallium/drivers/iris
Kenneth Graunke a0e7e7ff41 iris: Perform load_constant address math in 32-bit rather than 64-bit
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages.  As such, we do the following math to produce the
address for each load:

   base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
   base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
   base@64 <- pack_64_2x32_split(base_lo, base_hi)
   addr@64 <- iadd(base@64, u2u64(offset@32))

On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
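
As a rough illustration of that extra work, here is a minimal C sketch of a
64-bit add built from 32-bit halves.  The helper name emulated_iadd64 is
hypothetical and is not Mesa code; it only shows the carry handling that a
split 64-bit add needs.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Mesa): a 64-bit add expressed with
 * 32-bit operations only, roughly what emulation has to do. */
static uint64_t
emulated_iadd64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
   uint32_t lo = a_lo + b_lo;          /* low half, may wrap around */
   uint32_t carry = lo < a_lo;         /* 1 if the low add wrapped */
   uint32_t hi = a_hi + b_hi + carry;  /* propagate the carry upward */

   return ((uint64_t)hi << 32) | lo;
}
```

The compare and the second add are the extra instructions that go away when
no carry is possible.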

However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer.  These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address.  We
always place the base address at a 4GB-aligned address.  So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.

So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
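
A small C sketch of the two approaches, assuming the buffer never straddles
a 4GB boundary.  The function names addr_64bit and addr_32bit are
illustrative only and do not appear in the driver.

```c
#include <stdint.h>

/* Reference: full 64-bit addition, mirroring
 * iadd(pack_64_2x32_split(base_lo, base_hi), u2u64(offset)). */
static uint64_t
addr_64bit(uint32_t base_lo, uint32_t base_hi, uint32_t offset)
{
   uint64_t base = ((uint64_t)base_hi << 32) | base_lo;
   return base + (uint64_t)offset;
}

/* 32-bit add on the low half, then repack with the unchanged high half.
 * Only valid under the assumption that base_lo + offset does not wrap. */
static uint64_t
addr_32bit(uint32_t base_lo, uint32_t base_hi, uint32_t offset)
{
   uint32_t lo = base_lo + offset;
   return ((uint64_t)base_hi << 32) | lo;
}
```

The two functions agree whenever base_lo + offset stays below 4GB, which is
the property the buffer placement provides.  They differ only when the low
add wraps, for example base_lo = 0xFFFFFFF0 with offset = 0x20.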

On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero.  We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.
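
For the iris case this reduces to a plain zero extension.  A minimal sketch,
with iris_const_addr as a hypothetical name rather than a real driver
function:

```c
#include <stdint.h>

/* Hypothetical helper: with the shader memzone at [0, 4GB), the high
 * 32 bits are zero, so a zero-extended 32-bit sum is the full address. */
static uint64_t
iris_const_addr(uint32_t base_lo, uint32_t offset)
{
   uint32_t sum = base_lo + offset;   /* 32-bit add, no carry handling */
   return (uint64_t)sum;              /* u2u64: top bits become 0 */
}
```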

shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
2023-02-02 02:45:04 +00:00
driinfo_iris.h driconf/Intel: Add lower_depth_range_rate option workaround for Homerun Clash misrendering issue 2022-09-19 10:08:48 +00:00
iris_batch.c intel/ds: Fix crash when allocating more intel_ds_queues than u_vector was initialized 2023-02-01 18:31:29 +00:00
iris_batch.h intel/ds: Fix crash when allocating more intel_ds_queues than u_vector was initialized 2023-02-01 18:31:29 +00:00
iris_binder.c iris: Add BO_ALLOC_PLAIN flag 2022-12-19 05:37:34 -08:00
iris_binder.h iris: Use more efficient binding table pointer formats on Icelake+. 2022-03-09 09:18:59 +00:00
iris_blit.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_blorp.c intel/utrace: make blorp tracepoints more readable 2022-09-21 12:38:34 +00:00
iris_border_color.c iris: Add BO_ALLOC_PLAIN flag 2022-12-19 05:37:34 -08:00
iris_bufmgr.c iris: Make iris_bo_export_gem_handle() static 2022-12-23 18:22:29 +00:00
iris_bufmgr.h iris: Make iris_bo_export_gem_handle() static 2022-12-23 18:22:29 +00:00
iris_clear.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_context.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_context.h iris: add restrictions for 3DSTATE_RASTER::AntiAliasingEnable 2023-01-20 12:50:04 +00:00
iris_defines.h intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
iris_disk_cache.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_draw.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_fence.c mesa: fix SignalSemaphoreEXT behavior 2022-07-10 16:15:17 +00:00
iris_fence.h iris: signal the syncobj after a failed batch 2021-09-07 19:03:03 +00:00
iris_fine_fence.c iris: Don't flush the render cache for a compute batch 2023-01-20 11:09:24 +00:00
iris_fine_fence.h
iris_formats.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_genx_macros.h iris: Rename bo->gtt_offset to bo->address 2021-08-11 08:05:00 +00:00
iris_genx_protos.h iris: Add genX(emit_depth_state_workarounds) 2021-08-20 17:50:35 +00:00
iris_measure.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_measure.h gallium: rename pipe_draw_start_count -> pipe_draw_start_count_bias 2021-04-30 03:59:19 +00:00
iris_monitor.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_monitor.h
iris_perf.c iris: Fix more BO alignments 2022-09-22 03:33:00 +00:00
iris_perf.h intel: Rename gen_perf prefix to intel_perf in source files 2021-04-20 20:06:34 +00:00
iris_performance_query.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_pipe.h gallium/iris/crocus: collapse a bunch of conversion functions. 2022-08-04 08:17:39 +00:00
iris_pipe_control.c iris: Don't flush the render cache for a compute batch 2023-01-20 11:09:24 +00:00
iris_program.c iris: Perform load_constant address math in 32-bit rather than 64-bit 2023-02-02 02:45:04 +00:00
iris_program_cache.c iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_query.c iris: Don't flush the render cache for a compute batch 2023-01-20 11:09:24 +00:00
iris_resolve.c iris: Check for zero in clear color compatibility fn 2022-12-15 21:20:37 +00:00
iris_resource.c iris: let isl set tiling mode for external resources 2023-01-09 22:38:29 +00:00
iris_resource.h iris: Delete map->dest_had_defined_contents 2022-12-09 21:46:03 +00:00
iris_screen.c intel: Add kmd_type parameter to necessary intel_gem.h functions 2023-01-25 09:17:15 -08:00
iris_screen.h iris: Store intel_device_info in iris_bufmgr 2022-12-15 18:55:02 +00:00
iris_state.c intel: enable existing workaround for ICL platform 2023-02-01 11:09:19 +00:00
iris_utrace.c intel/ds: Fix crash when allocating more intel_ds_queues than u_vector was initialized 2023-02-01 18:31:29 +00:00
iris_utrace.h iris: utrace/perfetto support 2022-01-14 20:17:44 +00:00
meson.build iris: utrace/perfetto support 2022-01-14 20:17:44 +00:00