fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	a90edad9f7	intel/brw: Fix generate_mov_indirect to check has_64bit_int not float We are overriding the type to Q/UQ, so we need to split to two MOVs if 64-bit integer math is not supported. For reference, Meteorlake does support 64-bit floats but would still not work correctly here. See also brw_broadcast(), which does similar indirects but correctly checks has_64bit_int instead of has_64bit_float. Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>	2024-04-02 00:00:59 +00:00
Caio Oliveira	dae9795628	intel/brw: Remove vestiges of sources on IF opcode, only valid on Gfx6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>	2024-03-29 22:44:01 +00:00
Rohan Garg	48376ac3b8	intel/brw: Cleanup send generation Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28306>	2024-03-20 15:46:44 -07:00
Kenneth Graunke	97bf3d3b2d	intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND There's no need for special handling here, it's just a send message with a trivial g0 header and descriptor. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>	2024-03-05 11:16:20 +00:00
Caio Oliveira	865ef36609	intel/brw: Remove brw_shader.h Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:06 +00:00
Caio Oliveira	803a1a5ada	intel/brw: Remove automatic_exec_sizes As Ken describes: "This was only used by legacy SF/Clip/FFGS programs." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	dae59e7078	intel/brw: Remove runtime_check_aads_emit It was used for Gfx4 payload. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5c93a0e125	intel/brw: Remove Gfx8- remaining opcodes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	99d41ca90d	intel/brw: Remove Gfx4-5 manual compression selection Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	f321e555b6	intel/brw: Remove Gfx8- code from EU emission Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	9569ea82a8	intel/brw: Remove Gfx8- code from generator Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Ian Romanick	8fb37ef985	intel/fs: Add fast path for ballot(true) This doesn't help very much now. A later commit adds a NIR optimization pass, tentatively called nir_opt_uniform_subgroup, that converts many kinds of subgroup operations to things involving bitCount(ballot(true)). This commit makes a huge difference in the results of that later commit. No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165558033 -> 165557519 (-0.00%) Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00% Totals from 299 (0.05% of 656117) affected shaders: Instrs: 88293 -> 87779 (-0.58%) Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03% v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:37:46 -08:00
Sagar Ghuge	6f0ab5e4d5	intel/compiler: Add texture gather offset LOD/Bias message support v2: (Ian) - Space formatting on conditional statement Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Sagar Ghuge	79af0ac29a	intel/compiler: Add gather4_i/l/[_c]/b sampler message v2: (Ian) - Format comment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>	2024-02-27 00:22:46 +00:00
Kenneth Graunke	1497f4e0c2	intel/fs: Don't include sync.nop in instruction count statistics With the advent of software scoreboarding, we emit sync instructions in various places to synchronize the execution pipelines. This results in assembly being littered with a bunch of sync.nop instructions. That means that when you reorder anything in the program, the scoreboarding changes, and the number of sync.nops can vary wildly - even if the code isn't really materially better or worse. This makes it hard to use tools like shader-db or fossil-db on post-Icelake platforms. For now, exclude sync.nops from the instruction count statistic. One day we may want to consider improving the software scoreboarding pass to emit fewer redundant sync.nop instructions, at which point tracking this as a separate stat might be useful. For now though, it's simply cluttering and confusing our results. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27701>	2024-02-20 22:26:09 +00:00
Caio Oliveira	468a0ffe9c	intel/compiler: Include brw_disasm_info.h where its used Acked-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27579>	2024-02-15 09:26:46 +00:00
Francisco Jerez	3e710a84ad	intel/fs: Set the default execution group to 0 when not representable by the platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27165>	2024-01-20 19:55:31 +00:00
Francisco Jerez	6e56a4b474	intel/compiler/xe2: Fix for the removal of AccWrCtrl. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26860>	2024-01-12 20:18:03 +00:00
Sviatoslav Peleshko	dbf6f0291a	intel/fs: Set group 0 for Wa_14010017096 MOV instruction We always set exec size to 16 for this MOV, but the execution group remains from the previous emitted instruction. This can cause emitting a group which violates PRM restriction for ChanOff: "The execution size (ExecSize) must be a factor of the chosen offset." Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>	2024-01-09 11:35:52 +00:00
Sviatoslav Peleshko	b5c0b90402	intel/compiler: Set flag reg to 0 when disabling predication Having the reg set with predication disabled shouldn't cause any problems during the execution. But when decompiling such instruction the flag won't be shown in the output, so the recompiling will cause functionally-identical but binary-different code. Fixing this makes disasm/asm testing easier. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>	2024-01-09 11:35:52 +00:00
Sviatoslav Peleshko	4f41c44df2	intel/compiler: Add variable to dump binaries of all compiled shaders This can be useful for testing i965_disasm and i965_asm by comparing bin -> asm -> bin results. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>	2024-01-09 11:35:51 +00:00
Ian Romanick	e666872c75	intel/compiler: Initial bits for DPAS instruction v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding. v3: Prevent lower_regioning from trying to "fix" DPAS sources. v4: Add instruction latency information for scheduling and perf estimates. v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio. v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Francisco Jerez	1eff2fcb62	intel/compiler: Add polygon count statistic to brw_compile_stats. And use it in ANV in order to return a "SIMDNxM" name from vkGetPipelineExecutablePropertiesKHR. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:30 +00:00
Rohan Garg	2444a3cd46	intel/compiler: migrate WA 14013672992 to use WA framework Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26006>	2023-11-02 16:39:25 +00:00
Kenneth Graunke	fddad4d5f9	intel/compiler: Assert that FS_OPCODE_[REP_]FB_WRITE is for pre-Gfx7 We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit). Assert that it isn't used on Gfx7+. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Francisco Jerez	8b1dc77521	intel/fs/xe2+: Scale BRW_MAX_MSG_LENGTH by native register size. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Felix DeGrood	d04be9770b	intel/compiler: use shader source hash in shader dump code Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>	2023-07-20 09:08:08 +00:00
Lionel Landwerlin	3384f029be	intel/compiler: rework input parameters Use a struct for various common parameters rather than per stage structure or arguments to stage specific entrypoints. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>	2023-07-20 09:08:08 +00:00
Lionel Landwerlin	049c791a63	intel/fs: fix pull-constant-load prior to gfx7 In `ad9bc1ffb5` ("intel/fs: enable UBO accesses through bindless heap") we added a new source, we need to fixup the source index for the generator. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ad9bc1ffb5` ("intel/fs: enable UBO accesses through bindless heap") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>	2023-06-06 14:47:41 +00:00
Lionel Landwerlin	6d6877bf99	intel/fs: enable extended bindless surface offset Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	e09cfda0de	intel/fs: lower get_buffer_size like other logical sends This will also enable the use of the bindless heap. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:36 +00:00
Lionel Landwerlin	2acc2f18ea	intel/compiler: report max dispatch width statistic Most tools looking at shader stats assume that there is only a single resulting binary shader out of a single input. On Intel HW this is not always the case. So having a statistic on each variant that reports the maximum dispatch width helps showing improvement on a single shader in terms of how large we manage to compile it. For shaders that can be compiled in multiple SIMD width (like fragment shaders), this will report the maximum dispatch width in the statistics of each variants. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22014>	2023-03-21 11:53:04 +00:00
Sagar Ghuge	9a34b2ab0e	intel/compiler: Add swsb_stall debug option When enabled, on gfx12 plus, we will add the sync nop instruction after each instruction to make sure that current instruction depends on the previous instruction explicitly. This option will help us to get a hint if something is missing or broken in software scoreboard pass. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>	2023-03-10 06:55:39 +00:00
Kenneth Graunke	dfe652fb03	intel/eu: Simplify brw_F32TO16 and brw_F16TO32 Now that we aren't using them on Gfx8+ we can drop a lot of cruft. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	c590a3eadf	intel/fs: Move packHalf2x16 handling to lower_pack() This mainly lets the software scoreboarding pass correctly mark the instructions, without needing to resort to fragile manual handling in the generator. We can also make small improvements. On Gfx 8LP-12.0, we no longer have the restrictions about DWord alignment, so we can simply write each half into its intended location, rather than writing it to the low DWord and then shifting it in place. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Lionel Landwerlin	09cdb77a92	intel/fs: report max register pressure in shader stats Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21756>	2023-03-08 13:37:07 +00:00
Mark Janes	b96019f82b	intel/fs: use generated workaround helpers for Wa_14010017096 This workaround does not apply beyond gen 12.0. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21746>	2023-03-07 00:10:33 +00:00
Jason Ekstrand	714a291673	intel/compiler: Use SHADER_OPCODE_SEND for PI messages Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Jason Ekstrand	c80c0ed943	intel/fs: Always use integer types for indirect MOVs There's a new Gen12.5 restriction which forbids using the VxH or Vx1 on the floating-point pipe. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Lionel Landwerlin	37b3601052	intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ v2: drop the hardcoded inst->mlen=1 (Rohan) v3: Move back to LOAD/STORE messages (limited to SIMD16 for LSC) v4: Also use 4 GRFs transpose loads for fills (Curro) v5: Reduce amount of needed register to build per lane offsets (Curro) Drop some now useless SIMD32 code Unify unspill code Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>	2022-08-24 17:51:40 +00:00
Kenneth Graunke	9afd955353	intel/compiler: Delete unused Gfx8+ code in brw_find_live_channel() We now handle this in fs_visitor::lower_find_live_channel(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>	2022-08-02 08:41:43 +00:00
Ian Romanick	bbcb881f46	intel/fs: Remove non-_LOGICAL URB messages The _LOGICAL versions are lowered direct to SEND, so nothing can ever generate these messages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Kenneth Graunke	72e9843991	intel/compiler: Introduce a new brw_isa_info structure This structure will contain the opcode mapping tables in the next commit. For now, this is the mechanical change to plumb it into all the necessary places, and it continues simply holding devinfo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Ian Romanick	676acfe956	intel/fs: Add missing synchronization for WaW dependency v2: Do the synchronization in the correct place. Noticed by Curro. Fixes: `b5fa43952a` ("intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Felix DeGrood <felix.j.degrood@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17037>	2022-06-17 17:05:43 +00:00
Jason Ekstrand	dfedeccc13	intel: Only set VectorMaskEnable when needed For cases with lots of very small primitives, this may improve performance because we're not executing those dead channels all the time. Shader-db reports no instruction or cycle-count changes. However, by hacking up the driver to report when this optimization triggers, it appears to affect about 10% of shader-db. v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now, because using VMask on those platforms allows us to perform the eliminate_find_live_channel() optimization. However, XeHP doesn't seem to have packed fragment shader dispatch, so we lose that optimization regardless, and there's no reason not to avoid vmask. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>	2022-05-27 21:52:48 +00:00
Kenneth Graunke	9886615958	intel/compiler: Move spill/fill tracking to the register allocator Originally, we had virtual opcodes for scratch access, and let the generator count spills/fills separately from other sends. Later, we started using the generic SHADER_OPCODE_SEND for spills/fills on some generations of hardware, and simply detected stateless messages there. But then we started using stateless messages for other things: - anv uses stateless messages for the buffer device address feature. - nir_opt_large_constants generates stateless messages. - XeHP curbe setup can generate stateless messages. So counting stateless messages is not accurate. Instead, we move the spill/fill accounting to the register allocator, as it generates such things, as well as the load/store_scratch intrinsic handling, as those are basically spill/fills, just at a higher level. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>	2022-05-25 06:56:01 +00:00
Ian Romanick	b5fa43952a	intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT I noticed that a LOT of fragment shaders in Shadow of the Tomb Raider, for instance, end up with a sequence of NIR like: vec1 32 ssa_2 = load_const (0x00000000 = 0.000000) ... vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2 vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2 vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2 This results in an assembly sequence like: mov(8) g28<1>UD 0x00000000UD mov(8) g21<2>HF g28<8,8,1>F shl(8) g21<1>UD g21<8,8,1>UD 0x00000010UD mov(8) g21<2>HF g25<8,8,1>F mov(8) g19<2>HF g28<8,8,1>F shl(8) g19<1>UD g19<8,8,1>UD 0x00000010UD mov(8) g19<2>HF g23<8,8,1>F mov(8) g20<2>HF g28<8,8,1>F shl(8) g20<1>UD g20<8,8,1>UD 0x00000010UD mov(8) g20<2>HF g24<8,8,1>F After this commit, the generated assembly is: mov(8) g21<1>UD 0x00000000UD mov(8) g21<2>HF g23<8,8,1>F mov(8) g19<1>UD 0x00000000UD mov(8) g19<2>HF g17<8,8,1>F mov(8) g20<1>UD 0x00000000UD mov(8) g20<2>HF g18<8,8,1>F Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown) total instructions in shared programs: 20119086 -> 20119034 (<.01%) instructions in affected programs: 9056 -> 9004 (-0.57%) helped: 8 HURT: 0 helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4 helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98% 95% mean confidence interval for instructions value: -11.01 -1.99 95% mean confidence interval for instructions %-change: -1.56% -0.44% Instructions are helped. total cycles in shared programs: 861019414 -> 861021044 (<.01%) cycles in affected programs: 279862 -> 281492 (0.58%) helped: 4 HURT: 2 helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7 helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09% HURT stats (abs) min: 18 max: 2568 x̄: 1293.00 x̃: 1293 HURT stats (rel) min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75% 95% mean confidence interval for cycles value: -972.56 1515.89 95% mean confidence interval for cycles %-change: -4.77% 2.49% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 17812327 -> 17812263 (<.01%) instructions in affected programs: 9867 -> 9803 (-0.65%) helped: 8 HURT: 0 helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4 helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95% 95% mean confidence interval for instructions value: -15.46 -0.54 95% mean confidence interval for instructions %-change: -1.54% -0.47% Instructions are helped. total cycles in shared programs: 904768620 -> 904773291 (<.01%) cycles in affected programs: 454799 -> 459470 (1.03%) helped: 4 HURT: 4 helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378 helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77% HURT stats (abs) min: 1 max: 5572 x̄: 1512.25 x̃: 238 HURT stats (rel) min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53% 95% mean confidence interval for cycles value: -1122.40 2290.15 95% mean confidence interval for cycles %-change: -2.26% 1.71% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 18581 -> 18579 (-0.01%) spills in affected programs: 323 -> 321 (-0.62%) helped: 1 HURT: 0 total fills in shared programs: 24985 -> 24981 (-0.02%) fills in affected programs: 1348 -> 1344 (-0.30%) helped: 1 HURT: 0 Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 143585431 -> 143513657 (-0.0%) Instructions helped: 14403 Cycles in all programs: 8439312778 -> 8439371578 (+0.0%) Cycles helped: 10570 Cycles hurt: 3290 Gained: 146 Lost: 74 All of the lost and gained fossil-db shaders are SIMD32 fragment shaders. 14,247 of the affected shaders are from Shadow of the Tomb Raider. 154 are from Batman Arkham Origins, and the remaining two are from Octopath Traveler. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>	2022-04-07 18:26:23 +00:00
Kenneth Graunke	6fa66ac228	intel/compiler: Implement nir_intrinsic_last_invocation We haven't exposed this intrinsic as it doesn't directly correspond to anything in SPIR-V. However, it's used internally by some NIR passes, namely nir_opt_uniform_atomics(). We reuse most of the infrastructure in brw_find_live_channel, but with LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00

1 2 3 4 5

239 commits