fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 11:00:11 +01:00

Author	SHA1	Message	Date
Matt Turner	a3714b55f4	intel/elk: Use REG_CLASS_COUNT Fixes: `d44462c08d` ("intel/elk: Fork Gfx8- compiler by copying existing code") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>	2024-07-25 14:55:09 +00:00
Matt Turner	5e24c21625	intel/brw: Use REG_CLASS_COUNT Fixes: `5d87f41a54` ("intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>	2024-07-25 14:55:09 +00:00
Matt Turner	aae82061af	intel/clc: Free disk_cache Fixes: `c15bf88f01` ("intel: Add a little OpenCL C compiler binary") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>	2024-07-24 20:46:28 +00:00
Matt Turner	1574372de4	intel/clc: Free parsed_spirv_data This declaration shadowed a variable by the same type and name in an outer scope. That variable is passed to clc_free_parsed_spirv(). Fixes: `4fd7495c69` ("intel/clc: add ability to output NIR") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30313>	2024-07-24 20:46:28 +00:00
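The shadowing bug can be reproduced in a small C sketch. `struct parsed_data`, `parse()`, `free_parsed()` and `cleanup_sees_data()` are hypothetical stand-ins invented for illustration, not the intel/clc code. When the inner declaration shadows the outer one, the allocation lands in the inner variable, and the cleanup call at the end only ever sees the empty outer one.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed SPIR-V data. */
struct parsed_data {
   int *args;
};

static void parse(struct parsed_data *out)
{
   out->args = malloc(4 * sizeof(int));
}

static void free_parsed(struct parsed_data *data)
{
   free(data->args);
   data->args = NULL;
}

/* Returns true if the cleanup call at the end sees (and frees) the data. */
static bool cleanup_sees_data(bool shadow)
{
   struct parsed_data parsed = {0};
   bool seen;

   if (shadow) {
      /* Bug: this declaration shadows the outer 'parsed'. */
      struct parsed_data parsed = {0};
      parse(&parsed);
      /* In the real bug this allocation simply leaked; it is freed here
       * only to keep the sketch itself clean. */
      free(parsed.args);
   } else {
      /* Fix: fill the outer variable that the cleanup path uses. */
      parse(&parsed);
   }

   seen = parsed.args != NULL;
   free_parsed(&parsed);
   return seen;
}
```

Compiling with `-Wshadow` (GCC and Clang) flags this pattern.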
Marek Olšák	b2d32ae246	nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag Instead of having 1 bit in nir_io_semantics indicating a per-primitive FS input, add a dedicated intrinsic for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>	2024-07-23 16:13:16 +00:00
Kenneth Graunke	c429d5025e	intel/brw: Don't force g1's live range to be the entire program The idea here was that pixel shader framebuffer writes used the g0 and g1 thread payload register values to construct the message header. However, most messages are headerless and don't use either. There's a 2012-era comment that the simulator at one point had a bug where certain headerless messages would incorrectly take the values from the g0/g1 register contents rather than using sideband. But, that was likely fixed eons ago. So we really don't need to do this. Furthermore, there are many more shader stages these days: - VS: r1 contains output URB handles - TCS: r1 contains ICP handles - TES: r1 contains gl_TessCoord.x (r4 contains output URB handles) - GS: r1 contains output URB handles - CS: r1 contains LocalID.X on DG2+ but nothing on older hardware - Task/Mesh: r1 contains LocalID.X - BS: r1 contains bindless stack handles Vertex and geometry aren't likely to benefit here because r1 is needed for their output messages, which are also what terminate the shader. TES will definitely benefit because we were making a value pointlessly live for the whole program. Same for TCS, to a lesser extent. Compute prior to DG2 was the worst, as g1 literally has no meaningful content, so there is no point in keeping it live. fossil-db on Alchemist shows substantial spill/fill improvements: Totals: Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01% Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72% Subgroup size: 7518608 -> 7518632 (+0.00%) Send messages: 7341727 -> 7341762 (+0.00%) Spill count: 54633 -> 52575 (-3.77%) Fill count: 104694 -> 100680 (-3.83%) Scratch Memory Size: 3375104 -> 3287040 (-2.61%) Totals from 301172 (48.21% of 624670) affected shaders: Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01% Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94% Subgroup size: 4492512 -> 4492536 (+0.00%) Send messages: 4399737 -> 4399772 (+0.00%) Spill count: 20034 -> 17976 (-10.27%) Fill count: 41530 -> 37516 (-9.67%) Scratch Memory Size: 1522688 -> 1434624 (-5.78%) Assassin's Creed Odyssey in particular has 20% fewer fills. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>	2024-07-23 02:26:52 +00:00
Caio Oliveira	8ba8e33c39	intel/brw: Simplify @file annotations Doxygen documentation says > If the file name is omitted (i.e. the line after \file is left > blank) then the documentation block that contains the \file command will > belong to the file it is located in. so we can omit the filename itself when using the annotation. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>	2024-07-22 22:48:03 +00:00
José Roberto de Souza	de5d767f9a	intel/brw: Add a maximum scratch size restriction Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch restriction: RENDER_SURFACE_STATE::Surface Pitch For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is: [63,262143] -> [64B, 256KB] The pitch of the surface is the scratch size per thread and the surface should be large enough to accommodate every physical thread. So here adding a new field to intel_device_info, setting it in intel_device_info_init_common() so even offline tools can have it set. And finally adding a check to fail shader compilation if needed scratch is larger than supported. This issue can be reproduced in debug builds when running dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer platforms. Ref: BSpec 43862 (r52666) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>	2024-07-22 18:17:38 +00:00
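The size limit follows directly from the quoted pitch range. A minimal sketch, assuming (per the BSpec text above) that the pitch field stores the per-thread scratch size minus one; `HYP_MAX_SCRATCH_PER_THREAD` and `scratch_size_supported()` are hypothetical names, not the actual `intel_device_info` field or compiler check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoded pitch range quoted from the BSpec: [63, 262143] -> [64 B, 256 KB].
 * The field stores (bytes - 1), so the largest encodable size is 262143 + 1. */
#define HYP_MAX_SCRATCH_PER_THREAD (262143u + 1u) /* 256 KB */

/* Hypothetical compile-time check: refuse shaders whose per-thread scratch
 * requirement cannot be described by the scratch surface pitch. */
static bool scratch_size_supported(uint32_t per_thread_bytes)
{
   return per_thread_bytes <= HYP_MAX_SCRATCH_PER_THREAD;
}
```

Failing compilation up front turns an out-of-range pitch into a clean error instead of a surface the hardware cannot describe.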
Francisco Jerez	b98eebbcb2	intel/brw: Implement null push constant workaround. This implements an undocumented workaround for a hardware bug that affects draw calls with a pixel shader that has 0 push constant cycles when TBIMR is enabled, which has been seen to lead to a hang with Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has been reported as HSDES#22020184996 which is still pending a resolution by the hardware team. However since this workaround found empirically has been confirmed to fix the issue reliably and it's relatively harmless it seems worth checking in already even though no final W/A number is available nor has the W/A json file been updated. To avoid the issue we simply pad the push constant payload to be at least 1 register. This is enabled via a brw_wm_prog_key since the driver needs to be in agreement with the compiler on whether the dummy push constant cycle is present, and it can be avoided in cases where the driver knows that TBIMR will be disabled (e.g. for BLORP). Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728 Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399 Fixes: `57decad976` ("intel/xehp: Enable TBIMR by default.") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>	2024-07-20 01:13:19 +00:00
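The padding rule itself is small. A minimal sketch, assuming a boolean key field that tells the compiler whether the workaround applies; `ps_push_constant_regs()` and `null_push_constant_wa` are hypothetical names, not the real `brw_wm_prog_key` member.

```c
#include <stdbool.h>

/* Hypothetical sketch of the workaround: when the key flag is set, a pixel
 * shader with zero push constant registers is padded to one, so it never
 * has 0 push constant cycles. */
static unsigned ps_push_constant_regs(unsigned needed_regs,
                                      bool null_push_constant_wa)
{
   if (null_push_constant_wa && needed_regs == 0)
      return 1;
   return needed_regs;
}
```

Carrying the flag in the program key, rather than deciding inside the compiler alone, is what lets the driver and compiler agree on whether the dummy register exists, and lets paths such as BLORP that run with TBIMR disabled skip it.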
Daniel Stone	e05415a82e	format: Generate endian-independent format aliases Instead of having a hardcoded list of endian-independent format aliases in the header, generate them from the format definitions. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>	2024-07-19 13:50:42 +00:00
Lionel Landwerlin	67b778445a	brw: fix uniform rebuild of sources If you have something like this: con 32 %66 = @load_reg (%62) (base=0, legacy_fabs=0, legacy_fneg=0) con 32 %27 = @resource_intel (%22 (0xdeaddead), %66, %67, %17 (0x0)) (desc_set=2, binding=96, resource_intel=0, resource_block_intel=-1) Just copying the brw_reg in ssa_values[] is not enough for the load_reg intrinsic. We need to call get_nir_src() to force some logic to create the register correctly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b8209d69ff` ("intel/fs: Add support for new-style registers") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30050>	2024-07-18 19:58:46 +00:00
Kenneth Graunke	d630ff1f79	intel/brw: Disallow scalar byte to float conversions on DG2+ I haven't been able to find this restriction mentioned anywhere in the hardware documentation, but the simulator has code to reject this case as invalid, and it doesn't appear to work on hardware anymore. Having lower_regioning() handle this takes care of the issue so we don't have to worry about generating it in random places. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11489 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30140>	2024-07-18 18:51:35 +00:00
Kenneth Graunke	534f0019d7	intel/brw: Don't mix types for unary extended math instructions We were generating odd instructions like: math inv(8) g93<1>HF g85<8,8,1>HF null<8,8,1>F { align1 1Q @7 $4 }; It's unclear whether the type of the null operand matters, but sometimes these things don't get ignored properly. Out of caution, retype the null source to match the actual operand's type. It'll at least look less surprising in assembly dumps. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30193>	2024-07-18 03:25:06 +00:00
Caio Oliveira	e3e712e74e	intel/elk: Convert missing uses of ralloc to linear in fs_live_variables And use the non-zeroing variant in cases we are filling the data immediately. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30201>	2024-07-16 23:53:45 +00:00
Caio Oliveira	3700e49fff	intel/brw: Convert missing uses of ralloc to linear in fs_live_variables And use the non-zeroing variant in cases we are filling the data immediately. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30201>	2024-07-16 23:53:45 +00:00
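The difference between a linear allocator and the per-allocation ralloc tracking, and why a non-zeroing variant helps, can be illustrated with a generic bump allocator. This is a hypothetical sketch (`struct hyp_linear`, `hyp_linear_alloc()`, `hyp_linear_zalloc()`), not Mesa's actual linear allocator API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bump ("linear") allocator: allocations are carved out of one
 * buffer and released all at once, with no per-allocation bookkeeping. */
struct hyp_linear {
   unsigned char *buf; /* assumed aligned to max_align_t */
   size_t size, used;
};

static void *hyp_linear_alloc(struct hyp_linear *lin, size_t n)
{
   size_t align = _Alignof(max_align_t);
   size_t start = (lin->used + align - 1) & ~(align - 1);
   if (start > lin->size || n > lin->size - start)
      return NULL;
   lin->used = start + n;
   return lin->buf + start; /* contents left uninitialized */
}

/* Zeroing variant: the memset is wasted work when the caller is about to
 * overwrite every element anyway, hence the non-zeroing variant above. */
static void *hyp_linear_zalloc(struct hyp_linear *lin, size_t n)
{
   void *p = hyp_linear_alloc(lin, n);
   if (p != NULL)
      memset(p, 0, n);
   return p;
}
```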
Caio Oliveira	f48b3bee31	intel/brw: Split off assembler logic into library Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30006>	2024-07-12 19:34:23 +00:00
Caio Oliveira	c2d1e10315	intel/brw: Don't print extra newlines in assembler Handle '\n' when inside the MSGDESC start condition, otherwise the lexer would apply its default rule (write to stdout). Without that, newlines were "leaking" to the output when parsing a multiple line "MsgDesc". E.g. given the file example.asm below ``` send(8) nullUD g126UD nullUD 0x02000000 0x00000000 thread_spawner MsgDesc: mlen 1 ex_mlen 0 rlen 0 { align1 WE_all 1Q @1 EOT }; ``` the assembler would produce one extra newline ``` $ brw_asm -t hex -g tgl example.asm 31 01 03 80 04 00 00 00 0c 7e 00 70 00 00 00 00 ``` Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30100>	2024-07-11 21:07:54 +00:00
Caio Oliveira	e63b0571bc	intel/brw: Account for reg_unit() in assembler Use reg_unit() to match the internal representation in brw_reg. Fixes the assembler tool when targeting Xe2. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Caio Oliveira	6cdd56e7ed	intel/brw: Use brw_inst_set_group() to set QtrCtrl and NibCtrl The function handles the Xe2 case where NibCtrl is gone. Also add error messages for invalid input when assembling for Xe2, e.g. "2N". Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Caio Oliveira	c3c65e8821	intel/brw: Don't set acc_wr_control for Xe2 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Kenneth Graunke	837c441acb	intel/nir: Don't needlessly split u2f16 for nir_type_uint32 Commit `f695a9fed2` moved the 64-bit float <-> 16-bit float conversion splitting into a core NIR pass, so the code remaining here is only needed for 64-bit integer types. Presumably in an attempt to remove the float handling, it replaced simple bit_size == 64 checks with this expression: (full_type & (nir_type_int64 | nir_type_uint64)) I believe that the intended expression was: (full_type == nir_type_int64 || full_type == nir_type_uint64) Unfortunately, the former is incorrect. Any integer or unsigned NIR type would trigger the former expression. For example: nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint This meant that we were splitting e.g. u2f16 on 32-bit unsigned types into u2f32 and f2f16, when we can easily natively handle that case. To fix this, we go back to simple bit_size == 64 checks. This pass is already run after nir_lower_fp16_casts which will split the float case, so we will never see it here. fossil-db on Alchemist shows a -1.14% reduction in affected shaders for google-meet-clvk shaders. In another ChromeOS workload, it improves performance by around 8% on Meteorlake. Thanks to Sushma Venkatesh Reddy for finding this performance issue! Fixes: `f695a9fed2` ("intel/compiler: use nir_lower_fp16_casts") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30091>	2024-07-11 02:37:05 -07:00
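The bitmask arithmetic in this message can be checked with a small C sketch. The enum assumes NIR's `nir_alu_type` encoding as I understand it (a base type of int = 2, uint = 4 or float = 128, ORed with the bit size); the `hyp_` names are hypothetical stand-ins for the real `nir_type_*` values.

```c
#include <stdbool.h>

/* Assumed to mirror NIR's type layout: base type | bit size. */
enum hyp_alu_type {
   hyp_type_int = 2,
   hyp_type_uint = 4,
   hyp_type_float = 128,
   hyp_type_int64 = hyp_type_int | 64,    /* 66 */
   hyp_type_uint32 = hyp_type_uint | 32,  /* 36 */
   hyp_type_uint64 = hyp_type_uint | 64,  /* 68 */
};

/* The buggy test: the mask (66 | 68 = 70) contains the int and uint base
 * bits, so every int or uint type, whatever its size, passes. */
static bool is_64bit_int_buggy(unsigned full_type)
{
   return (full_type & (hyp_type_int64 | hyp_type_uint64)) != 0;
}

/* The intended equality test (the fix itself uses bit_size == 64). */
static bool is_64bit_int_fixed(unsigned full_type)
{
   return full_type == hyp_type_int64 || full_type == hyp_type_uint64;
}
```

For `hyp_type_uint32` (36 = 32 | 4), the AND with 70 leaves only bit 2, i.e. the bare uint base type, which matches the `=> nir_type_uint` result quoted in the message.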
Romaric Jodin	65c0ef859f	intel/brw: allocate large table in the heap instead of the stack When having a large number of virtual register this table can be too large to be allocated on the stack. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30008>	2024-07-03 12:10:28 +00:00
Caio Oliveira	260a5fc7b3	intel/brw: Move brw_reg helpers into brw_reg.h Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	71ccf8e4cd	intel/brw: Rename fs_reg_* helpers to brw_reg_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	fe46efa647	intel/brw: Make fs_reg an alias of brw_reg And rename the brw_reg_from_fs_reg() function to something more appropriate. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	69f4ed3102	intel/brw: Rename brw_reg() helper to brw_make_reg() To avoid conflict with the name of the type later on. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	6b2405e1f5	intel/brw: Remove duplicated functions between fs_reg/brw_reg Update the brw_reg ones and use them. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	d00329e821	intel/brw: Replace some fs_reg constructors with functions Create three helper functions for ATTR, UNIFORM and VGRF creation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	06fbab3a74	intel/brw: Remove conversion from fs_reg to brw_reg They are effectively the same now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	e4f37c6ab9	intel/brw: Move most member functions from fs_reg to brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	ca1afe2726	intel/brw: Use public inheritance for fs_reg/brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	f54dfbf4fe	intel/brw: Move fs_reg data members up to brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	2ce6dcf043	intel/brw: Remove unused variable from test This would cause warning (and error in GitLab CI) after later changes to fs_reg/brw_reg. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	0d9f58db04	intel/brw: Remove RALLOC helper from fs_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	def70c1673	intel/brw: Remove unused brw_reg related functions Most of these were used by the vec4 backend that was removed from brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Qiang Yu	3151f5ec47	nir: add filter parameter to nir_lower_array_deref_of_vec To be used by later commits to limit the lowering to specific variables. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>	2024-07-03 02:06:56 +00:00
Sagar Ghuge	99ce8b5a07	intel/compiler: Add indirect mov lowering pass Indirect addressing (vx1 and vxh) is not supported with the UB/B datatype for src0, so we need to change the data type for both dest and src0. This fixes the following test cases on Xe2+: - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16* - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32* Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
Kenneth Graunke	1e69ec3b8d	intel/brw: Add a lower_csel pass and allow building it for all types We can do CSEL on F, HF, W, and D on Gfx11+. Gfx9 can only do F. We can lower unsupported types to CMP+SEL, allowing us to use CSEL in the IR and not worry about the limitations. Rework: (Sagar) - Update validation pass for CSEL Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
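The type restrictions listed in this message can be captured as a simple predicate that a lowering pass would consult. A minimal sketch, with hypothetical type tags rather than the real `brw_reg_type` values, and assuming `gfx_ver >= 9` since that is the oldest generation the message mentions.

```c
#include <stdbool.h>

/* Hypothetical type tags for this sketch; not brw_reg_type. */
enum hyp_type { HYP_F, HYP_HF, HYP_W, HYP_D, HYP_UD, HYP_DF };

/* Native CSEL support as listed in the message. Assumes gfx_ver >= 9. */
static bool csel_natively_supported(int gfx_ver, enum hyp_type t)
{
   if (gfx_ver >= 11)
      return t == HYP_F || t == HYP_HF || t == HYP_W || t == HYP_D;
   return t == HYP_F; /* Gfx9 */
}
```

A lowering pass would leave CSEL alone wherever this returns true and rewrite only the remaining cases, which is what lets the rest of the IR use CSEL without checking these limits.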
David Heidelberg	68215332a8	build: pass licensing information in SPDX form Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Dylan Baker <dylan.c.baker@intel.com> Acked-by: Eric Engestrom <eric@igalia.com> Acked-by: Daniel Stone <daniels@collabora.com> Signed-off-by: David Heidelberg <david@ixit.cz> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29972>	2024-06-29 12:42:49 -07:00
Caio Oliveira	d89bfb1ff7	intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task Reorganize the code to make all the lowering cases clearer: (a) Single invocation workgroup. Index and IDs are all zero. (b) Local ID provided by hardware. (c) Local Index provided by the hardware. Depending on the case this might not be the final local index, e.g. heuristics for tile. (d) Neither provided by the hardware. Case (c) is new and supported by Mesh/Task shaders. At the moment nir_lower_compute_system_values handles lowering of LocalID for Task/Mesh, but a later patch will flip that on ANV. This will make Task/Mesh use the same lowering as Compute shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29828>	2024-06-28 16:30:38 +00:00
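Cases (b) and (c) are two directions of the same mapping between a 3D local ID and a linear local index. A sketch assuming a plain row-major (x fastest) layout; as the message notes, the real index may follow other heuristics such as tiling, and `struct hyp_id3`, `local_id_from_index()` and `local_index_from_id()` are hypothetical names.

```c
#include <stdint.h>

struct hyp_id3 { uint32_t x, y, z; };

/* Case (c): hardware provides the local index; derive the local ID. */
static struct hyp_id3 local_id_from_index(uint32_t index, const uint32_t size[3])
{
   struct hyp_id3 id;
   id.x = index % size[0];
   id.y = (index / size[0]) % size[1];
   id.z = index / (size[0] * size[1]);
   return id;
}

/* Case (b): hardware provides the local ID; derive the linear index. */
static uint32_t local_index_from_id(struct hyp_id3 id, const uint32_t size[3])
{
   return id.x + id.y * size[0] + id.z * size[0] * size[1];
}
```

For a 4x2x3 workgroup, index 13 maps to ID (1, 1, 1) and back. Case (a) falls out trivially: with a 1x1x1 workgroup the only index is 0 and every ID component is 0.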
Sagar Ghuge	edcad250ed	intel/compiler: Don't use half float param for sample_b It looks like some of the tests use a bias that does not fit into a half float parameter, so it's better to use a float param for sample_b. If we have cube arrays, we anyway combine BIAS and array index properly so we don't have to worry about the first parameter. This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>	2024-06-28 03:33:18 +00:00
Dylan Baker	35298e84f1	intel/compiler: move predicated_break out of backend loop This has no impact on the generated shaders, but does have a small (positive) impact on the amount of time spent in shader compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29126>	2024-06-27 15:20:19 -07:00
Jordan Justen	7b3149c99b	intel/brw: Retype some regs to BRW_TYPE_UD for Xe2 indirect accesses Following https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957, some Xe2 code paths started triggering asserts. In the cases fixed by this patch, it was because of the assert added to brw_type_larger_of() in `cf8ed9925f` ("intel/brw: Make a helper for finding the largest of two types"), and then brw_type_larger_of() is used in `674e89953f`. (For example, the assert was triggering when the SHL types differed between D and UD.) Fixes: `674e89953f` ("intel/brw: Use new builder helpers that allocate a VGRF destination") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29925>	2024-06-27 21:51:07 +00:00
Ian Romanick	531461d576	intel/brw: Test corner case CSE of ADD3 instructions When the destination of both instructions is NULL and the conditional modifier matches, operands_match (by way of instructions_match) will only test the first two operands. This can result in bad CSE happening. This is a very, very narrow edge case. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>	2024-06-27 18:34:53 +00:00
Kenneth Graunke	7adccbd48d	intel/brw: Support CSE of ADD3 This one is a bit more complex in that we need to handle 3-source commutative opcodes. But it's also quite useful: fossil-db results on Alchemist (A770): Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01% Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12% Subgroup size: 7589608 -> 7589592 (-0.00%) Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00% Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01% Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41% Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47% Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23% Our older Shadow of the Tomb Raider fossil is particularly helped with over a 90% reduction in scratch access (spills, fills, and scratch size). However, benchmarking in the actual game shows no change in performance. We're thinking the game's shaders have been updated since our capture. Ian noted that there was a bug here where we'd accidentally CSE two ADD3 instructions with null destinations and different src[2] that couldn't be dead code eliminated due to conditional mods. However, this is only a bug in the new cse_defs pass so we don't need to nominate this for stable branches. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>	2024-06-27 18:34:53 +00:00
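Matching a fully commutative 3-source opcode means comparing sources under every permutation, and all three sources must take part (comparing only the first two is the kind of gap the ADD3 corner-case test above targets). A minimal sketch, assuming the two instructions already agree on opcode, types, destination and modifiers, and that sources compare with `==`; `hyp_src` and `add3_sources_match()` are hypothetical, not the cse_defs code.

```c
#include <stdbool.h>

/* Hypothetical operand representation: an integer id per source. */
typedef int hyp_src;

/* ADD3 is commutative in all three sources, so two instructions compute the
 * same value if their sources match under any of the 3! permutations. */
static bool add3_sources_match(const hyp_src a[3], const hyp_src b[3])
{
   static const int perms[6][3] = {
      {0, 1, 2}, {0, 2, 1}, {1, 0, 2}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0},
   };
   for (int p = 0; p < 6; p++) {
      if (a[0] == b[perms[p][0]] &&
          a[1] == b[perms[p][1]] &&
          a[2] == b[perms[p][2]])
         return true;
   }
   return false;
}
```

An implementation could equally sort the sources into a canonical order before hashing; the brute-force permutation check is used here only because it is easy to verify.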
Francisco Jerez	79fa3eba11	intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel sample. This implements a replacement for the previous implementation of nir_intrinsic_load_barycentric_at_sample that relied on the Pixel Interpolator shared function, since it's going to be removed from the hardware from Xe2 onwards. This implementation simply looks up the X/Y offsets of each sample index on the table provided in the PS thread payload by using indirect addressing, then does the actual interpolation by recursing into emit_pixel_interpolater_alu_at_offset() introduced in the previous commit. Note that even though this is only immediately useful on Xe2+ platforms there's no reason why it shouldn't work on earlier platforms, as long as we have the sample X/Y offsets available in the thread payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
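The per-channel lookup step amounts to an indexed gather: each SIMD channel may ask for a different sample, so each one reads its own entry from the sample offset table. The table layout below (one float X/Y pair per sample, relative to the pixel center) is an assumption, since the message does not describe the payload encoding, and all names are hypothetical.

```c
/* Hypothetical layout: one (x, y) offset pair per sample, in pixels. */
struct hyp_sample_offset { float x, y; };

/* Per-channel gather: channel c reads the entry for its own sample index,
 * which is what the indirect addressing does on the hardware side. */
static void gather_sample_offsets(const struct hyp_sample_offset *table,
                                  const unsigned *sample_index,
                                  unsigned num_channels,
                                  float *off_x, float *off_y)
{
   for (unsigned c = 0; c < num_channels; c++) {
      off_x[c] = table[sample_index[c]].x;
      off_y[c] = table[sample_index[c]].y;
   }
}
```

The gathered offsets would then go through the same at-offset interpolation used for nir_intrinsic_load_barycentric_at_offset.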
Francisco Jerez	95eec5a0dd	intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel offset. This implements a replacement for the previous implementation of nir_intrinsic_load_barycentric_at_offset that relied on the Pixel Interpolator shared function, since it's going to be removed from the hardware from Xe2 onwards. That's okay since we can get all the primitive setup information needed for interpolation at an arbitrary coordinate: We use the X/Y offset relative to the "X/Y Start" coordinates from the thread payload in order to evaluate the plane equations also provided in the thread payload for each barycentric coordinate of each polygon. The evaluation of the barycentric plane equations (and the RHW plane equation for perspective-correct interpolation) uses the accumulator and MAD/MAC for ALU efficiency, but that means we need to manually split instructions to fit the width of the accumulator. The division and scaling for perspective-correct interpolation is also now done in the shader if necessary. Note that even though this is only immediately useful on Xe2+, the thread payload numbers are filled out for older platforms, and the EU restrictions of previous Xe platforms are taken into account, mostly for the purposes of testing and performance evaluation. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
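Evaluating a barycentric plane equation at an offset, plus the perspective divide, might look like the sketch below. It assumes each plane is stored as coefficients (a, b, c) of `a*dx + b*dy + c`, with (dx, dy) measured from the "X/Y Start" point (pixel position plus requested offset, minus the start coordinates), and that for perspective-correct modes the barycentric plane yields bary/w while the RHW plane yields 1/w. Names are hypothetical, and the real code works on SIMD registers through MAD/MAC rather than on scalar floats.

```c
#include <stddef.h>

/* Hypothetical plane equation: value(dx, dy) = a * dx + b * dy + c. */
struct hyp_plane { float a, b, c; };

static float eval_plane(struct hyp_plane p, float dx, float dy)
{
   return p.a * dx + p.b * dy + p.c;
}

/* Non-perspective: pass rhw = NULL and the plane value is the barycentric.
 * Perspective-correct: divide the bary/w plane by the interpolated 1/w,
 * both of which are linear in screen space. */
static float bary_at(struct hyp_plane bary, const struct hyp_plane *rhw,
                     float dx, float dy)
{
   float v = eval_plane(bary, dx, dy);
   if (rhw != NULL)
      v /= eval_plane(*rhw, dx, dy);
   return v;
}
```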
Francisco Jerez	e8007c9325	intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+. Floating-point offsets work fine in combination with the floating-point arithmetic we're about to lower these intrinsics into, and they require fewer instructions than converting to fixed-point and then back. No reason to take the precision/range hit nor the extra instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Francisco Jerez	3d30cc82f9	intel/fs/xe2+: Ask driver for PS payload registers based on barycentric load intrinsics in use. The ALU-based implementation of the barycentric interpolation intrinsics introduced by a subsequent commit will require some primitive setup information not delivered in the PS thread payload unless explicitly requested: - "Source Depth and/or W Attribute Vertex Deltas" if a perspective-correct interpolation mode is used -- Note that this is already requested for CPS interpolation, we just need to enable it in more cases. - "Perspective Bary Planes" if a perspective-correct interpolation mode is used. - "Non-Perspective Bary Planes" if a non-perspective-corrected interpolation mode is used. - "Sample offsets" if any at_sample interpolation is used so the coordinate offsets of the sample can be calculated. This ALU implementation of barycentric interpolation will only be needed for _at_offset and _at_sample interpolation, since the fixed function hardware still computes barycentrics for us at the current sample coordinates, only the cases that previously relied on the Pixel Interpolator shared function need to be re-implemented with ALU instructions, since that shared function will no longer exist on Xe2 hardware. Thanks to Rohan for a bugfix of the uses_sample_offsets calculation, this patch includes his fix squashed in. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00


3613 commits