fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 11:20:11 +01:00

Author	SHA1	Message	Date
Daniel Stone	e05415a82e	format: Generate endian-independent format aliases Instead of having a hardcoded list of endian-independent format aliases in the header, generate them from the format definitions. Signed-off-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>	2024-07-19 13:50:42 +00:00
Lionel Landwerlin	67b778445a	brw: fix uniform rebuild of sources If you have something like this: con 32 %66 = @load_reg (%62) (base=0, legacy_fabs=0, legacy_fneg=0) con 32 %27 = @resource_intel (%22 (0xdeaddead), %66, %67, %17 (0x0)) (desc_set=2, binding=96, resource_intel=0, resource_block_intel=-1) Just copying the brw_reg in ssa_values[] is not enough for the load_reg intrinsic. We need to call get_nir_src() to force some logic to create the register correctly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b8209d69ff` ("intel/fs: Add support for new-style registers") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30050>	2024-07-18 19:58:46 +00:00
Kenneth Graunke	d630ff1f79	intel/brw: Disallow scalar byte to float conversions on DG2+ I haven't been able to find this restriction mentioned anywhere in the hardware documentation, but the simulator has code to reject this case as invalid, and it doesn't appear to work on hardware anymore. Having lower_regioning() handle this takes care of the issue so we don't have to worry about generating it in random places. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11489 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30140>	2024-07-18 18:51:35 +00:00
Kenneth Graunke	534f0019d7	intel/brw: Don't mix types for unary extended math instructions We were generating odd instructions like: math inv(8) g93<1>HF g85<8,8,1>HF null<8,8,1>F { align1 1Q @7 $4 }; It's unclear whether the type of the null operand matters, but sometimes these things don't get ignored properly. Out of caution, retype the null source to match the actual operand's type. It'll at least look less surprising in assembly dumps. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30193>	2024-07-18 03:25:06 +00:00
Caio Oliveira	e3e712e74e	intel/elk: Convert missing uses of ralloc to linear in fs_live_variables And use the non-zeroing variant in cases we are filling the data immediately. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30201>	2024-07-16 23:53:45 +00:00
Caio Oliveira	3700e49fff	intel/brw: Convert missing uses of ralloc to linear in fs_live_variables And use the non-zeroing variant in cases we are filling the data immediately. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30201>	2024-07-16 23:53:45 +00:00
Caio Oliveira	f48b3bee31	intel/brw: Split off assembler logic into library Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30006>	2024-07-12 19:34:23 +00:00
Caio Oliveira	c2d1e10315	intel/brw: Don't print extra newlines in assembler Handle '\n' when inside the MSGDESC start condition, otherwise the lexer would apply its default rule (write to stdout). Without that, newlines were "leaking" to the output when parsing a multiple line "MsgDesc". E.g. given the file example.asm below ``` send(8) nullUD g126UD nullUD 0x02000000 0x00000000 thread_spawner MsgDesc: mlen 1 ex_mlen 0 rlen 0 { align1 WE_all 1Q @1 EOT }; ``` the assembler would produce one extra newline ``` $ brw_asm -t hex -g tgl example.asm 31 01 03 80 04 00 00 00 0c 7e 00 70 00 00 00 00 ``` Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30100>	2024-07-11 21:07:54 +00:00
Caio Oliveira	e63b0571bc	intel/brw: Account for reg_unit() in assembler Use reg_unit() to match the internal representation in brw_reg. Fixes the assembler tool when targeting Xe2. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Caio Oliveira	6cdd56e7ed	intel/brw: Use brw_inst_set_group() to set QtrCtrl and NibCtrl The function handles the Xe2 case where NibCtrl is gone. Also add error messages for invalid input when assembling for Xe2, e.g. "2N". Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Caio Oliveira	c3c65e8821	intel/brw: Don't set acc_wr_control for Xe2 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30060>	2024-07-11 16:38:54 +00:00
Kenneth Graunke	837c441acb	intel/nir: Don't needlessly split u2f16 for nir_type_uint32 Commit `f695a9fed2` moved the 64-bit float <-> 16-bit float conversion splitting into a core NIR pass, so the code remaining here is only needed for 64-bit integer types. Presumably in an attempt to remove the float handling, it replaced simple bit_size == 64 checks with this expression: (full_type & (nir_type_int64 | nir_type_uint64)) I believe that the intended expression was: (full_type == nir_type_int64 || full_type == nir_type_uint64) Unfortunately, the former is incorrect. Any integer or unsigned NIR type would trigger the former expression. For example: nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint This meant that we were splitting e.g. u2f16 on 32-bit unsigned types into u2f32 and f2f16, when we can easily natively handle that case. To fix this, we go back to simple bit_size == 64 checks. This pass is already run after nir_lower_fp16_casts which will split the float case, so we will never see it here. fossil-db on Alchemist shows a -1.14% reduction in affected shaders for google-meet-clvk shaders. In another ChromeOS workload, it improves performance by around 8% on Meteorlake. Thanks to Sushma Venkatesh Reddy for finding this performance issue! Fixes: `f695a9fed2` ("intel/compiler: use nir_lower_fp16_casts") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30091>	2024-07-11 02:37:05 -07:00
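The bitmask pitfall described in commit 837c441acb can be reproduced in isolation. The sketch below is not Mesa code: the `SKETCH_*` enum and the three predicate functions are hypothetical stand-ins, and the numeric values are an assumption chosen to mirror a "base type bits OR'd with bit size" encoding, so that the example in the commit message (`nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint`) comes out the same way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an ALU type encoding: a base-type bit OR'd
 * with the bit size. The concrete values are an assumption made for this
 * sketch, not copied from Mesa headers. */
enum sketch_alu_type {
   SKETCH_TYPE_INT    = 2,
   SKETCH_TYPE_UINT   = 4,
   SKETCH_TYPE_UINT32 = 32 | SKETCH_TYPE_UINT,
   SKETCH_TYPE_INT64  = 64 | SKETCH_TYPE_INT,
   SKETCH_TYPE_UINT64 = 64 | SKETCH_TYPE_UINT,
};

/* Mask selecting the size bits (1, 8, 16, 32, 64) of a sized type. */
#define SKETCH_SIZE_MASK (1 | 8 | 16 | 32 | 64)

static unsigned
sketch_bit_size(enum sketch_alu_type t)
{
   return (unsigned)t & SKETCH_SIZE_MASK;
}

/* The problematic form: the AND keeps any shared base-type bit, so every
 * int or uint type passes, whatever its size. */
static bool
buggy_is_64bit_int(enum sketch_alu_type t)
{
   return ((unsigned)t & (SKETCH_TYPE_INT64 | SKETCH_TYPE_UINT64)) != 0;
}

/* The form the commit message says was probably intended. */
static bool
intended_is_64bit_int(enum sketch_alu_type t)
{
   return t == SKETCH_TYPE_INT64 || t == SKETCH_TYPE_UINT64;
}

/* The form the commit says it went back to: a plain size check. */
static bool
size_check_is_64bit(enum sketch_alu_type t)
{
   return sketch_bit_size(t) == 64;
}
```

With these definitions, `buggy_is_64bit_int(SKETCH_TYPE_UINT32)` is true while the other two predicates are false for the same input, which is the needless split the commit removes.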
Romaric Jodin	65c0ef859f	intel/brw: allocate large table in the heap instead of the stack With a large number of virtual registers, this table can be too large to be allocated on the stack. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30008>	2024-07-03 12:10:28 +00:00
Caio Oliveira	260a5fc7b3	intel/brw: Move brw_reg helpers into brw_reg.h Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	71ccf8e4cd	intel/brw: Rename fs_reg_* helpers to brw_reg_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	fe46efa647	intel/brw: Make fs_reg an alias of brw_reg And rename the brw_reg_from_fs_reg() function to something more appropriate. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	69f4ed3102	intel/brw: Rename brw_reg() helper to brw_make_reg() To avoid conflict with the name of the type later on. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	6b2405e1f5	intel/brw: Remove duplicated functions between fs_reg/brw_reg Update the brw_reg ones and use them. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	d00329e821	intel/brw: Replace some fs_reg constructors with functions Create three helper functions for ATTR, UNIFORM and VGRF creation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	06fbab3a74	intel/brw: Remove conversion from fs_reg to brw_reg They are effectively the same now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	e4f37c6ab9	intel/brw: Move most member functions from fs_reg to brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	ca1afe2726	intel/brw: Use public inheritance for fs_reg/brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	f54dfbf4fe	intel/brw: Move fs_reg data members up to brw_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	2ce6dcf043	intel/brw: Remove unused variable from test This would cause a warning (and an error in GitLab CI) after later changes to fs_reg/brw_reg. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	0d9f58db04	intel/brw: Remove RALLOC helper from fs_reg Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Caio Oliveira	def70c1673	intel/brw: Remove unused brw_reg related functions Most of these were used by the vec4 backend that was removed from brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Qiang Yu	3151f5ec47	nir: add filter parameter to nir_lower_array_deref_of_vec To be used by later commits to limit the lowering to specific variables. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>	2024-07-03 02:06:56 +00:00
Sagar Ghuge	99ce8b5a07	intel/compiler: Add indirect mov lowering pass Indirect addressing (vx1 and vxh) is not supported with the UB/B datatype for src0, so we need to change the data type for both dest and src0. This fixes the following test cases on Xe2+: - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16* - dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32* Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
Kenneth Graunke	1e69ec3b8d	intel/brw: Add a lower_csel pass and allow building it for all types We can do CSEL on F, HF, W, and D on Gfx11+. Gfx9 can only do F. We can lower unsupported types to CMP+CSEL, allowing us to use CSEL in the IR and not worry about the limitations. Rework: (Sagar) - Update validation pass for CSEL Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>	2024-07-01 19:06:31 +00:00
David Heidelberg	68215332a8	build: pass licensing information in SPDX form Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Dylan Baker <dylan.c.baker@intel.com> Acked-by: Eric Engestrom <eric@igalia.com> Acked-by: Daniel Stone <daniels@collabora.com> Signed-off-by: David Heidelberg <david@ixit.cz> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29972>	2024-06-29 12:42:49 -07:00
Caio Oliveira	d89bfb1ff7	intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task Reorganize the code to make clearer all the lowering cases: (a) Single invocation workgroup. Index and IDs are all zero. (b) Local ID provided by hardware. (c) Local Index provided by the hardware. Depending on the case this might not be the final local index, e.g. heuristics for tile. (d) Neither provided by the hardware. Case (c) is new and supported by Mesh/Task shaders. At the moment the nir_lower_compute_system_values handle lowering of LocalID for Task/Mesh, but a later patch will flip that on ANV. This will make the Task/Mesh use the same lowering as Compute shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29828>	2024-06-28 16:30:38 +00:00
Sagar Ghuge	edcad250ed	intel/compiler: Don't use half float param for sample_b It looks like some of the tests use a bias which does not fit into a half float parameter, so it's better to use a float param for sample_b. If we have cube arrays, we combine BIAS and array index properly anyway, so we don't have to worry about the first parameter. This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>	2024-06-28 03:33:18 +00:00
Dylan Baker	35298e84f1	intel/compiler: move predicated_break out of backend loop This has no impact on the generated shaders, but does have a small (positive) impact on the amount of time spent in shader compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29126>	2024-06-27 15:20:19 -07:00
Jordan Justen	7b3149c99b	intel/brw: Retype some regs to BRW_TYPE_UD for Xe2 indirect accesses Following https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957, some Xe2 code paths started triggering asserts. In the cases fixed by this patch, it was because of the assert added to brw_type_larger_of() in `cf8ed9925f` ("intel/brw: Make a helper for finding the largest of two types"), and then brw_type_larger_of() is used in `674e89953f`. (For example, the assert was triggering when the SHL types differed between D and UD.) Fixes: `674e89953f` ("intel/brw: Use new builder helpers that allocate a VGRF destination") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29925>	2024-06-27 21:51:07 +00:00
Ian Romanick	531461d576	intel/brw: Test corner case CSE of ADD3 instructions When the destination of both instructions is NULL and the conditional modifier matches, operands_match (by way of instructions_match) will only test the first two operands. This can result in bad CSE happening. This is a very, very narrow edge case. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>	2024-06-27 18:34:53 +00:00
Kenneth Graunke	7adccbd48d	intel/brw: Support CSE of ADD3 This one is a bit more complex in that we need to handle 3-source commutative opcodes. But it's also quite useful: fossil-db results on Alchemist (A770): Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01% Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12% Subgroup size: 7589608 -> 7589592 (-0.00%) Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00% Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01% Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41% Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47% Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23% Our older Shadow of the Tomb Raider fossil is particularly helped with over a 90% reduction in scratch access (spills, fills, and scratch size). However, benchmarking in the actual game shows no change in performance. We're thinking the game's shaders have been updated since our capture. Ian noted that there was a bug here where we'd accidentally CSE two ADD3 instructions with null destinations and different src[2] that couldn't be dead code eliminated due to conditional mods. However, this is only a bug in the new cse_defs pass so we don't need to nominate this for stable branches. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>	2024-06-27 18:34:53 +00:00
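The operand-matching concern raised in commits 531461d576 and 7adccbd48d (a 3-source commutative opcode must compare all three sources, in any order) can be sketched independently of the compiler. This is a toy illustration under stated assumptions, not the cse_defs implementation: sources are modeled as plain integers, and `sketch_sort3` and `sketch_add3_operands_match` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sort three values in place with a fixed compare-and-swap sequence. */
static void
sketch_sort3(int v[3])
{
   int tmp;
   if (v[0] > v[1]) { tmp = v[0]; v[0] = v[1]; v[1] = tmp; }
   if (v[1] > v[2]) { tmp = v[1]; v[1] = v[2]; v[2] = tmp; }
   if (v[0] > v[1]) { tmp = v[0]; v[0] = v[1]; v[1] = tmp; }
}

/* Two ADD3-like operations compute the same value when their source
 * multisets are equal. Canonicalizing by sorting and then comparing all
 * three entries avoids the "only the first two operands were tested"
 * mistake described in the commit message. */
static bool
sketch_add3_operands_match(const int a[3], const int b[3])
{
   int sa[3], sb[3];
   assert(a != NULL && b != NULL);
   memcpy(sa, a, sizeof(sa));
   memcpy(sb, b, sizeof(sb));
   sketch_sort3(sa);
   sketch_sort3(sb);
   return memcmp(sa, sb, sizeof(sa)) == 0;
}
```

For example, sources `{1, 2, 3}` and `{3, 1, 2}` match, while `{1, 2, 3}` and `{1, 2, 4}` do not, even though their first two entries agree.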
Francisco Jerez	79fa3eba11	intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel sample. This implements a replacement for the previous implementation of nir_intrinsic_load_barycentric_at_sample that relied on the Pixel Interpolator shared function, since it's going to be removed from the hardware from Xe2 onwards. This implementation simply looks up the X/Y offsets of each sample index on the table provided in the PS thread payload by using indirect addressing, then does the actual interpolation by recursing into emit_pixel_interpolater_alu_at_offset() introduced in the previous commit. Note that even though this is only immediately useful on Xe2+ platforms there's no reason why it shouldn't work on earlier platforms, as long as we have the sample X/Y offsets available in the thread payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Francisco Jerez	95eec5a0dd	intel/fs/xe2+: Add ALU-based implementation of barycentric interpolation at a per-channel offset. This implements a replacement for the previous implementation of nir_intrinsic_load_barycentric_at_offset that relied on the Pixel Interpolator shared function, since it's going to be removed from the hardware from Xe2 onwards. That's okay since we can get all the primitive setup information needed for interpolation at an arbitrary coordinate: We use the X/Y offset relative to the "X/Y Start" coordinates from the thread payload order to evaluate the plane equations also provided in the thread payload for each barycentric coordinate of each polygon. The evaluation of the barycentric plane equations (and the RHW plane equation for perspective-correct interpolation) uses the accumulator and MAD/MAC for ALU efficiency, but that means we need to manually split instructions to fit the width of the accumulator. The division and scaling for perspective-correct interpolation is also now done in the shader if necessary. Note that even though this is only immediately useful on Xe2+, the thread payload numbers are filled out for older platforms, and the EU restrictions of previous Xe platforms are taken into account, mostly for the purposes of testing and performance evaluation. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Francisco Jerez	e8007c9325	intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+. Floating-point offsets work fine in combination with the floating-point arithmetic we're about to lower these intrinsics into, and they require less instructions than converting to fixed-point and then back. No reason to take the precision/range hit nor the extra instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Francisco Jerez	3d30cc82f9	intel/fs/xe2+: Ask driver for PS payload registers based on barycentric load intrinsics in use. The ALU-based implementation of the barycentric interpolation intrinsics introduced by a subsequent commit will require some primitive setup information not delivered in the PS thread payload unless explicitly requested: - "Source Depth and/or W Attribute Vertex Deltas" if a perspective-correct interpolation mode is used -- Note that this is already requested for CPS interpolation, we just need to enable it in more cases. - "Perspective Bary Planes" if a perspective-correct interpolation mode is used. - "Non-Perspective Bary Planes" if a non-perspective-corrected interpolation mode is used. - "Sample offsets" if any at_sample interpolation is used so the coordinate offsets of the sample can be calculated. This ALU implementation of barycentric interpolation will only be needed for _at_offset and _at_sample interpolation, since the fixed function hardware still computes barycentrics for us at the current sample coordinates, only the cases that previously relied on the Pixel Interpolator shared function need to be re-implemented with ALU instructions, since that shared function will no longer exist on Xe2 hardware. Thanks to Rohan for a bugfix of the uses_sample_offsets calculation, this patch includes his fix squashed in. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>	2024-06-27 00:18:00 +00:00
Ian Romanick	556e78f737	intel/brw/xe2+: Allow vec16 for cooperative matrix Xe2 will allow a B matrix large enough that it will be stored in a vec16. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	b6236dd8f3	intel/brw/xe2+: Adjust DPAS lowering to DP4A to accommodate larger GRF and SIMD16 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	77ef241577	intel/brw/xe2+: Scale size_written by reg_unit for DPAS Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	e368b8e01b	intel/brw/xe2+: Adjust size_read() for DPAS v2: Remove "DG2" from a comment because it applies to DG2 and Xe2. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	b051602754	intel/brw/xe2+: Catch invalid uses of writes_accumulator earlier It turns out the problem I was trying to catch in `be4fa59a72` ("intel/brw: Clear write_accumulator flag when changing the destination") also came from the DPAS lowering pass itself. Checking for invalid uses of the feature in fs_validate helped detect the problem. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	7a773ac53e	intel/brw: Major rework of lower_cmat_load_store The original goal was to get rid of a bunch of the magic constants sprinkled through the function. Once I did that, I realized that there was a lot more symmetry possible between the row-major and column-major paths. It's +6 lines of code, but about 15 of those lines are comments explaining things that were not obvious in the original code. v2: Save duplicated condition in a variable with a meaningful name. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:16:48 -07:00
Ian Romanick	ea6e10c0b2	intel/brw: Temporarily disable result=float16 matrix configs Even though the hardware does not natively support these configurations, there are many potential benefits to advertising them. These configurations can theoretically use half the memory bandwidth for loads and stores. For large matrices, that can be the limiting factor in performance. The current implementation, however, has a number of significant problems. The conversion from float16 to float32 is performed in the driver during conversion from NIR. As a result, many common usage patterns end up doing back-to-back conversions to and from float16 between matrix multiplications (when the result of one multiplication is used as the accumulator for the next). The float16 version of the matrix wastes half the possible register space. Each float16 value sits alone in a dword. This is done so that the per-invocation slice of an 8x8 float16 result matrix and an 8x8 float32 result matrix will have the same number of elements. This makes it possible to do straightforward implementations of all the unary_op type conversions in NIR. It would be possible to perform N:M element type conversions in the backend using specialized NIR intrinsics. However, per #10961, this would be very, very painful. My hope is that, once a suitable resolution for that issue can be found, support for these configs can be restored. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 13:52:12 -07:00
Kenneth Graunke	5cb15a6c67	intel/brw: Make bld.ADD(x, 0) emit no instructions and return x directly There are a lot of places where we add 0 to an offset. Avoiding generating this can save us algebraic + copy_propagation later. Cuts compile time in Borderlands 3 by -0.590631% +/- 0.170108% (n=25). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>	2024-06-24 19:12:21 -07:00
Kenneth Graunke	068865ce81	intel/brw: Make an alu2 builder helper Instead of replicating the whole thing in macros, just make an alu2() function and use that in the wrappers. It ought to get inlined anyway. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>	2024-06-24 19:12:19 -07:00


3754 commits