fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 00:38:48 +02:00

Author	SHA1	Message	Date
Ganesh Belgur Ramachandra	d5ef8a0ac0	radeonsi: enable nir pass for 64 bit operations Enables optimisations for divide-by-constant which are required in some shaders. e.g. si_create_query_result_cs() Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25972>	2024-01-02 10:53:30 +00:00
Konstantin Seurer	b88ac6b381	nir: Optimize fpow with small constant exponents They would be turned into exp(log(a)*b) instead, which is slow. Totals from 2146 (2.52% of 85071) affected shaders: MaxWaves: 35769 -> 35779 (+0.03%); split: +0.03%, -0.01% Instrs: 6476835 -> 6465494 (-0.18%); split: -0.18%, +0.00% CodeSize: 35382288 -> 35347092 (-0.10%); split: -0.10%, +0.00% SpillSGPRs: 1055 -> 1017 (-3.60%) Latency: 75211743 -> 75063623 (-0.20%); split: -0.20%, +0.00% InvThroughput: 17525115 -> 17501745 (-0.13%); split: -0.14%, +0.00% VClause: 200089 -> 200077 (-0.01%); split: -0.01%, +0.01% SClause: 293566 -> 293480 (-0.03%); split: -0.03%, +0.00% Copies: 649631 -> 640516 (-1.40%); split: -1.44%, +0.03% Branches: 268441 -> 268325 (-0.04%) PreSGPRs: 146868 -> 146045 (-0.56%) PreVGPRs: 134125 -> 134128 (+0.00%); split: -0.00%, +0.01% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26727>	2024-01-02 11:16:14 +01:00
Juan A. Suarez Romero	8b3496df30	ci/v3dv: update results Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26846>	2024-01-02 10:23:24 +01:00
Luca Weiss	2e46dd0624	freedreno: Enable A305B Enable the Adreno 305B that is found on the MSM8226(v2) SoC (Snapdragon 400). Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26434>	2024-01-01 20:30:46 +00:00
Mark Collins	80a319c0b4	freedreno/rddecompiler: Add ability to read GPU buffer into file While running tests, it is useful to have non-sequenced dumps of certain buffers to see their contents from changes in the decompiled CS. This introduces a function gpu_read_into_file(...) for specifying a file to read a specific GPU buffer into after replaying the CS. Signed-off-by: Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>	2024-01-01 18:47:48 +00:00
Mark Collins	3c89b2882f	freedreno/rddecompiler: Print pkt values in hex As most of the pkt values are arbitrarily encoded numbers, they are less readable as integers and printing them as hex is preferable. Signed-off-by: Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>	2024-01-01 18:47:48 +00:00
Mark Collins	84e5b28514	freedreno/rddecompiler: Reset buffers after RD_CMDSTREAM_ADDR This is necessary to correctly decode certain traces such as those that use FDM. Signed-off-by: Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>	2024-01-01 18:47:48 +00:00
Mark Collins	fa735aacbf	freedreno/rddecompiler: Decode ELSE branches using NOPs In newer traces, in any case where instructions need to be executed for both cases of a predicate, such as for GMEM/sysmem, the proprietary driver emits the TRUE and FALSE bodies one after another with a NOP at the end of the TRUE condition body so the CP skips over the FALSE body. Currently, the NOP skips over all instructions in the ELSE body, which results in them not being decoded whatsoever. This commit checks if we encounter any NOPs while in a conditional block and appropriately parses them out into their own ELSE scope when we do. Signed-off-by: Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>	2024-01-01 18:47:48 +00:00
Mark Collins	cfc2a85b89	freedreno/rddecompiler: Emit explicit scope for CP_COND_REG_EXEC Due to the larger amount of conditional execution in newer traces, the flat view makes it hard to parse what might be conditionally executed and what might not. This makes it easier to view by adding a scope for conditionally executed commands which is indented to the next level. Signed-off-by: Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>	2024-01-01 18:47:48 +00:00
Rhys Perry	10e0518a85	nir/loop_analyze: remove invariance analysis compute_invariance_information() wasn't doing anything. The only variables not skipped in the list are phis (which are never considered invariant) and ALU instructions which use the phi as one of its sources. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23726>	2024-01-01 14:15:39 +00:00
Yonggang Luo	0210b554d6	treewide: Replace the include of nir_types.h with glsl_types.h Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26753>	2023-12-30 15:08:11 +00:00
Ian Romanick	2e75d71c1f	intel/cmat: Generate better code for nir_intrinsic_cmat_insert When the source destination index is a constant, we can avoid generating a lot of the intermediate code. At the very least, this makes initial NIR dumps much easier to read. v2: Simplify tracking of dst_index. Suggested by Caio. Suggested-by: Caio Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	c6d44284aa	intel/dev: Enable VK_KHR_cooperative_matrix on all Gfx9+ GPUs Gfx12.5 (DG2) will use DPAS instructions to accelerate the implementation. Earlier platforms will use equivalent discrete instructions (basically subgroup operations). Gfx12 (Tigerlake) will use DP4A for 8-bit integer matrix multiplication. Older platforms, which lack DP4A, will use a suboptimal instruction sequence. There is plenty of room for improvement here. DG2 (Gfx12.5) gets the following results from the CTS: Test run totals: Passed: 1642/13982 (11.7%) Failed: 0/13982 (0.0%) Not supported: 12340/13982 (88.3%) Warnings: 0/13982 (0.0%) Waived: 0/13982 (0.0%) On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake (Gfx11): Test run totals: Passed: 1662/13982 (11.9%) Failed: 0/13982 (0.0%) Not supported: 12320/13982 (88.1%) Warnings: 0/13982 (0.0%) Waived: 0/13982 (0.0%) The difference in the number of tests run is due to saturatingAccumulation not being set on DG2 when DPAS is used. There is a comment in "intel/dev: Advertise integer configs with saturatingAccumulation too" that explains how this could be added should the need arise. v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	8ea032b78e	intel/dev: Advertise integer configs with saturatingAccumulation too VUID-RuntimeSpirv-saturatingAccumulation-08983 says: For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation cooperative matrix operand must be present if and only if VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE. As a result, we have to advertise integer configs both with and without this flag set. v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	f952dd510e	anv: Select the SIMD mode very early when cooperative matrices are used The commit is a little ugly. The definition of anv_fixup_subgroup_size is moved before the added call site. In addition, the bit starting at the "Cooperative matrix extension requires..." comment is added. v2: Dramatic simplification of SIMD selection. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	511f91e307	anv: Lower indirect derefs again after lowering cooperative matrices The cooperative matrix lowering can generate a lot of indirect array accesses, and these need to be eliminated. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	b741a9a851	anv: Set PIPELINE_SELECT systolic mode enable flag Set the flag on compute shaders when the application has enabled the cooperative matrix feature. We might still want to enable this only when DPAS is actually used. The current method is based on many suggestions from Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	7bfbeb79a7	anv: Set COMPUTE_WALKER systolic mode enable flag Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	67739b02de	anv: Add anv_physical_device::has_cooperative_matrix This flag tracks whether or not cooperative matrices are fully enabled on the physical device (i.e., both the configs exist and the environment variable is set). This is mainly to support a later commit "anv: Set PIPELINE_SELECT systolic mode enable flag." This could be squashed into "anv: Implement VK_KHR_cooperative_matrix." I left it separate because we might go back to the previous method. v3: Don't hide the extension behind an environment variable (ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting PIPELINE_SELECT. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Caio Oliveira	0a6f8b40bf	anv: Implement VK_KHR_cooperative_matrix v2: Rebase on moving lowering pass to src/intel/compiler. v3: Don't hide the extension behind an environment variable (ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting PIPELINE_SELECT. v4: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Also rebase on `f99e43d606` ("anv: switch to use runtime physical device properties infrastructure"). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Caio Oliveira	ff16458478	intel/dev: Add cooperative matrix configuration information v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	6b14da33ad	intel/fs: nir: Add nir_intrinsic_dpas_intel v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion. v3: Fix float16 destination DPAS on DG2. v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. v5: Rebase on !26323. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:43 -08:00
Ian Romanick	3756f60558	intel/fs: DPAS lowering Implements integer dot product lowering both with and without DP4A. Implements half-float dot product lowering. There are a couple FINISHME comments describing future optimizations. v2: Add a brw_compiler::lower_dpas flag to track when the lowering should be applied. v3: Use is_null() instead of checking file != ARF. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	3cb9625539	intel/fs: Fix scoreboarding for DPAS v2: Remove all mention of DPASW. Suggested by Curro and Caio. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	eb1f19d7bf	intel/compiler: Validation for DPAS instructions v2: s/regiser/register/g in messages. Noticed by Caio. Add more context to the sub-byte precision error message. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	1c92dad5cb	intel/disasm: Disassembly support for DPAS v2: Fix regioning in src[012]_dpas_3src. Noticed by Caio. Treat DPAS as unordered. Suggested by Curro. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:13 -08:00
Ian Romanick	e666872c75	intel/compiler: Initial bits for DPAS instruction v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding. v3: Prevent lower_regioning from trying to "fix" DPAS sources. v4: Add instruction latency information for scheduling and perf estimates. v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio. v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Ian Romanick	3a35f8b29b	intel/cmat: Lower cmat_load and cmat_store v2: Add support for non-constant stride. v3: Explain B matrices (a little bit) in get_slice_type_from_desc. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Ian Romanick	502be565da	intel/cmat: Add lowering for cmat_bitcast v2: Use nir_component_mask(...) instead of 0xffff. Assert that source and destination are same size. Both suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	7303315a8b	intel/cmat: Enable packed formats for scalar ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	26c4acd8ee	intel/cmat: Enable packed formats for binary ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	0d314eb3cc	intel/cmat: Enable packed formats for unary, length, and construct With this, a minimum test case passes: void main() { coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA; coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR; matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0); matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA); coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor); } v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in one of the 4x8 cases. v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify coop_unary handling. This saved 67 lines of code. v4: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v5: Massive update to the comment in lower_cooperative_matrix_unary_op with some suggestions from Caio. Add a comment and assertion around `nir_def *v[4]`. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	75388a71c9	intel/cmat: Add lowering for cmat_insert and cmat_extract v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	a2ded5b26c	intel/cmat: Update get_slice_type for packed slices Also splits off another function get_slice_type_from_desc that will be used in future commits. v2: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v3: Use glsl_base_type_get_bit_size. v4: Adjust packing so that a single row fills an entire GRF. v5: Add comment for get_packing_factor and some other cleanups there. s/cooperative_matrix/cmat/. Tighten the validation of len in gt_slice_from_desc. All suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Caio Oliveira	dba6451ce8	intel/cmat: Add pass to lower cooperative matrix to subgroup operations This is just the skeleton of the implementation. Future commits will fill it all in. v2: Move to src/intel/compiler v3 (idr): Use vecN instead of array[N] for slice type. v4 (idr): Refactor lower_cooperative_matrix_load and lower_cooperative_matrix_store into a single function. v5 (idr): Remove old, verbose debug logging. Assert that entry is not NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of 0xffff. s/cooperative_matrix/cmat/. All suggested by Caio. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> I put both R-b on this because, at this point, we've each done equal parts authoring and reviewing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Danylo Piliaiev	61c9cf9890	freedreno: Add a644 support The GPU is same as a660 but for SP_DBG_ECO_CNTL register value. Checked by comparing cmd streams between them. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10366 Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26836>	2023-12-29 09:44:07 +00:00
Nanley Chery	ee524e198b	iris: Fix lowered images in get_main_plane_for_plane This function was recently simplified based on the idea that if a modifier is not present, then the plane count should not exceed the plane count of the resource's external format. This seems to be true except for lowered images. We don't enable compression modifiers on lowered images, so this case was not handled during the transition. As an example of the lowering that may occur: PIPE_FORMAT_YVYU is a single plane, subsampled format that the gallium layer lowers to two planes/formats (R8G8_UNORM and B8G8R8A8_UNORM) if not natively supported by the hardware. Fixes the assert failure when running the piglit test case: ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto ext_image_dma_buf_import-sample_yuv: ../../src/gallium/drivers/iris/iris_resource.c:1384: iris_resource_from_handle: Assertion `main_res->aux.surf.row_pitch_B == plane_res->surf.row_pitch_B' failed. Also, replaces it with a new one in case this fails again: ext_image_dma_buf_import-sample_yuv: ../../src/gallium/drivers/iris/iris_resource.c:1381: iris_resource_from_handle: Assertion `isl_drm_modifier_has_aux(whandle->modifier)' failed. Fixes: `79222e5884` ("iris: Simplify get_main_plane_for_plane") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>	2023-12-29 09:08:57 +00:00
Nanley Chery	94e5b5d049	isl: Handle MOD_INVALID in clear color plane check In iris, if whandle->modifier is DRM_FORMAT_MOD_INVALID within iris_resource_from_handle, isl_drm_modifier_plane_is_clear_color will assert fail on non-existent modifier info. Update that function to return early instead. Fixes the assert failure when running the piglit test case: ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto ext_image_dma_buf_import-sample_yuv: ../../src/intel/isl/isl.h:2352: isl_drm_modifier_plane_is_clear_color: Assertion `mod_info' failed. Fixes: `81d132d5ea` ("iris: Use helpers for generic aux plane importing") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>	2023-12-29 09:08:57 +00:00
Rohan Garg	5fff6eac42	intel/compiler: Update disassembly for new LSC cache enums Rework: * Caio: Add remaining enum values. Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>	2023-12-28 15:13:24 -08:00
Francisco Jerez	b91fa057ab	intel/compiler/xe2: Don't disassemble non-existent fields. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>	2023-12-28 15:13:24 -08:00
Jordan Justen	6495fe3d37	intel/xe2+: Implement brw_wm_state_simd_width_for_ksp() on Xe2+. The mechanism for selecting dispatch modes has changed from previous platforms, add a new implementation brw_wm_state_simd_width_for_ksp() using the new kernel dispatch controls. [ Francisco Jerez: Split from a larger patch, handle multipolygon dispatch, add additional comments. ] Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	d8ad51ec76	intel/xe2+: Implement fragment shader dispatch state setup. This sets up the PS dispatch controls to a supported combination of Kernel0/Kernel1 dispatch modes, initializing the polygon packing controls to use a multipolygon dispatch mode if one was provided. Rework: * Jordan: Move into intel_update_ps_state() Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	ab0eff4388	intel/fs/xe2+: Attempt to build quad-SIMD8 and dual-SIMD16 FS variants on Xe2+ platforms. Extend the pre-existing dual-SIMD8 compilation path in brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles. Instead of building every possible dispatch mode and then picking one based on cycle-count heuristics, this attempts to only build a single multipolygon kernel -- The different multipolygon dispatch modes are tried in the expected order of decreasing performance (quad-SIMD8, dual-SIMD16 then dual-SIMD8), the first one that successfully compiles without spills is taken as a simple heuristic, and no further multipolygon builds are attempted. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	8cd8d6bccc	intel: Add debug flags for enabling Xe2+ multipolygon fragment shader dispatch modes. Note that the multipolygon PS dispatch modes supported by Xe2 aren't enabled by default yet, but they can be enabled manually via INTEL_SIMD_DEBUG=fs2x8,fs4x8,fs2x16. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	50d084ec29	intel/fs/xe2+: Lower SIMD width of instructions that access ATTR file from SIMD2x8/4x8 FS. This is needed because the information stored on the ATTR file for multipolygon fragment shaders isn't stored as a contiguous sequence in the GRF, instead the ATTR source may be lowered by assign_urb_setup() to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32 instruction, even though the result of fs_inst::size_read() is expected to be 2 GRFs. Special case ATTR sources for multipolygon PS shaders to calculate the number of physical GRFs that will actually be read by the instruction after lowering, based on the number of polygons processed by the instruction. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	0d332d0c49	intel/fs: Plumb shader instead of compiler to get_lowered_simd_width() and friends. This will allow making lowering decisions based on properties of the shader, like the multipolygon dispatch mode used. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	bd634bef12	intel/fs/xe2+: Implement layout of mesh shading per-primitive inputs in PS thread payloads. This is based on a previous patch by Marcin Ślusarz addressing the same issue, though it's largely rewritten, simplified and includes additional fixes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	4cebfaadf7	intel/fs/xe2+: Implement support for multi-polygon vertex setup data in PS payload. This fixes a number of assumptions made by the multipolygon input attribute handling code from assign_urb_setup() so it also works on Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8 and SIMD2x16) and uses a different more compact representation of the plane parameters. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	702eabaaae	intel/fs/xe2+: Update for new layout of vertex setup data in PS payload. The interpolation deltas of PS inputs now show up as a 12B vec3 (A0, A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B format with an unused component. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	d622e19f00	intel/fs/xe2+: Enable new format of barycentrics in PS payload. The X and Y barycentric vectors are no longer interleaved in SIMD8 chunks (yay), so this is mostly a matter of disabling the lower_barycentrics() pass and switching to a simpler implementation of fetch_barycentric_reg() that simply calls fetch_payload_reg() instead of the SIMD8 shuffling we had to do in previous generations. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00


182651 commits