mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-06-20 23:28:23 +02:00
Add a new optimization pass that identifies sequences of scalar dot
product operations and combines them into DPAS (Dot Product Accumulate
Systolic) matrix multiplication instructions for XeHP+ EUs that have a
systolic array pipeline (AKA XMX engine). This is possible because a
matrix multiplication as performed by DPAS can be expressed like:

    E^i_k = D^i_k + Sum_j A^i_j B^j_k

I.e. each scalar component of a matrix multiplication is just a
(possibly large) dot product. This pass identifies such chains of
sdot_4x8_iadd dot products in the program and bins them according to
the A and B arguments used. Sets of dot products with consecutive
components are transformed into a matrix product for each densely
occupied interval of indices within each bin, as long as there is an
efficient way to transpose one of the arguments in the register file.

This enables programs to opportunistically take advantage of the
systolic array pipeline for linear arithmetic, which has massively
greater throughput than the regular FPUs (roughly a factor of 4x the
throughput for the specific instructions replaced currently), without
the application having to be updated in order to take advantage of it
through a matrix multiplication API like KHR_cooperative_matrix.

The immediate motivation for this is getting the open source driver to
accelerate the matrix multiplications used for inference by the XeSS
ML-driven upscaling library, since the Mesa driver has so far been
limited to the generic HLSL path that doesn't take advantage of the XMX
pipeline. Alternative AI-driven upscaling libraries can be supported in
theory, though this hasn't been pursued yet, and there are some
assumptions in the optimization pass that might currently get in the
way:

- Currently only the sdot_4x8_iadd intrinsic is supported, for no
  particular reason other than it being the intrinsic generated by the
  XeSS library in its multivendor path. It would be straightforward to
  add support for additional types supported by the systolic pipeline.

- Currently one of the arguments of the dot products is restricted to
  be an SSBO load because that's what we encounter in the XeSS library,
  but any other kind of memory load intrinsic could be supported
  easily.

- Also accidental is the current limitation to run on Xe2+ hardware.
  Getting it to work on XeHP (e.g. DG2) is theoretically possible
  beyond some minor differences, so it will probably be a future area
  for improvement.

- The limitation of the shader subgroup size to 16 done at the end of
  the optimization pass is less accidental, because on all Intel Xe
  platforms released so far the DPAS instruction is limited to run at a
  fixed execution width (8 on XeHP and 16 on Xe2-3), so the backend
  would need a way to expose variable-width DPAS intrinsics, e.g. by
  lowering them using SIMD splitting. I have some code to try to
  achieve that, but the naive SIMD splitting approach of DPAS
  instructions appears to hurt more cases than it helps, so I don't
  have a ready solution to lift this restriction yet.

Evaluating the impact of this on the performance of XeSS kernels using
our internal microbenchmarks shows a performance improvement for XeSS
inference between 26% and 44% depending on the quality preset and
resolution, with a geomean improvement of 35% across the rendering
modes tested.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41814>
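The identity the commit message relies on, that every component of E = D + A B is a chain of accumulating dot products, can be sketched in a few lines of Python. This is only an illustrative model and not code from the pass: the helper names (`pack_s8x4`, `unpack_s8x4`, `chained_dot_products`, `reference_matmul_add`) are hypothetical, the byte order (lowest byte first) and the wrap-around to 32 bits are assumptions made for this sketch of `sdot_4x8_iadd`, and the inner dimension is assumed to be a multiple of 4.

```python
def pack_s8x4(lanes):
    """Pack four signed 8-bit integers into one 32-bit word, lowest byte first."""
    word = 0
    for n, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * n)
    return word


def unpack_s8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes, lowest byte first."""
    lanes = []
    for shift in (0, 8, 16, 24):
        byte = (word >> shift) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def sdot_4x8_iadd(a, b, acc):
    """Toy model of the intrinsic: acc + dot(a, b) over packed signed bytes,
    wrapped to a signed 32-bit result (assumed behaviour for this sketch)."""
    total = acc + sum(x * y for x, y in zip(unpack_s8x4(a), unpack_s8x4(b)))
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def chained_dot_products(A, B, D):
    """Compute E[i][k] = D[i][k] + sum_j A[i][j] * B[j][k] one scalar at a
    time, as a chain of sdot_4x8_iadd calls covering 4 values of j each."""
    rows, depth, cols = len(A), len(B), len(B[0])
    assert depth % 4 == 0, "sketch assumes the inner dimension is a multiple of 4"
    E = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            acc = D[i][k]
            for j0 in range(0, depth, 4):
                a = pack_s8x4(A[i][j0:j0 + 4])                         # row slice of A
                b = pack_s8x4([B[j][k] for j in range(j0, j0 + 4)])    # column slice of B
                acc = sdot_4x8_iadd(a, b, acc)
            E[i][k] = acc
    return E


def reference_matmul_add(A, B, D):
    """Plain matrix multiply-accumulate used as the reference result."""
    return [[D[i][k] + sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


# Small example: a 2x8 matrix times an 8x2 matrix plus a 2x2 accumulator.
A = [[1, -2, 3, 4, 5, 6, -7, 8], [127, -128, 0, 1, 2, 3, 4, 5]]
B = [[1, 2], [3, -4], [5, 6], [-7, 8], [9, 10], [11, 12], [13, -14], [15, 16]]
D = [[10, -20], [30, 40]]
assert chained_dot_products(A, B, D) == reference_matmul_add(A, B, D)
```

The column slice of B has to be gathered into a packed word before each call, which mirrors why the pass only applies when one of the arguments can be transposed efficiently in the register file.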
||
|---|---|---|
| .ci-farms | ||
| .ci-farms-disabled | ||
| .github/workflows | ||
| .gitlab | ||
| .gitlab-ci | ||
| .marge/hooks | ||
| android | ||
| bin | ||
| build-support | ||
| docs | ||
| include | ||
| licenses | ||
| src | ||
| subprojects | ||
| .clang-format | ||
| .clang-format-ignore | ||
| .clang-format-include | ||
| .dir-locals.el | ||
| .editorconfig | ||
| .git-blame-ignore-revs | ||
| .gitattributes | ||
| .gitignore | ||
| .gitlab-ci.yml | ||
| .graphqlrc.yml | ||
| .mailmap | ||
| .mr-label-maker.yml | ||
| .shellcheckrc | ||
| clippy.toml | ||
| CODEOWNERS | ||
| meson.build | ||
| meson.options | ||
| README.rst | ||
| rustfmt.toml | ||
| VERSION | ||
`Mesa <https://mesa3d.org>`_ - The 3D Graphics Library
======================================================


Source
------

This repository lives at https://gitlab.freedesktop.org/mesa/mesa.
Other repositories are likely forks, and code found there is not supported.


Build & install
---------------

You can find more information in our documentation (`docs/install.rst
<https://docs.mesa3d.org/install.html>`_), but the recommended way is to use
Meson (`docs/meson.rst <https://docs.mesa3d.org/meson.html>`_):

.. code-block:: sh

  $ meson setup build
  $ ninja -C build/
  $ sudo ninja -C build/ install


Support
-------

Many Mesa devs hang out on IRC; if you're not sure which channel is
appropriate, you should ask your question on `OFTC's #dri-devel
<irc://irc.oftc.net/dri-devel>`_, and someone will redirect you if
necessary.
Remember that not everyone is in the same timezone as you, so it might
take a while before someone qualified sees your question.
To figure out who you're talking to, or which nick to ping for your
question, check out `Who's Who on IRC
<https://dri.freedesktop.org/wiki/WhosWho/>`_.

The next best option is to ask your question in an email to the
mailing lists: `mesa-dev\@lists.freedesktop.org
<https://lists.freedesktop.org/mailman/listinfo/mesa-dev>`_


Bug reports
-----------

If you think something isn't working properly, please file a bug report
(`docs/bugs.rst <https://docs.mesa3d.org/bugs.html>`_).


Contributing
------------

Contributions are welcome, and step-by-step instructions can be found in our
documentation (`docs/submittingpatches.rst
<https://docs.mesa3d.org/submittingpatches.html>`_).

Note that Mesa uses GitLab for patch submission, review and discussions.