Commit graph

182651 commits

Author SHA1 Message Date
Ganesh Belgur Ramachandra
d5ef8a0ac0 radeonsi: enable nir pass for 64 bit operations
Enables optimisations for divide-by-constant which are
required in some shaders. e.g. si_create_query_result_cs()

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25972>
2024-01-02 10:53:30 +00:00
Konstantin Seurer
b88ac6b381 nir: Optimize fpow with small constant exponents
They would be turned into exp(log(a)*b) instead, which is slow.

Totals from 2146 (2.52% of 85071) affected shaders:
MaxWaves: 35769 -> 35779 (+0.03%); split: +0.03%, -0.01%
Instrs: 6476835 -> 6465494 (-0.18%); split: -0.18%, +0.00%
CodeSize: 35382288 -> 35347092 (-0.10%); split: -0.10%, +0.00%
SpillSGPRs: 1055 -> 1017 (-3.60%)
Latency: 75211743 -> 75063623 (-0.20%); split: -0.20%, +0.00%
InvThroughput: 17525115 -> 17501745 (-0.13%); split: -0.14%, +0.00%
VClause: 200089 -> 200077 (-0.01%); split: -0.01%, +0.01%
SClause: 293566 -> 293480 (-0.03%); split: -0.03%, +0.00%
Copies: 649631 -> 640516 (-1.40%); split: -1.44%, +0.03%
Branches: 268441 -> 268325 (-0.04%)
PreSGPRs: 146868 -> 146045 (-0.56%)
PreVGPRs: 134125 -> 134128 (+0.00%); split: -0.00%, +0.01%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26727>
2024-01-02 11:16:14 +01:00
Juan A. Suarez Romero
8b3496df30 ci/v3dv: update results
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26846>
2024-01-02 10:23:24 +01:00
Luca Weiss
2e46dd0624 freedreno: Enable A305B
Enable the Adreno 305B that is found on the MSM8226(v2) SoC (Snadragon
400).

Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26434>
2024-01-01 20:30:46 +00:00
Mark Collins
80a319c0b4 freedreno/rddecompiler: Add ability to read GPU buffer into file
While running tests, it is be useful to have non-sequenced dumps of
certain buffers to see their contents from changes in the decompiled
CS. This introduces a function gpu_read_into_file(...) for specifying
a file to read a specific GPU buffer into after replaying the CS.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
2024-01-01 18:47:48 +00:00
Mark Collins
3c89b2882f freedreno/rddecompiler: Print pkt values in hex
As most of the pkt values are arbitrarily encoded numbers, they are
less readable as integers and printing them as hex is preferable.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
2024-01-01 18:47:48 +00:00
Mark Collins
84e5b28514 freedreno/rddecompiler: Reset buffers after RD_CMDSTREAM_ADDR
This is necessary to correctly decode certain traces such as those
that use FDM.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
2024-01-01 18:47:48 +00:00
Mark Collins
fa735aacbf freedreno/rddecompiler: Decode ELSE branches using NOPs
In newer traces, in any cases where instructions need to be executed
for both cases of a predicate, such as for GMEM/sysmem. The proprietary
driver emits the TRUE and FALSE body one after another with a NOP at
the end of the TRUE condition body so the CP skips over the FALSE body.

Currently, the NOP skips over all instructions in the ELSE body which
results in them not being decoded whatsoever. This commit checks if
we encounter any NOPs while in a conditional block and appropriately
parses out them out into their own ELSE scope when we do.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
2024-01-01 18:47:48 +00:00
Mark Collins
cfc2a85b89 freedreno/rddecompiler: Emit explicit scope for CP_COND_REG_EXEC
Due to the larger amount of conditional execution in newer traces
the flat view makes it hard to parse what might be conditionally
executed and what might now. This makes it easier to view by adding
a scope for conditionally executed commands which is indented to the
next level.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
2024-01-01 18:47:48 +00:00
Rhys Perry
10e0518a85 nir/loop_analyze: remove invariance analysis
compute_invariance_information() wasn't doing anything. The only variables
not skipped in the list are phis (which are never considered invariant)
and ALU instructions which use the phi as one of it's sources.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23726>
2024-01-01 14:15:39 +00:00
Yonggang Luo
0210b554d6 treewide: Replace the include of nir_types.h with glsl_types.h
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26753>
2023-12-30 15:08:11 +00:00
Ian Romanick
2e75d71c1f intel/cmat: Generate better code for nir_intrinsic_cmat_insert
When the source destination index is a constant, we can avoid generating
a lot of the intermediate code. At the very least, this makes initial
NIR dumps much easier to read.

v2: Simplify tracking of dst_index. Suggested by Caio.

Suggested-by: Caio
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
c6d44284aa intel/dev: Enable VK_KHR_cooperative_matrix on all Gfx9+ GPUs
Gfx12.5 (DG2) will use DPAS instructions to accelerate the
implementation. Earlier platforms will use equivalent discrete
instructions (basically subgroup operations). Gfx12 (Tigerlake) will use
DP4A for 8-bit integer matrix multiplication. Older platforms, which
lack DP4A, will use a suboptimal instruction sequence. There is plenty
of room for improvement here.

On DG2 (Gfx12.5) gets the following results from the CTS:

Test run totals:
  Passed:        1642/13982 (11.7%)
  Failed:        0/13982 (0.0%)
  Not supported: 12340/13982 (88.3%)
  Warnings:      0/13982 (0.0%)
  Waived:        0/13982 (0.0%)

On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake
(Gfx11):

Test run totals:
  Passed:        1662/13982 (11.9%)
  Failed:        0/13982 (0.0%)
  Not supported: 12320/13982 (88.1%)
  Warnings:      0/13982 (0.0%)
  Waived:        0/13982 (0.0%)

The difference in the number of tests run is due to
saturatingAccumulation not being set on DG2 when DPAS is used. There is
a comment in "intel/dev: Advertise integer configs with
saturatingAccumulation too" that explains how this could be added should
the need arise.

v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
8ea032b78e intel/dev: Advertise integer configs with saturatingAccumulation too
VUID-RuntimeSpirv-saturatingAccumulation-08983 says:

   For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation
   cooperative matrix operand must be present if and only if
   VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE.

As a result, we have to advertise integer configs both with and without
this flag set.

v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
f952dd510e anv: Select the SIMD mode very early when cooperative matrices are used
The commit is a little ugly. The definition of anv_fixup_subgroup_size
is moved before the added call site. In addition, the bit starting at
the "Cooperative matrix extension requires..." comment is added.

v2: Dramatic simplification of SIMD selection. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
511f91e307 anv: Lower indirect derefs again after lowering cooperative matrices
The cooperative matrix lowering can generate a lot of indirect array
accesses, and these need to be eliminated.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
b741a9a851 anv: Set PIPELINE_SELECT systolic mode enable flag
Set the flag on compute shaders when the application has enabled the
cooperative matrix feature. We might still want to enable this only when
DPAS is actually used. The current method is based on many suggestions
from Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
7bfbeb79a7 anv: Set COMPUTE_WALKER systolic mode enable flag
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
67739b02de anv: Add anv_physical_device::has_cooperative_matrix
This flag tracks whether or not cooperative matrices are fully enabled
on the physica device (i.e., both the configs exist and the environment
varible is set).  This is mainly to support a later commit "anv: Set
PIPELINE_SELECT systolic mode enable flag."

This could be squashed into "anv: Implement VK_KHR_cooperative_matrix."
I left it separate because we might go back to the previous method.

v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now the we have a better solution for setting
PIPELINE_SELECT.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Caio Oliveira
0a6f8b40bf anv: Implement VK_KHR_cooperative_matrix
v2: Rebase on moving lowering pass to src/intel/compiler.

v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now the we have a better solution for setting
PIPELINE_SELECT.

v4: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Also rebase
on f99e43d606 ("anv: switch to use runtime physical device properties
infrastructure").

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Caio Oliveira
ff16458478 intel/dev: Add cooperative matrix configuration information
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
6b14da33ad intel/fs: nir: Add nir_intrinsic_dpas_intel
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.

v3: Fix float16 destination DPAS on DG2.

v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

v5: Rebase on !26323.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:43 -08:00
Ian Romanick
3756f60558 intel/fs: DPAS lowering
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.

There are a couple FINISHME comments describing future optimizations.

v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.

v3: Use is_null() instead of checking file != ARF. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:27:15 -08:00
Ian Romanick
3cb9625539 intel/fs: Fix scoreboarding for DPAS
v2: Remove all mention of DPASW. Suggested by Curro and Caio.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:27:15 -08:00
Ian Romanick
eb1f19d7bf intel/compiler: Validation for DPAS instructions
v2: s/regiser/register/g in messages. Noticed by Caio. Add more context
to the sub-byte precision error message. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:27:15 -08:00
Ian Romanick
1c92dad5cb intel/disasm: Disassembly support for DPAS
v2: Fix regioning in src[012]_dpas_3src. Noticed by Caio. Treat DPAS as
unordered. Suggested by Curro.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:27:13 -08:00
Ian Romanick
e666872c75 intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf
estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.

v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Ian Romanick
3a35f8b29b intel/cmat: Lower cmat_load and cmat_store
v2: Add support for non-constant stride.

v3: Explain B matrices (a little bit) in
get_slice_type_from_desc. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Ian Romanick
502be565da intel/cmat: Add lowering for cmat_bitcast
v2: Use nir_component_mask(...) instead of 0xffff. Assert that source
and destination are same size. Both suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
7303315a8b intel/cmat: Enable packed formats for scalar ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar
handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
26c4acd8ee intel/cmat: Enable packed formats for binary ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary
handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
0d314eb3cc intel/cmat: Enable packed formats for unary, length, and construct
With this, a minimum test case passes:

    void main()
    {
        coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;

        matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0);
        matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);

        coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor);
    }

v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in
one of the 4x8 cases.

v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify
coop_unary handling. This saved 67 lines of code.

v4: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.

v5: Massive update to the comment in lower_cooperative_matrix_unary_op
with some suggestions from Caio. Add a comment and assertion around
`nir_def *v[4]`. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
75388a71c9 intel/cmat: Add lowering for cmat_insert and cmat_extract
v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
a2ded5b26c intel/cmat: Update get_slice_type for packed slices
Also splits off another funciton get_slice_type_from_desc that will be
used in future commits.

v2: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.

v3: Use glsl_base_type_get_bit_size.

v4: Adjust packing so that a single row fills an entire GRF.

v5: Add comment for get_packing_factor and some other cleanups
there. s/cooperative_matrix/cmat/. Tighten the validation of len in
gt_slice_from_desc. All suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Caio Oliveira
dba6451ce8 intel/cmat: Add pass to lower cooperative matrix to subgroup operations
This is just the skeleton of the implementation. Future commits will
fill it all in.

v2: Move to src/intel/compiler

v3 (idr): Use vecN instead of array[N] for slice type.

v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.

v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>

I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Danylo Piliaiev
61c9cf9890 freedreno: Add a644 support
The GPU is same as a660 but for SP_DBG_ECO_CNTL register value.
Checked by comparing cmd streams between them.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10366

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26836>
2023-12-29 09:44:07 +00:00
Nanley Chery
ee524e198b iris: Fix lowered images in get_main_plane_for_plane
This function was recently simplified based on the idea that if a
modifier is not present, then the plane count should not exceed the
plane count of the resource's external format. This seems to be true
except for lowered images. We don't enable compression modifiers on
lowered images, so this case was not handled during the transition.

As an example of the lowering that may occur: PIPE_FORMAT_YVYU is a
single plane, subsampled format that the gallium layer lowers to two
planes/formats (R8G8_UNORM and B8G8R8A8_UNORM) if not natively supported
by the hardware.

Fixes the assert failure when running the piglit test case:

   ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto

   ext_image_dma_buf_import-sample_yuv:
      ../../src/gallium/drivers/iris/iris_resource.c:1384:
   iris_resource_from_handle:
      Assertion `main_res->aux.surf.row_pitch_B ==
                 plane_res->surf.row_pitch_B' failed.

Also, replaces it with a new one in case this fails again:

   ext_image_dma_buf_import-sample_yuv:
      ../../src/gallium/drivers/iris/iris_resource.c:1381:
   iris_resource_from_handle:
      Assertion `isl_drm_modifier_has_aux(whandle->modifier)' failed.

Fixes: 79222e5884 ("iris: Simplify get_main_plane_for_plane")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
2023-12-29 09:08:57 +00:00
Nanley Chery
94e5b5d049 isl: Handle MOD_INVALID in clear color plane check
In iris, if whandle->modifier is DRM_FORMAT_MOD_INVALID within
iris_resource_from_handle, isl_drm_modifier_plane_is_clear_color will
assert fail on non-existent modifier info. Update that function to
return early instead.

Fixes the assert failure when running the piglit test case:

   ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto

   ext_image_dma_buf_import-sample_yuv: ../../src/intel/isl/isl.h:2352:
      isl_drm_modifier_plane_is_clear_color: Assertion `mod_info' failed.

Fixes: 81d132d5ea ("iris: Use helpers for generic aux plane importing")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
2023-12-29 09:08:57 +00:00
Rohan Garg
5fff6eac42 intel/compiler: Update disassembly for new LSC cache enums
Rework:
* Caio: Add remaining enum values.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>
2023-12-28 15:13:24 -08:00
Francisco Jerez
b91fa057ab intel/compiler/xe2: Don't disassemble non-existent fields.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>
2023-12-28 15:13:24 -08:00
Jordan Justen
6495fe3d37 intel/xe2+: Implement brw_wm_state_simd_width_for_ksp() on Xe2+.
The mechanism for selecting dispatch modes has changed from previous
platforms, add a new implementation brw_wm_state_simd_width_for_ksp()
using the new kernel dispatch controls.

[ Francisco Jerez: Split from a larger patch, handle multipolygon
                   dispatch, add additional comments. ]

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
d8ad51ec76 intel/xe2+: Implement fragment shader dispatch state setup.
This sets up the PS dispatch controls to a supported combination of
Kernel0/Kernel1 dispatch modes, initializing the polygon packing
controls to use a multipolygon dispatch mode if one was provided.

Rework:
 * Jordan: Move into intel_update_ps_state()

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
ab0eff4388 intel/fs/xe2+: Attempt to build quad-SIMD8 and dual-SIMD16 FS variants on Xe2+ platforms.
Extend the pre-existing dual-SIMD8 compilation path in
brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles.
Instead of building every possible dispatch mode and then picking one
based on cycle-count heuristics, this attempts to only build a single
multipolygon kernel -- The different mulipolygon dispatch modes are
tried in the expected order of decreasing performance (quad-SIMD8,
dual-SIMD16 then dual-SIMD8), the first one that successfully compiles
without spills is taken as a simple heuristic, and no further
multipolygon builds are attempted.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
8cd8d6bccc intel: Add debug flags for enabling Xe2+ multipolygon fragment shader dispatch modes.
Note that the multipolygon PS disptach modes supported by Xe2 aren't
enabled by default yet, but they can be enabled manually via
INTEL_SIMD_DEBUG=fs2x8,fs4x8,fs2x16.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
50d084ec29 intel/fs/xe2+: Lower SIMD width of instructions that access ATTR file from SIMD2x8/4x8 FS.
This is needed because the information stored on the ATTR file for
multipolygon fragment shaders isn't stored as a contiguous sequence in
the GRF, instead the ATTR source may be lowered by assign_urb_setup()
to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32
instruction, even though the result of fs_inst::size_read() is
expected to be 2 GRFs.  Special case ATTR sources for multipolygon PS
shaders to calculate the number of physical GRFs that will actually be
read by the instruction after lowering, based on the number of
polygons processed by the instruction.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
0d332d0c49 intel/fs: Plumb shader instead of compiler to get_lowered_simd_width() and friends.
This will allow making lowering decisions based on properties of the
shader, like the multipolygon dispatch mode used.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
bd634bef12 intel/fs/xe2+: Implement layout of mesh shading per-primitive inputs in PS thread payloads.
This is based on a previous patch by Marcin Ślusarz addressing the
same issue, though it's largely rewritten, simplified and includes
additional fixes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
4cebfaadf7 intel/fs/xe2+: Implement support for multi-polygon vertex setup data in PS payload.
This fixes a number of assumptions made by the multipolygon input
attribute handling code from assign_urb_setup() so it also works on
Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8
and SIMD2x16) and uses a different more compact representation of the
plane parameters.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 14:12:59 -08:00
Francisco Jerez
702eabaaae intel/fs/xe2+: Update for new layout of vertex setup data in PS payload.
The interpolation deltas of PS inputs now show up as a 12B vec3 (A0,
A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B
format with an unused component.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 11:07:03 -08:00
Francisco Jerez
d622e19f00 intel/fs/xe2+: Enable new format of barycentrics in PS payload.
The X and Y barycentric vectors are no longer interleaved in SIMD8
chunks (yay), so this is mostly a matter of disabling the
lower_barycentrics() pass and switching to a simpler implementation of
fetch_barycentric_reg() that simply calls fetch_payload_reg() instead
of the SIMD8 shuffling we had to do in previous generations.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 11:07:03 -08:00