v2: Add support for non-constant stride.
v3: Explain B matrices (a little bit) in
get_slice_type_from_desc. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Use nir_component_mask(...) instead of 0xffff. Assert that source
and destination are the same size. Both suggested by Caio.
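For a 16-component value, nir_component_mask(...) and the literal 0xffff give the same write mask. The helper spells out the intent ("the first N components") and stays correct for other component counts. The sketch below is a local re-creation under that assumption; component_mask and component_mask_t are hypothetical stand-ins, not NIR's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nir_component_mask_t. */
typedef uint16_t component_mask_t;

/* Mask with the low `num_components` bits set, i.e. a write mask that
 * covers components 0 .. num_components - 1. */
static component_mask_t
component_mask(unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 16);
   return (component_mask_t)((1u << num_components) - 1);
}
```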
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar
handling. This saved 13 lines of code.
v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
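This sketch reads "packing factor" as the number of matrix elements stored per integer container. Under that reading, factor 2 with 8-bit elements and factor 1 with 16-bit elements both fit in a 16-bit integer. The C code is only a plain-integer analogue of what nir_pack_bits/nir_unpack_bits do on NIR values. pack_elems and unpack_elems are hypothetical names, and placing element 0 in the low bits is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Pack `factor` elements of `elem_bits` bits each into one integer
 * container, element 0 in the least significant bits (assumed order). */
static uint32_t
pack_elems(const uint32_t *elems, unsigned factor, unsigned elem_bits)
{
   assert(factor * elem_bits <= 32);
   const uint32_t mask = elem_bits == 32 ? UINT32_MAX : (1u << elem_bits) - 1;
   uint32_t packed = 0;
   for (unsigned i = 0; i < factor; i++)
      packed |= (elems[i] & mask) << (i * elem_bits);
   return packed;
}

/* Inverse of pack_elems(). */
static void
unpack_elems(uint32_t packed, unsigned factor, unsigned elem_bits,
             uint32_t *elems)
{
   assert(factor * elem_bits <= 32);
   const uint32_t mask = elem_bits == 32 ? UINT32_MAX : (1u << elem_bits) - 1;
   for (unsigned i = 0; i < factor; i++)
      elems[i] = (packed >> (i * elem_bits)) & mask;
}
```

Two 8-bit elements (factor 2) produce a value that fits in 16 bits, and so does one 16-bit element (factor 1).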
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary
handling. This saved 13 lines of code.
v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
With this, a minimal test case passes:

    void main()
    {
        coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;
        matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0);
        matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);
        coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor);
    }

v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in
one of the 4x8 cases.
v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify
coop_unary handling. This saved 67 lines of code.
v4: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
v5: Massive update to the comment in lower_cooperative_matrix_unary_op
with some suggestions from Caio. Add a comment and assertion around
`nir_def *v[4]`. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Also splits off another function, get_slice_type_from_desc, that will be
used in future commits.
v2: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
v3: Use glsl_base_type_get_bit_size.
v4: Adjust packing so that a single row fills an entire GRF.
v5: Add comment for get_packing_factor and some other cleanups
there. s/cooperative_matrix/cmat/. Tighten the validation of len in
gt_slice_from_desc. All suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This is just the skeleton of the implementation. Future commits will
fill it all in.
v2: Move to src/intel/compiler
v3 (idr): Use vecN instead of array[N] for slice type.
v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.
v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This function was recently simplified based on the idea that if a
modifier is not present, then the plane count should not exceed the
plane count of the resource's external format. This seems to be true
except for lowered images. We don't enable compression modifiers on
lowered images, so this case was not handled during the transition.
As an example of the lowering that may occur: PIPE_FORMAT_YVYU is a
single plane, subsampled format that the gallium layer lowers to two
planes/formats (R8G8_UNORM and B8G8R8A8_UNORM) if not natively supported
by the hardware.
Fixes the assert failure when running the piglit test case:
ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto
ext_image_dma_buf_import-sample_yuv:
../../src/gallium/drivers/iris/iris_resource.c:1384:
iris_resource_from_handle:
Assertion `main_res->aux.surf.row_pitch_B ==
plane_res->surf.row_pitch_B' failed.
Also, replaces it with a new one in case this fails again:
ext_image_dma_buf_import-sample_yuv:
../../src/gallium/drivers/iris/iris_resource.c:1381:
iris_resource_from_handle:
Assertion `isl_drm_modifier_has_aux(whandle->modifier)' failed.
Fixes: 79222e5884 ("iris: Simplify get_main_plane_for_plane")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
In iris, if whandle->modifier is DRM_FORMAT_MOD_INVALID within
iris_resource_from_handle, isl_drm_modifier_plane_is_clear_color will
assert fail on non-existent modifier info. Update that function to
return early instead.
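The pattern here is "return early on a missing lookup result instead of asserting". Below is a minimal C sketch of it. The modifier table, the modifier values, and the names lookup_mod_info and plane_is_clear_color are all hypothetical. They only illustrate the control flow, not ISL's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modifier table; real code consults ISL's modifier info. */
struct mod_info {
   uint64_t modifier;
   bool has_clear_color;
   unsigned clear_color_plane;
};

static const struct mod_info mod_table[] = {
   { 0x1, false, 0 }, /* made-up modifier without clear color */
   { 0x2, true,  2 }, /* made-up modifier with clear color on plane 2 */
};

static const struct mod_info *
lookup_mod_info(uint64_t modifier)
{
   for (size_t i = 0; i < sizeof(mod_table) / sizeof(mod_table[0]); i++) {
      if (mod_table[i].modifier == modifier)
         return &mod_table[i];
   }
   return NULL; /* unknown or "invalid" modifier: no info available */
}

static bool
plane_is_clear_color(uint64_t modifier, unsigned plane)
{
   const struct mod_info *info = lookup_mod_info(modifier);

   /* No modifier info: answer "no" instead of asserting. */
   if (info == NULL)
      return false;

   return info->has_clear_color && plane == info->clear_color_plane;
}
```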
Fixes the assert failure when running the piglit test case:
ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto
ext_image_dma_buf_import-sample_yuv: ../../src/intel/isl/isl.h:2352:
isl_drm_modifier_plane_is_clear_color: Assertion `mod_info' failed.
Fixes: 81d132d5ea ("iris: Use helpers for generic aux plane importing")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
The mechanism for selecting dispatch modes has changed from previous
platforms, so add a new implementation, brw_wm_state_simd_width_for_ksp(),
using the new kernel dispatch controls.
[ Francisco Jerez: Split from a larger patch, handle multipolygon
dispatch, add additional comments. ]
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This sets up the PS dispatch controls to a supported combination of
Kernel0/Kernel1 dispatch modes, initializing the polygon packing
controls to use a multipolygon dispatch mode if one was provided.
Rework:
* Jordan: Move into intel_update_ps_state()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Extend the pre-existing dual-SIMD8 compilation path in
brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles.
Instead of building every possible dispatch mode and then picking one
based on cycle-count heuristics, this attempts to build only a single
multipolygon kernel: the different multipolygon dispatch modes are
tried in the expected order of decreasing performance (quad-SIMD8,
dual-SIMD16, then dual-SIMD8), the first one that compiles without
spills is taken as a simple heuristic, and no further multipolygon
builds are attempted.
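Below is a rough C sketch of the selection heuristic described above: try the modes in order and keep the first spill-free result. The enum, the try_compile_fn hook and table_backend are hypothetical stand-ins for the compiler's real entry points. The table backend exists only so the example can run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mp_mode { MP_QUAD_SIMD8, MP_DUAL_SIMD16, MP_DUAL_SIMD8, MP_NONE };

/* Hypothetical backend hook: returns true if `mode` compiled without
 * spills. */
typedef bool (*try_compile_fn)(enum mp_mode mode, void *ctx);

/* Try modes in expected order of decreasing performance, keep the first
 * spill-free result, and stop there. */
static enum mp_mode
pick_multipolygon_mode(try_compile_fn try_compile, void *ctx)
{
   static const enum mp_mode order[] = {
      MP_QUAD_SIMD8, MP_DUAL_SIMD16, MP_DUAL_SIMD8,
   };
   assert(try_compile != NULL);
   for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
      if (try_compile(order[i], ctx))
         return order[i];
   }
   return MP_NONE; /* fall back to non-multipolygon kernels only */
}

/* Stand-in backend for illustration: ctx is a per-mode table of
 * "compiles without spills" flags. */
static bool
table_backend(enum mp_mode mode, void *ctx)
{
   const bool *spill_free = ctx;
   return spill_free[mode];
}
```

Stopping at the first success is the source of the compile-time saving. At most one successful multipolygon build happens per shader.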
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Note that the multipolygon PS dispatch modes supported by Xe2 aren't
enabled by default yet, but they can be enabled manually via
INTEL_SIMD_DEBUG=fs2x8,fs4x8,fs2x16.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This is needed because the information stored on the ATTR file for
multipolygon fragment shaders isn't stored as a contiguous sequence in
the GRF; instead, the ATTR source may be lowered by assign_urb_setup()
to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32
instruction, even though the result of fs_inst::size_read() is
expected to be 2 GRFs. Special case ATTR sources for multipolygon PS
shaders to calculate the number of physical GRFs that will actually be
read by the instruction after lowering, based on the number of
polygons processed by the instruction.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This is based on a previous patch by Marcin Ślusarz addressing the
same issue, though it's largely rewritten, simplified and includes
additional fixes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This fixes a number of assumptions made by the multipolygon input
attribute handling code from assign_urb_setup() so it also works on
Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8
and SIMD2x16) and uses a different, more compact representation of the
plane parameters.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The interpolation deltas of PS inputs now show up as a 12B vec3 (A0,
A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B
format with an unused component.
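Storing A0, A1-A0 and A2-A0 directly fits plane interpolation. Let b1 and b2 be the barycentric weights of vertices 1 and 2, so vertex 0 gets 1 - b1 - b2. Then the attribute value is A0 + (A1-A0)*b1 + (A2-A0)*b2. The struct and function names in this C sketch are hypothetical, and it makes no claim about the hardware's exact instruction sequence.

```c
#include <assert.h>

/* 12-byte per-attribute setup data: A0, A1-A0, A2-A0 (no padding lane). */
struct attr_deltas {
   float a0;
   float a1_minus_a0;
   float a2_minus_a0;
};

static_assert(sizeof(struct attr_deltas) == 12, "expected a 12-byte vec3");

/* Equivalent to A0*(1-b1-b2) + A1*b1 + A2*b2. */
static float
interpolate(const struct attr_deltas *d, float b1, float b2)
{
   return d->a0 + d->a1_minus_a0 * b1 + d->a2_minus_a0 * b2;
}
```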
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The X and Y barycentric vectors are no longer interleaved in SIMD8
chunks (yay), so this is mostly a matter of disabling the
lower_barycentrics() pass and switching to a simpler implementation of
fetch_barycentric_reg() that simply calls fetch_payload_reg() instead
of the SIMD8 shuffling we had to do in previous generations.
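To show why the shuffling is no longer needed, the sketch below compares two payload layouts by the float offset of a channel's X (comp 0) or Y (comp 1) barycentric. It assumes the older layout interleaves X and Y per SIMD8 chunk (X0-7, Y0-7, X8-15, Y8-15, ...) and the Xe2 layout stores all X values followed by all Y values. Both functions are hypothetical illustrations, not Mesa code.

```c
#include <assert.h>

/* Assumed pre-Xe2 layout: X and Y interleaved in SIMD8 chunks. */
static unsigned
bary_offset_interleaved(unsigned lane, unsigned comp)
{
   return (lane / 8) * 16 + comp * 8 + lane % 8;
}

/* Assumed Xe2 layout: one SIMD-wide vector of X, then one of Y. */
static unsigned
bary_offset_planar(unsigned lane, unsigned comp, unsigned simd_width)
{
   assert(lane < simd_width && comp < 2);
   return comp * simd_width + lane;
}
```

In the planar layout, X and Y are each a contiguous SIMD-wide vector, so a plain payload fetch returns them directly.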
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This extends fetch_payload_reg() to support fetching vector registers
like barycentrics stored on the payload as a contiguous sequence of
SIMD-wide vectors. In the SIMD32 case, both halves of the SIMD16
vector registers specified as regs[0] and regs[1] are zipped to
construct a single SIMD32-wide vector.
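Below is a plain-array model of the SIMD32 zip described above. Each component of the result takes channels 0..15 from the first SIMD16 half (regs[0]) and channels 16..31 from the second (regs[1]). Modelling registers as float arrays is an assumption made for illustration, and zip_simd16_halves is a hypothetical name.

```c
#include <assert.h>

#define SIMD16 16

/* Hypothetical model: each half holds `ncomp` components, each one a
 * SIMD16-wide row of per-channel values. */
static void
zip_simd16_halves(float lo[][SIMD16], float hi[][SIMD16], unsigned ncomp,
                  float out[][2 * SIMD16])
{
   assert(ncomp > 0);
   for (unsigned c = 0; c < ncomp; c++) {
      for (unsigned ch = 0; ch < SIMD16; ch++) {
         out[c][ch] = lo[c][ch];          /* channels 0..15 */
         out[c][SIMD16 + ch] = hi[c][ch]; /* channels 16..31 */
      }
   }
}
```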
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This includes the render target array index, viewport index, and
front/back facing fields, which are now replicated per pair of
subspans in order to support fixed-layout multi-polygon PS dispatch.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Note from Caio: proper handling of brw_sample_mask_reg
will appear in later patches.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The PS thread payload format has changed enough in Xe2 that it
probably doesn't make sense to share code with gfx6. See BSpec page
"PS Thread Payload for Normal Dispatch - 512 bit GRF" for the new
format.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
These are new variants of the existing brw_reg GRF constructors that
take register numbers in the new 512b units. Mainly useful for
thread payload setup code to use register numbers in a format that
matches the BSpec.
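The arithmetic behind such constructors can be sketched as follows. This assumes the compiler's internal register numbering still counts 256-bit (32-byte) units, so one 512-bit register number maps to two internal ones. struct reg256 and from_512b_units are hypothetical names, not the brw_reg API.

```c
#include <assert.h>

/* Hypothetical register address in 256-bit (32-byte) units. */
struct reg256 {
   unsigned nr;    /* register number in 32-byte units */
   unsigned subnr; /* byte offset within that 32-byte register */
};

/* Convert a BSpec-style 512-bit register number plus a byte offset
 * within it into 256-bit units. */
static struct reg256
from_512b_units(unsigned nr512, unsigned byte_offset)
{
   assert(byte_offset < 64);
   struct reg256 r = { 2 * nr512 + byte_offset / 32, byte_offset % 32 };
   return r;
}
```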
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
We only need it for indirect draws.
Improves performance on an i7-12700 and A770:
- Piglit's drawoverhead base case +150.639% +/- 2.86933% (n=15).
- gfxbench5 gl_driver2_off +19.7219% +/- 1.13778% (n=15)
- SPECviewperf2020 catiav5test1 +1.6831% +/- 0.552052% (n=10).
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
Whenever we use a BO in a batch, we need to find its corresponding exec
list entry, either to a) record that it's been used, b) update whether
it's being written, c) check for cross-batch implicit dependencies.
bo->index exists to accelerate these lookups. If a BO is used multiple
times by a batch, bo->index is its location in the list. Because the
field is global, and a BO can in theory be used concurrently by multiple
contexts, we need to double-check whether it's still there. If not, we
fall back to a linear search of all BOs in the list, looking to see if
our index was simply wrong (but presumably right for another context).
However, there's one glaringly obvious case that we missed here. If
bo->index is -1, then it's wrong for /all/ contexts, and in fact implies
that said BO has never been added to any exec list, ever. This is quite
common in fact: a new BO that has never been used before (say, from the
BO cache or a streaming uploader) gets used for the first time.
In this case we can simply conclude that it's not in the list and skip
the linear walk through all buffers referenced by the batch.
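The C sketch below shows the lookup with the new short-circuit. It assumes a fixed-size exec list and simplified structs. struct bo, struct batch, find_exec_index and add_exec_bo are hypothetical stand-ins, not iris's real types or functions.

```c
#include <assert.h>

#define MAX_EXEC_BOS 64

struct bo {
   int index; /* -1: never added to any exec list by any context */
};

struct batch {
   struct bo *exec_bos[MAX_EXEC_BOS];
   unsigned exec_count;
};

static int
find_exec_index(const struct batch *batch, const struct bo *bo)
{
   /* Never in any exec list, ever: it can't be in ours, skip the walk. */
   if (bo->index == -1)
      return -1;

   /* Fast path: the cached index is valid for this batch. */
   if ((unsigned)bo->index < batch->exec_count &&
       batch->exec_bos[bo->index] == bo)
      return bo->index;

   /* The index may belong to another context's list: linear search. */
   for (unsigned i = 0; i < batch->exec_count; i++) {
      if (batch->exec_bos[i] == bo)
         return (int)i;
   }
   return -1;
}

static void
add_exec_bo(struct batch *batch, struct bo *bo)
{
   assert(batch->exec_count < MAX_EXEC_BOS);
   bo->index = (int)batch->exec_count;
   batch->exec_bos[batch->exec_count++] = bo;
}
```

The first check only works if every newly created BO starts with index -1. Otherwise a fresh BO would still fall through to the linear walk.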
Improves performance on an i7-12700 and A770:
- SPECviewperf2020 catiav5test1: 72.9214% +/- 0.312735% (n=45)
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
A value of -1 means that the buffer has never been used in an execbuf
buffer list in any of our contexts. While setting this isn't critical,
doing so will allow us to short-circuit some looping in the next patch.
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>