fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 13:30:12 +01:00

Author	SHA1	Message	Date
Dave Airlie	8f73cc802c	intel/compiler: revert part of "Move earlier scheduler code that is not mode-specific" This removed a bunch of calls from the vec4 code that aren't called anywhere else. Bring back the bits that were removed. Fixes glxgears on gen5 Fixes: `81594d0db1` ("intel/compiler: Move earlier scheduler code that is not mode-specific") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26862>	2024-01-04 00:38:38 +00:00
Dave Airlie	37366fef68	intel/compiler: fix release build unused variable. This is only used in an assert. Fixes: `158ac265df` ("intel/fs: Make helpers for saving/restoring instruction order") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26863>	2024-01-03 23:52:11 +00:00
Daniel Schürmann	a3ed36da1a	treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop() Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11) MaxWaves: 18134 -> 18130 (-0.02%) Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08% CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12% VGPRs: 63580 -> 63604 (+0.04%) SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67% Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02% InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02% VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02% SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04% Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94% Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26% PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87% PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06% Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>	2024-01-03 20:48:04 +00:00
Yonggang Luo	8665ce27bc	intel: Use ALIGN_POT instead of ALIGN inside macro define These macro definitions are computed from literals, so use ALIGN_POT instead of the ALIGN function so that they can be computed at compile time. Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26864>	2024-01-03 12:46:10 +00:00
Mark Janes	188c349e51	intel: remove workaround for preproduction DG2 steppings DG2_G10 was released with stepping C0. DG2_G11 was released with stepping B1. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26845>	2024-01-02 16:06:37 -08:00
Ian Romanick	2e75d71c1f	intel/cmat: Generate better code for nir_intrinsic_cmat_insert When the source destination index is a constant, we can avoid generating a lot of the intermediate code. At the very least, this makes initial NIR dumps much easier to read. v2: Simplify tracking of dst_index. Suggested by Caio. Suggested-by: Caio Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	7bfbeb79a7	anv: Set COMPUTE_WALKER systolic mode enable flag Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	6b14da33ad	intel/fs: nir: Add nir_intrinsic_dpas_intel v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion. v3: Fix float16 destination DPAS on DG2. v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. v5: Rebase on !26323. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:43 -08:00
Ian Romanick	3756f60558	intel/fs: DPAS lowering Implements integer dot product lowering both with and without DP4A. Implements half-float dot product lowering. There are a couple FINISHME comments describing future optimizations. v2: Add a brw_compiler::lower_dpas flag to track when the lowering should be applied. v3: Use is_null() instead of checking file != ARF. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	3cb9625539	intel/fs: Fix scoreboarding for DPAS v2: Remove all mention of DPASW. Suggested by Curro and Caio. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	eb1f19d7bf	intel/compiler: Validation for DPAS instructions v2: s/regiser/register/g in messages. Noticed by Caio. Add more context to the sub-byte precision error message. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:15 -08:00
Ian Romanick	1c92dad5cb	intel/disasm: Disassembly support for DPAS v2: Fix regioning in src[012]_dpas_3src. Noticed by Caio. Treat DPAS as unordered. Suggested by Curro. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:27:13 -08:00
Ian Romanick	e666872c75	intel/compiler: Initial bits for DPAS instruction v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding. v3: Prevent lower_regioning from trying to "fix" DPAS sources. v4: Add instruction latency information for scheduling and perf estimates. v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio. v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Ian Romanick	3a35f8b29b	intel/cmat: Lower cmat_load and cmat_store v2: Add support for non-constant stride. v3: Explain B matrices (a little bit) in get_slice_type_from_desc. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Ian Romanick	502be565da	intel/cmat: Add lowering for cmat_bitcast v2: Use nir_component_mask(...) instead of 0xffff. Assert that source and destination are same size. Both suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	7303315a8b	intel/cmat: Enable packed formats for scalar ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	26c4acd8ee	intel/cmat: Enable packed formats for binary ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	0d314eb3cc	intel/cmat: Enable packed formats for unary, length, and construct With this, a minimum test case passes: void main() { coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA; coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR; matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0); matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA); coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor); } v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in one of the 4x8 cases. v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify coop_unary handling. This saved 67 lines of code. v4: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v5: Massive update to the comment in lower_cooperative_matrix_unary_op with some suggestions from Caio. Add a comment and assertion around `nir_def *v[4]`. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	75388a71c9	intel/cmat: Add lowering for cmat_insert and cmat_extract v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	a2ded5b26c	intel/cmat: Update get_slice_type for packed slices Also splits off another function get_slice_type_from_desc that will be used in future commits. v2: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v3: Use glsl_base_type_get_bit_size. v4: Adjust packing so that a single row fills an entire GRF. v5: Add comment for get_packing_factor and some other cleanups there. s/cooperative_matrix/cmat/. Tighten the validation of len in gt_slice_from_desc. All suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Caio Oliveira	dba6451ce8	intel/cmat: Add pass to lower cooperative matrix to subgroup operations This is just the skeleton of the implementation. Future commits will fill it all in. v2: Move to src/intel/compiler v3 (idr): Use vecN instead of array[N] for slice type. v4 (idr): Refactor lower_cooperative_matrix_load and lower_cooperative_matrix_store into a single function. v5 (idr): Remove old, verbose debug logging. Assert that entry is not NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of 0xffff. s/cooperative_matrix/cmat/. All suggested by Caio. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> I put both R-b on this because, at this point, we've each done equal parts authoring and reviewing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Rohan Garg	5fff6eac42	intel/compiler: Update disassembly for new LSC cache enums Rework: * Caio: Add remaining enum values. Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>	2023-12-28 15:13:24 -08:00
Francisco Jerez	b91fa057ab	intel/compiler/xe2: Don't disassemble non-existent fields. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26837>	2023-12-28 15:13:24 -08:00
Jordan Justen	6495fe3d37	intel/xe2+: Implement brw_wm_state_simd_width_for_ksp() on Xe2+. The mechanism for selecting dispatch modes has changed from previous platforms, add a new implementation brw_wm_state_simd_width_for_ksp() using the new kernel dispatch controls. [ Francisco Jerez: Split from a larger patch, handle multipolygon dispatch, add additional comments. ] Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	ab0eff4388	intel/fs/xe2+: Attempt to build quad-SIMD8 and dual-SIMD16 FS variants on Xe2+ platforms. Extend the pre-existing dual-SIMD8 compilation path in brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles. Instead of building every possible dispatch mode and then picking one based on cycle-count heuristics, this attempts to only build a single multipolygon kernel -- The different multipolygon dispatch modes are tried in the expected order of decreasing performance (quad-SIMD8, dual-SIMD16 then dual-SIMD8), the first one that successfully compiles without spills is taken as a simple heuristic, and no further multipolygon builds are attempted. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	50d084ec29	intel/fs/xe2+: Lower SIMD width of instructions that access ATTR file from SIMD2x8/4x8 FS. This is needed because the information stored on the ATTR file for multipolygon fragment shaders isn't stored as a contiguous sequence in the GRF, instead the ATTR source may be lowered by assign_urb_setup() to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32 instruction, even though the result of fs_inst::size_read() is expected to be 2 GRFs. Special case ATTR sources for multipolygon PS shaders to calculate the number of physical GRFs that will actually be read by the instruction after lowering, based on the number of polygons processed by the instruction. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	0d332d0c49	intel/fs: Plumb shader instead of compiler to get_lowered_simd_width() and friends. This will allow making lowering decisions based on properties of the shader, like the multipolygon dispatch mode used. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	bd634bef12	intel/fs/xe2+: Implement layout of mesh shading per-primitive inputs in PS thread payloads. This is based on a previous patch by Marcin Ślusarz addressing the same issue, though it's largely rewritten, simplified and includes additional fixes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	4cebfaadf7	intel/fs/xe2+: Implement support for multi-polygon vertex setup data in PS payload. This fixes a number of assumptions made by the multipolygon input attribute handling code from assign_urb_setup() so it also works on Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8 and SIMD2x16) and uses a different more compact representation of the plane parameters. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 14:12:59 -08:00
Francisco Jerez	702eabaaae	intel/fs/xe2+: Update for new layout of vertex setup data in PS payload. The interpolation deltas of PS inputs now show up as a 12B vec3 (A0, A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B format with an unused component. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	d622e19f00	intel/fs/xe2+: Enable new format of barycentrics in PS payload. The X and Y barycentric vectors are no longer interleaved in SIMD8 chunks (yay), so this is mostly a matter of disabling the lower_barycentrics() pass and switching to a simpler implementation of fetch_barycentric_reg() that simply calls fetch_payload_reg() instead of the SIMD8 shuffling we had to do in previous generations. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	49a867f67e	intel/fs: Add support for vector payload values to fetch_payload_reg(). This extends fetch_payload_reg() to support fetching vector registers like barycentrics stored on the payload as a contiguous sequence of SIMD-wide vectors. In the SIMD32 case, both halves of the SIMD16 vector registers specified as regs[0] and regs[1] are zipped to construct a single SIMD32-wide vector. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	f295494cee	intel/fs/xe2+: Update poly info PS payload for new multi-polygon dispatch format. This includes the render target array index, viewport index, and front/back facing fields, which are now replicated per pair of subspans in order to support fixed-layout multi-polygon PS dispatch. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	4cc9c37bba	intel/fs/xe2+: Update location of sample ID fields in PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	a0ae3c0dba	intel/fs/xe2+: Update uses of pixel/sample mask from PS thread payload. Note from Caio: proper handling of brw_sample_mask_reg will appear in later patches. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	6dae56cc57	intel/fs/xe2+: Fix for new layout of X/Y pixel coordinates in PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	ef6ef7aa8e	intel/fs/xe2+: Implement PS thread payload register offset setup. The PS thread payload format has changed enough in Xe2 that it probably doesn't make sense to share code with gfx6. See BSpec page "PS Thread Payload for Normal Dispatch - 512 bit GRF" for the new format. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Francisco Jerez	24e8709d8b	intel/eu/xe2+: Add helpers for constructing registers in 512b units. These are new variants of the existing brw_reg GRF constructors that take registers numbers in the new 512b units. Mainly useful for thread payload setup code to use register numbers in a format that matches the BSpec. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>	2023-12-28 11:07:03 -08:00
Rohan Garg	3e46ee61d5	intel/fs/xe2+: Lift CPS dispatch width restrictions on Xe2+. These restrictions don't seem to be applicable anymore, and limiting to SIMD8 wouldn't work since we're no longer building shaders with that dispatch width. [ Francisco: This one-liner change was squashed by Rohan Garg into a previous version of my patch "Stop building SIMD8 programs", but it makes more sense as a separate commit -- Formatted as a separate patch. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>	2023-12-22 10:37:00 -08:00
Ian Romanick	84b53e1a54	intel/fs/xe2+: Pass correct dispatch_width to fs_generator for geometry-processing stages. Instead of hard-coding a dispatch_width value which is no longer correct on Xe2+. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>	2023-12-22 10:37:00 -08:00
Francisco Jerez	3f92dde55e	intel/fs/xe2+: Stop building SIMD8 shaders for geometry stages (VS/TCS/TES/GS). They are no longer supported by the fixed-function hardware. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>	2023-12-22 10:37:00 -08:00
Francisco Jerez	6877916155	intel/fs/xe2+: Stop building SIMD8 fragment shaders. They are no longer supported by the fixed-function hardware. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>	2023-12-22 10:37:00 -08:00
Francisco Jerez	7397ba61c2	intel/fs/xe2+: Stop building SIMD8 compute-like shaders (CS/BS/TS/MS). SIMD8 kernels are no longer able to utilize the ALUs efficiently, since they have twice the vector width as previous platforms. However even though there aren't many reasons to use it, SIMD8 is still supported by the instruction set technically, and it will still be used for some SIMD-lowering sequences. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>	2023-12-22 10:37:00 -08:00
Francisco Jerez	1f2c44dc21	intel/compiler: Attempt to build dual-SIMD8 variant of fragment shaders on gfx12+ platforms. Similar to other FS dispatch modes, attempt to build a dual-SIMD8 program if the regular SIMD8 program didn't spill and doubling the amount of space for varyings doesn't cause us to go over the thread payload limit. Dual-SIMD8 builds in combination with coarse pixel shading are currently not handled. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	28aec45eed	intel/fs/gfx12: Implement multi-polygon format of render target array index in PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	5b1ab77423	intel/fs/gfx12: Implement multi-polygon format of back/front-facing flag in PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	4672fcbc76	intel/fs: Fix PS thread payload setup for depth_w_coef_reg. It's not replicated per SIMD16 half of a SIMD32 thread on the PS payload. Make fs_visitor::payload::depth_w_coef_reg a scalar rather than an array. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	09ea840987	intel/fs: No need to copy null destinations in lower_simd_width. The copy would be discarded immediately. Until now we were relying on DCE to eliminate these, but it seems like in some cases MOVs into the null register emitted by lower_simd_width() are never eliminated, likely because a lower_simd_width() call has been introduced close to the bottom of optimize() which isn't followed by any additional DCE passes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	5e0760a993	intel/fs/gfx12: Don't consider multipolygon PS to have packed dispatch. This fixes a number of regressions and hangs in multipolygon fragment shaders that have FIND_LIVE_CHANNEL sequences which would otherwise lead to access of a dead channel. Note that the failures don't seem to be reproducible in simulation. Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00
Francisco Jerez	8f92baa5d3	intel/fs/gfx12+: Don't set nir_divergence_single_prim_per_subgroup option for fragment shaders. Flat-shaded inputs and other per-primitive values can no longer be considered to be uniform across fragment shader subgroups due to multipolygon dispatch. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>	2023-12-22 18:05:31 +00:00

2967 commits