fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-04-30 23:28:06 +02:00

Author	SHA1	Message	Date
Konstantin Seurer	da95f64a6f	lavapipe: Store immutable_samplers as lvp_sampler array We will need this to access the ycbcr conversion. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24295>	2023-07-25 08:22:27 +00:00
Konstantin Seurer	7dc6c4b581	lavapipe: Remove dummy sampler ycbcr conversion Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24295>	2023-07-25 08:22:27 +00:00
Konstantin Seurer	dbbd84ce8b	gallivm: Ignore nir_tex_src_plane Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24295>	2023-07-25 08:22:27 +00:00
Konstantin Seurer	c7914a84e9	gallivm: Fix subsampled format sampling under Vulkan Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24295>	2023-07-25 08:22:27 +00:00
Konstantin Seurer	1280cf5b2a	draw: Do not restart the primitive_id at 0 Otherwise the primitive_id will wrap around to 0 if more than 4096 patches are drawn. cc: mesa-stable Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24295>	2023-07-25 08:22:27 +00:00
Samuel Pitoiset	df98dca7ad	radv: pass submit info to radv_check_gpu_hangs() This will allow dumping preambles/postambles CS and eventually even more CS. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24191>	2023-07-25 06:50:33 +00:00
Samuel Pitoiset	9c95a74e5e	radv/amdgpu: rename old_ib to ib in radv_amdgpu_winsys_cs_dump() Forgot this variable when I renamed the ib_buffers array. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24191>	2023-07-25 06:50:33 +00:00
Samuel Pitoiset	7eb1105829	radv/amdgpu: fix dumping CS with the chained IBs path ib_buffer is now NULL in both paths, and the first IB is the beginning of the chain. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24191>	2023-07-25 06:50:33 +00:00
Samuel Pitoiset	7f173d1ff3	radv: use next_stage for determining the stage to lower NGG If the next stage is FS, it's also the last VGT API stage. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	340f74e468	radv: simplify getting next VS stage for VS prologs It's the VS shader info stage. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	ca520c49f5	radv: determine as_ls earlier by using the next stage Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	f68316d78b	radv: determine ES info for VS/TES with GS earlier By using the next stage, it's possible to compute this information earlier without having to link shaders info. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	4098e47ab6	radv: use the number of GS linked inputs to compute the ESGS itemsize It's similar. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	7c2d38f4d1	radv: add a helper to compute the ESGS itemsize Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Samuel Pitoiset	54ab7b24a2	radv: remove the pipeline dependency for creating a GS copy shader This is unnecessary. While we are at it, stop passing the array of shaders and use the GS stage only. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24273>	2023-07-25 06:31:08 +00:00
Jianxun Zhang	75452f611e	intel/common: Only set op mask on instructions in decoder When a default value of a struct's field, which is in the higher half of the first dword, is specified in a gen xml file, setting op mask makes decoder treat the field as a header (intel_field_is_header()). As a result, it won't output the field in batch dump. This is not a common case but can happen once a gen xml file includes such fields. The op mask is only meaningful to instructions, so we fix the above issue by not setting op mask of structs (also registers). Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24268>	2023-07-24 22:56:59 +00:00
Nanley Chery	046bba0be0	iris: Handle clear color compatibility in prepare_render Before this patch, iris_resource_render_aux_usage would disable compression when the clear color did not support format reinterpretation. With this patch, iris now replaces the clear color with zero and keeps compression enabled. Disabling fast clears would be enough for most aux usages, but replacement is also done to handle ISL_AUX_USAGE_FCV_CCS_E. Note that this also fixes a bug. Format reinterpretation with incompatible clear colors previously was not handled for the MCS aux usages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>	2023-07-24 22:29:01 +00:00
Nanley Chery	1aa4e6bac0	iris: Create BLORP surfaces after resource preparation iris_resource_prepare_render will soon gain the ability to change a resource's clear color. iris_blorp_surf_for_resource will keep a copy of that clear color, so make sure calls to it happen after the render preparation helper. At the moment, this shouldn't have an impact besides improving debugging. While we're here, do the same for the generic access preparation helper. We may convert those to more specific helpers at a later time. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>	2023-07-24 22:29:01 +00:00
Nanley Chery	215b50ace1	iris: Pass the render format to prepare_render This will be used in an upcoming patch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>	2023-07-24 22:29:01 +00:00
Nanley Chery	c59ba8ac07	iris: Reorder render_aux_usage parameters Match the order of the parameters for iris_resource_texture_aux_usage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>	2023-07-24 22:29:01 +00:00
Nanley Chery	1d12b29b3f	intel/blorp: Ambiguate after CCS resolves on gfx7-8 ISL's state-machine of CCS_D describes full resolves as leaving the aux buffer in the pass-through state. Hardware doesn't behave this way on gfx8 however. On that platform, full resolves transition the aux buffer to the resolved state. This was verified by dumping the CCS before and after a full resolve on BDW (gfx7 is simply assumed to behave the same). Ambiguate after resolving to match driver expectations. Prevents iris from failing piglit's fcc-write-after-clear on BDW with a future patch which relies on fast-clear encodings being removed after a resolve. The avoided failure is: Testing implicit read of partial block UNORM -> SNORM Probe color at (0,1,0) Expected: 1.000000 1.000000 1.000000 1.000000 Observed: 0.000000 0.000000 0.000000 0.000000 Cc: mesa-stable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>	2023-07-24 22:29:01 +00:00
Lionel Landwerlin	8cbf730145	intel/fs: don't try to rebuild sequences of non ssa values Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `04777171e0` ("intel/fs: try to rematerialize surface computation code") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9378 Reviewed-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24228>	2023-07-24 20:04:24 +00:00
Caio Oliveira	2f3230a736	meson: Ensure that LLVMSPIRVLib is not required for Clover Fixes: `cb588d5d6e` ("compiler/clc: Move related NIR passes to the common mesa clc") Closes: #9391 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24244>	2023-07-24 18:21:11 +00:00
Emma Anholt	61ec26db26	ci/tgl: Improve the info for ANGLE's MSAA regression on TGL. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24200>	2023-07-24 16:07:28 +00:00
Emma Anholt	3ef07e6c44	ci: Uprev ANGLE to 0518a3ff4d4e ("Android: Simplify power metrics collection") There have been some fixes for our drivers that we'd like to bring in. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24200>	2023-07-24 16:07:28 +00:00
Emma Anholt	48b725279e	ci/radv: Clarify when the ANGLE GS failures started happening. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24200>	2023-07-24 16:07:28 +00:00
Faith Ekstrand	079e8a9674	anv,hasvk,iris: sampler_prog_key::swizzles is only used on crocus The field is no longer consumed by brw_compile_* and is instead handled directly by the crocus driver. Therefore, it's safe to leave it zero and not even bother setting it. This removes our reliance on the SWIZZLE_* macros in prog_instructions.h. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24288>	2023-07-24 15:40:40 +00:00
Christian Gmeiner	1e29b3cee8	etnaviv: nir: convert to new-style NIR registers The initial plan was to use 'nir_legacy' helpers but it turns out that our RA pass is hard to convince to be happy with it. So we are using the 'chasing' helpers now. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	935730a563	etnaviv: nir: switch to etna_nir_lower_to_source_mods(..) nir's source modifiers are going away soon and with it also the lowering pass. Let's switch to our own lowering pass. We need to run our own lowering pass almost at the end, otherwise opc_cse(..) etc. might do some wrong needed opts as nir does not see our modifiers. Also we need to remove the last nir_opt_dce(..) as it will remove code that is not dead, caused by the used load_const hack. 32 %15 = load_const (0x00000000 = 0.000000) 32 %4 = fabs %15 (0.000000) nir_opt_dce is correct when it removes the two instructions. But in reality the load_const is a uniform that should not be removed. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	5ae3bd616c	etnaviv: nir: add etna_nir_lower_to_source_mods(..) This is more or less a copy of nir_lower_to_source_mods(..) with the following differences: - we store the source mods in pass_flags - we do not try to saturate the destination Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	276f91dad0	etnaviv: nir: look at parent instr in lower_alu(..) When we switch to our own lower_to_source_mods pass we will start to see such patterns: 32x4 %18 = fneg %5 (-5.125000, -30.000000, 5.500000, -6.500000) 32x4 %19 = ffma %18, %8, %4 (-6.500000, -7.750000, 6.500000, 6.000000) This is a problem as we will generate an instruction that accesses two different uniforms, which is a problem on GPUs where has_no_oneconst_limit is false. Make lower_alu(..) smarter by looking in the parent for the constant value. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	f3be07cb2d	etnaviv: do not clear all pass_flags before RA We only need to clear the 'dead' bits. The others are used for source mods. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	8d89e78cf5	etnaviv: extend etna_pass_flags with source modifiers As nir_lower_to_source_mods(..) will be deleted and with it the modifier storage in nir's core we need to find another way to store the information. We have 6 bits left in nir's pass_flags - so let's go that route. This also adds some small helpers that will be used later. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	5b7104b7a0	etnaviv: add is_dead_instruction(..) helper As we are going to extend the enum etna_pass_flags it makes sense to add a small helper to test if an instruction is dead. An instruction is dead if BYPASS_DST or BYPASS_SRC is set. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	0c9c450f44	etnaviv: name the enum used for pass_flags This enum is used for the pass_flags that can be set on a nir_instr. Name it to make the intention of its usage clear. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
Christian Gmeiner	8305fb196c	etnaviv: make use of BITFIELD_BIT(..) macro It helps to make the code easier to read. Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24216>	2023-07-24 15:22:56 +00:00
David Rosca	0a5fe1f524	frontends/va: Add YUV420 to NV12 postproc conversion Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7853 Reviewed-by: Thong Thai <thong.thai@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24272>	2023-07-24 14:46:02 +00:00
David Rosca	c0545f2a4f	gallium/auxiliary/vl: Fix blurry output of compute_shader_yuv There is a linear sampler used, so add half texel offset to avoid undesirable blur when input and output resolutions are the same. Reviewed-by: Thong Thai <thong.thai@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24272>	2023-07-24 14:46:02 +00:00
David Rosca	fc2b32c5d3	gallium/auxiliary/vl: Handle UV subsampling in compute_shader_yuv Also remove the 1px vertical shift as it results in a black line at the bottom of the picture. Reviewed-by: Thong Thai <thong.thai@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24272>	2023-07-24 14:46:02 +00:00
Georg Lehmann	92900d8bf4	aco: improve get_gfx11_true16_mask description Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24294>	2023-07-24 14:12:19 +00:00
Georg Lehmann	8fbebb6a2a	aco/gfx11: fix get_gfx11_true16_mask with v_cmp_class_f16 The second operand is 16bit, so we need to use VOP3 to address v128-v255. Closes: #9413 Fixes: `6872f8d861` ("aco/gfx11: allow true 16-bit instructions to access v128+") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24294>	2023-07-24 14:12:19 +00:00
Rhys Perry	a53d3ff0b3	nir/tests: add nir_opt_dead_cf_test.jump_before_constant_if Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>	2023-07-24 14:06:16 +01:00
Rhys Perry	21f0aca948	nir/opt_dead_cf: remove nodes after a jump earlier In the case of: halt // succs: b9 if %618 { block b3:// preds: break // succs: b6 } else { block b4: // preds: , succs: b5 } block b5: // preds: b4 32 %556 = iadd %617, %2 (0x1) opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the before ends in a jump and the after isn't empty. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>	2023-07-24 14:06:16 +01:00
Konstantin Seurer	1c8577b493	nir/tests: Use a single binary Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24249>	2023-07-24 11:44:46 +00:00
Konstantin Seurer	6eb0a3a5b7	nir/tests: Refactor boilerplate into a common header Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24249>	2023-07-24 11:44:46 +00:00
Danylo Piliaiev	eeb1fd90fc	tu,freedreno: Forbid blit event for R8G8_SRGB due to gpu faults Same cause as for other R8G8 formats - msaa resolve via blit event causes gpu fault. Fixes: dEQP-VK.api.image_clearing..clear_color_attachment..r8g8_srgb_* Fixes: `029919f3c8` ("tu: allow using resolve engine for SRGB MSAA resolves") Cc: mesa-stable Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24277>	2023-07-24 10:13:49 +00:00
Charles Giessen	f3d948eb6c	panvk: Use 1.0 in ICD Manifest json PanVK downgraded from supporting Vulkan 1.1 to 1.0, but did not change their ICD Manifest api_version to reflect that. This causes the Vulkan-Loader to interpret the ICD as a 1.1 driver erroneously. Originally discussed in this issue https://github.com/KhronosGroup/Vulkan-Loader/issues/1242 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24289>	2023-07-24 08:24:13 +00:00
Marcin Ślusarz	48885c7fe3	intel/compiler: load debug mesh compaction options once Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>	2023-07-24 07:55:29 +00:00
Marcin Ślusarz	c1685f08dd	intel/compiler,anv: put some vertex and primitive data in headers Both per-primitive and per-vertex space is allocated in MUE in 8 dword chunks and those 8-dword chunks (granularity of 3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength) are passed to fragment shaders as inputs (either non-interpolated for per-primitive and flat vertex attributes or interpolated for non-flat vertex attributes). Some attributes have a special meaning and must be placed in separate 8/16-dword slot called Primitive Header or Vertex Header. Primitive Header contains 4 such attributes (Cull Primitive, ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword slot) potentially unused. Vertex Header is similar - it starts with 3 unused dwords, 1 dword for Point Size (but if we declare that shader doesn't produce Point Size then we can reuse it), followed by 4 dwords for Position and optionally 8 dwords for clip distances. This means we have an interesting optimization problem - we can put some user attributes into holes in Primitive and Vertex Headers, which may lead to smaller MUE size and potentially more mesh threads running in parallel, but we have to be careful to use those holes only when we need it, otherwise we could force HW to pass too much data to fragment shader. Example 1: Let's assume that Primitive Header is enabled and user defined 12 dwords of per-primitive attributes. Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader. With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to fragment shader. 16/16 is better than 24/16, so packing makes sense. Example 2: Now let's assume that Primitive Header is enabled and user defined 16 dwords of per-primitive attributes. Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of MUE space and pass ALIGN(16, 8) = 16 dwords to fragment shader. With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to fragment shader. 24/24 is worse than 24/16, so packing doesn't make sense. This change doesn't affect vk_meshlet_cadscene in default configuration, but it speeds it up by up to 25% with "-extraattributes N", where N is some small value divisible by 2 (by default N == 1) and we are bound by URB size. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>	2023-07-24 07:55:29 +00:00
Marcin Ślusarz	a252123363	intel/compiler/mesh: compactify MUE layout Instead of using 4 dwords for each output slot, use only the amount of memory actually needed by each variable. There are some complications from this "obvious" idea: - flat and non-flat variables can't be merged into the same vec4 slot, because flat inputs mask has vec4 stride - multi-slot variables can have different layout: float[N] requires N 1-dword slots, but i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot - some output variables occur both in single-channel/component split and combined variants - crossing vec4 boundary requires generating more writes, so avoiding them if possible is beneficial This patch fixes some issues with arrays in per-vertex and per-primitive data (func.mesh.ext.outputs.*.indirect_array.q0 in crucible) and by reduction in single MUE size it allows spawning more threads at the same time. Note: this patch doesn't improve vk_meshlet_cadscene performance because default layout is already optimal enough. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>	2023-07-24 07:55:29 +00:00
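
The Primitive Header arithmetic in the message of c1685f08dd can be checked with a short sketch. This is not code from Mesa: the function and constant names below are hypothetical, the formulas are copied from the two examples in that message, and the `packing_helps` comparison is only an assumed reading of "16/16 is better than 24/16", not the driver's actual heuristic.

```python
def align(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

# Figures taken from the commit message (names are hypothetical).
HEADER_DWORDS = 8        # size of the Primitive Header slot
HEADER_FREE_DWORDS = 4   # dwords left over after the 4 special attributes

def without_packing(user_dwords: int) -> tuple[int, int]:
    """Return (MUE dwords consumed, dwords passed to the fragment shader)."""
    mue = HEADER_DWORDS + align(user_dwords, 8)
    passed = align(user_dwords, 8)
    return mue, passed

def with_packing(user_dwords: int) -> tuple[int, int]:
    """Same pair when 4 user dwords are placed in the header hole.

    Assumes user_dwords >= HEADER_FREE_DWORDS, as in both examples.
    """
    rest = user_dwords - HEADER_FREE_DWORDS
    used = HEADER_DWORDS - HEADER_FREE_DWORDS
    mue = used + HEADER_FREE_DWORDS + align(rest, 8)
    passed = align(HEADER_FREE_DWORDS, 8) + align(rest, 8)
    return mue, passed

def packing_helps(user_dwords: int) -> bool:
    """Assumed rule: packing wins if it is no worse on both numbers
    and strictly better on at least one."""
    packed = with_packing(user_dwords)
    plain = without_packing(user_dwords)
    return (packed[0] <= plain[0] and packed[1] <= plain[1]
            and packed != plain)

for n in (12, 16):
    print(n, without_packing(n), with_packing(n), packing_helps(n))
```

For 12 user dwords the sketch reproduces 24/16 without packing and 16/16 with packing; for 16 user dwords it gives 24/16 and 24/24, matching Example 1 and Example 2.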

174704 commits