fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 15:58:06 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	2bb98a8f99	anv: document UBO descriptor range alignments Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Lionel Landwerlin	99bb2a087a	intel/decoder: fix COMPUTE_WALKER handling Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `17096f87` ("intel: Switch to COMPUTE_WALKER_BODY") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Kenneth Graunke	6341b3cd87	brw: Combine convergent texture buffer fetches into fewer loads Borderlands 3 (both DX11 and DX12 renderers) have a common pattern across many shaders: con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture) ... con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture) A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for...a single uint32_t scalar value. Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs)...this means we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure, and also in back-and-forth sending and receiving memory messages. The good news is we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e <10, 11, 12, ..., 26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.) We can pick the SIMD size of the load independently of the native shader width as well. On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine. In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>	2024-12-12 00:05:42 +00:00
Caio Oliveira	abe41b1d2c	intel/compiler: Use #pragma once instead of header guards Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>	2024-12-11 19:47:44 +00:00
Tapani Pälli	97fc987497	intel/dev: update mesa_defs.json from internal database This updates entry for 14017823839 which fixes issues on BMG with: dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32550>	2024-12-11 17:32:52 +00:00
Caio Oliveira	638802d68f	intel/brw: Dump errors when brw_assemble() fails EU validation This will allow executor to show proper inline errors. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32490>	2024-12-10 20:23:25 +00:00
Jordan Justen	1027b071f9	intel/dev: Add intel_check_hwconfig_items() Rather than checking hwconfig items when using them, wait until after devinfo has been fully initialized. This includes having workarounds implemented. We can then check if the hwconfig data and final Mesa initialization agree. If the match fails, we need to investigate if Mesa or the hwconfig data is wrong. This code becomes a no-op when not on a release build. Fixes: `a4c5bfd34c` ("intel/dev: Use hwconfig for urb min/max entry values") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12141 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>	2024-12-10 09:01:45 +00:00
Jordan Justen	4eb10bc25e	intel/dev: Don't process hwconfig table to apply items when not required Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>	2024-12-10 09:01:45 +00:00
Jordan Justen	5a8107cef4	intel/dev: Split apply and check paths for hwconfig Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>	2024-12-10 09:01:45 +00:00
Jordan Justen	832de579e1	intel/dev: Split hwconfig warning check into hwconfig_item_warning() Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>	2024-12-10 09:01:45 +00:00
Benjamin Lee	74ccf6cbdc	nir: add option to use compact view indices In panvk we pass absolute view indices to the hardware, so we need to do the conversion from compacted to absolute at some point. Emitting absolute indices from nir_lower_multiview initially looks like the simplest option, but nir_lower_io_to_temporaries will emit a write for every element of array varyings. This results in unnecessary writes to disabled views. Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>	2024-12-09 20:31:49 +00:00
Benjamin Lee	becb014d27	nir: treat per-view outputs as arrayed IO This is needed for implementing multiview in panvk, where the address calculation for multiview outputs is not well-represented by lowering to nir_intrinsic_store_output with a single offset. The case where a variable is both per-view and per-{vertex,primitive} is now unsupported. This would come up with drivers implementing NV_mesh_shader or using nir_lower_multiview on geometry, tessellation, or mesh shaders. No drivers currently do either of these. There was some code that attempted to handle the nested per-view case by unwrapping per-view/arrayed types twice, but it's unclear to what extent this actually worked. ANV and Turnip both rely on per-view outputs being assigned a unique driver location for each view, so I've added on option to configure that behavior rather than removing it. Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>	2024-12-09 20:31:49 +00:00
Benjamin Lee	975c3ecd1e	nir: handle arbitrary per-view outputs in nir_lower_multiview This is needed for panvk, where multiview is "all or nothing". When multiview is enabled, all outputs may be written with separate values for each view. The edge case mentioned in the previous `nir_can_lower_multiview` is now handled because we now handle an arbitrary number of per-view output vars instead of expecting to find exactly one. Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>	2024-12-09 20:31:49 +00:00
Mi, Yanfeng	06d3eb8e01	anv:increase instruction heap to 3Gb Black Myth Wukong is generating more than 2Gb of shaders in pre-compiling stage after VK_EXT_shader_image_atomic_int64 extension enabled. Driver will crash in create shader stages due to dereference null pointer of kernel map. Signed-off-by: Mi, Yanfeng <yanfeng.mi@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32548>	2024-12-09 19:14:38 +00:00
Mi, Yanfeng	0a5a04f509	anv:Fix memory grow calculation overflow issue when old buffer size is large than 2G, 32bit cannot hold 2 times buffer size (>4G). Fixes: `8d813a90d6` ("anv: fail pool allocation when over the maximal size") Signed-off-by: Mi, Yanfeng <yanfeng.mi@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32551>	2024-12-09 18:49:17 +00:00
Paulo Zanoni	0dc2a5808e	brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM The last argument seems to be used as brw_shader_reloc::delta (from brw_add_reloc), and we're unconditionally setting it to 0 here, while the other place where we handle nir_intrinsic_load_reloc_const_intel seems to be setting the base appropriately. I found this by inspection while debugging a bug related to this code, so I'm not aware of any workloads that get improved by this patch. Related patches: - `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") - `99047451c9` ("intel/fs: add plumbing for embedded samplers") Fixes: `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>	2024-12-09 15:45:49 +00:00
Lionel Landwerlin	de00fe3f66	anv: add BVH building tracking through u_trace Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32483>	2024-12-09 14:45:00 +00:00
José Roberto de Souza	2aae000edb	intel/dev/xe: Fix size of eu_per_dss_mask Real Xe KMD actually returns a uint64, so here changing from uint32 to uint64. Fixes: `04bdbeec31` ("intel/dev/xe: Fix access to eu_per_dss_mask") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Nanley Chery <nanley.g.chery@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32527>	2024-12-06 19:52:50 +00:00
Alyssa Rosenzweig	972f8aa287	vulkan: rename depth bias graphics states "constant" is a special keyword in OpenCL C, and we'd like to #define it suitably in host C23 to facilitate compatiblity between host/device headers. That means we can't have any identifiers named "global" or "constant". Fortunately, this is the only 'constant' in any file I'm hitting. To avoid the clash, don't abbreviate "constant factor", use "constant_factor" instead. For consistency, "slope factor" then becomes "slope_factor". The new names are longer but match the Vulkan API exactly. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [Intel] Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> [NVK and panvk] Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [V3DV] Reviewed-by: Simon Perretta <simon.perretta@imgtec.com> [IMG] Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>	2024-12-06 13:48:26 -05:00
Dylan Baker	33a1acb0da	clc: Tell clang to track imported dependencies Clang is capable of tacking what headers it imports, as long as we set it up to do that. While that isn't important for rusticl, it would be useful for the various `_clc` tools, as they can then tell Ninja which headers they read to make rebuilds more reliable. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>	2024-12-06 13:48:26 -05:00
Nanley Chery	483c40a21d	anv: Allow compressed memtypes with default buffer types Source 2 games segfault if certain buffers are not able to use the same memory types as images. CS2 specifically expects this to be the case for vertex and index buffers (VK_BUFFER_USAGE_2_INDEX_BUFFER_BIT, VK_BUFFER_USAGE_2_VERTEX_BUFFER_BIT). I have not tested other Source 2 games to see how much the requirement differs for the usage (if at all). Up until now, we've disabled CCS for the Source 2 engine with the anv_disable_xe2_ccs driconf option. However, this option is not great for performance. So, replace this with a new option to allow the same memory types we use for images on buffers - anv_enable_buffer_comp. Compression of buffers is generally not good for performance. I collected the result of unconditionally enabling the feature in the performance CI on BMG. I used the default configuration to average the result of two runs of each trace. The CI reports that 4 game traces would regress between 0.44-1.01% FPS with buffer compression. However, the CI actually shows it to be beneficial in three of our game traces: * Cyberpunk-trace-dx12-1080p-high 106.51% * Hitman3-trace-dx12-1080p-med 101.59% * Blackops3-trace-dx11-1080p-high 100.44% So, enable the option for the two games we already have driconf entries for, Cyberpunk and Hitman3. Of course, also enable the option for Source 2 games. Casey Bowman reports that on BMG, some frame times drop from ~15ms to ~7ms in CS2. This is in large part due to the removal of HiZ resolves, which is a consequence of the game now using of HIZ_CCS_WT instead of plain HIZ. Ref: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11520 Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32519>	2024-12-06 17:21:06 +00:00
José Roberto de Souza	04bdbeec31	intel/dev/xe: Fix access to eu_per_dss_mask DRM_XE_TOPO_EU_PER_DSS and DRM_XE_TOPO_SIMD16_EU_PER_DSS can be any number of bytes long but it was assuming it was always 4 bytes long. That was not a issue because Xe KMD return 4 bytes even if only needs 1 or 2 bytes but that is a problem with our HW simulator that was returning 2 bytes. Fixes: `a24d93aa89` ("intel/dev: Query and compute hardware topology for Xe") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32307>	2024-12-05 20:30:44 +00:00
Lionel Landwerlin	371b7a9b0d	anv: set pipeline flags correct for imported libs Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3d49cdb71e` ("anv: implement VK_EXT_graphics_pipeline_library") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>	2024-12-05 19:53:34 +00:00
Lionel Landwerlin	6e396b400a	anv: fix missing bindings valid dynamic state change check Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ddd296cd3` ("anv: implement VK_EXT_vertex_input_dynamic_state") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>	2024-12-05 19:53:34 +00:00
Lionel Landwerlin	80c0d2718c	anv: report formats supported by the common bvh framework Enables DXR 1.1 with vkd3d-proton Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32487>	2024-12-05 15:54:10 +00:00
Ian Romanick	0754a18621	brw/copy: Allow copy prop into src1 of broadcast This is the selector, and it must always be a uniform UD, so there's no reason to not propagate into it. No shader-db change on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 220507131 -> 220507127 (-0.00%) Cycle count: 31607052398 -> 31607053364 (+0.00%); split: -0.00%, +0.00% Totals from 5 (0.00% of 702410) affected shaders: Instrs: 995 -> 991 (-0.40%) Cycle count: 86392 -> 87358 (+1.12%); split: -0.07%, +1.19% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	662339a2ff	brw/build: Use SIMD8 temporaries in emit_uniformize The fossil-db results are very different from v1. This is now mostly helpful on older platforms. v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV, adjust the exec_size to match the size allocated for the destination register. Fixes EU validation failures in some piglit OpenCL tests (e.g., atomic_add-global-return.cl). v3: Use component_size() in emit_uniformize and BROADCAST to properly account for UQ vs UD destination. This doesn't matter for emit_uniformize because the type is always UD, but it is technically more correct. v4: Update trace checksums. Now amly expects the same checksum as several other platforms. v5: Use xbld.dispatch_width() in the builder for when scalar_group() eventually becomes SIMD1. Suggested by Lionel. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18091701 -> 18091586 (<.01%) instructions in affected programs: 29616 -> 29501 (-0.39%) helped: 28 / HURT: 18 total cycles in shared programs: 919250494 -> 919123828 (-0.01%) cycles in affected programs: 12201102 -> 12074436 (-1.04%) helped: 124 / HURT: 108 LOST: 0 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20480808 -> 20480624 (<.01%) instructions in affected programs: 58465 -> 58281 (-0.31%) helped: 61 / HURT: 20 total cycles in shared programs: 874860168 -> 874960312 (0.01%) cycles in affected programs: 18240986 -> 18341130 (0.55%) helped: 113 / HURT: 158 total spills in shared programs: 4557 -> 4555 (-0.04%) spills in affected programs: 93 -> 91 (-2.15%) helped: 1 / HURT: 0 total fills in shared programs: 5247 -> 5243 (-0.08%) fills in affected programs: 224 -> 220 (-1.79%) helped: 1 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00% Subgroup size: 14102592 -> 14102624 (+0.00%) Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02% Max live registers: 65371025 -> 65355084 (-0.02%) Totals from 12130 (1.73% of 702392) affected shaders: Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08% Subgroup size: 388128 -> 388160 (+0.01%) Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81% Max live registers: 1538550 -> 1522609 (-1.04%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00% Subgroup size: 9631168 -> 9631216 (+0.00%) Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01% Max live registers: 41540611 -> 41514296 (-0.06%) Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05% Totals from 16852 (2.11% of 796880) affected shaders: Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07% Subgroup size: 323592 -> 323640 (+0.01%) Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58% Max live registers: 1072491 -> 1046176 (-2.45%) Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30% Tiger Lake Totals: Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00% Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01% Max live registers: 41644106 -> 41620052 (-0.06%) Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02% Totals from 15102 (1.90% of 793371) affected shaders: Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11% Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52% Max live registers: 989858 -> 965804 (-2.43%) Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98% Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00% Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00% Spill count: 567071 -> 567049 (-0.00%) Fill count: 701323 -> 701273 (-0.01%) Max live registers: 41914047 -> 41913281 (-0.00%) Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00% Totals from 3904 (0.49% of 798473) affected shaders: Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03% Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25% Spill count: 1696 -> 1674 (-1.30%) Fill count: 2523 -> 2473 (-1.98%) Max live registers: 341695 -> 340929 (-0.22%) Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	d2b266187d	brw: Use resize_sources several more places Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	12d1886b87	brw/lower: Don't "fix" regioning of broadcast The next two commits modify the destination regioning in a way that, which still correct, trigger assertion failures if we try to fix the regioning here. Broadcast gets lowered in brw_eu_emit. For the purposes of region restrictions, let's assume that the final code emission will do the right thing. Doing a bunch of shuffling here is only going to make a mess of things. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Caio Oliveira	cbc45ac99e	intel/brw: Enable EU validation and compaction tests for PTL Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32195>	2024-12-04 23:03:11 +00:00
Sagar Ghuge	9afb0480c4	intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>	2024-12-04 19:20:51 +00:00
Michael Cheng	ed620bcd41	anv : Add tracepoint for as_build Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Kevin Chuang	5098c0c5df	anv: Add INTEL_DEBUG for bvh dump and visualization tools This commit allows you to dump different regions of memory related to bvh building. An additional script to decode the memory dump is also added, and you're able to view the built bvh in 3D view in html. See the included README.md for usage. Rework: - you can now view the actual child_coord in internalNode in html - change exponent to be int8_t in the interpreter - fix the actual coordinates using an updated formula - now you can have 3D view of the bvh - blockIncr could be 2 and vk_aabb should be first - Now, if any bvh dump is enabled, we will zero out tlas, to prevent gpu hang caused by incorrect tlas traversal - rootNodeOffset is back to the beginning - Add INTEL_DEBUG=bvh_no_build. - Fix type of dump_size - add assertion for a 4B alignment - when clearing out bvh, only clear out everything after (header+bvh_offset) - TODO: instead of dumping on destory, track in the command buffer Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	5561db68c3	anv: Add helper to copy data from src to dest anv_address Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	41baeb3810	anv: Implement acceleration structure API Rework: (Kevin) - Properly setup bvh_layout Our bvh resides in contiguous memory and can be divided into two sections: 1. anv_accel_struct_header, tightly followed by 2. actual bvh, which starts with root node, followed by interleaving leaves or internal nodes. - Update comments for some fields for BVH and nodes. - Properly populate the UUIDs in serialization header - separate header func into completely two paths based on compaction bit - Encode rt_uuid at second VK_UUID_SIZE. - Write query result at correct slot - add assertion for a 4B alignment - move bvh_layout to anv_bvh - Use meson option to decide which files to compile - The alignment of serialization size is not needed - Change static_assert to STATIC_ASSERT and move them inside functions Rework (Sagar) - Use anv_cmd_buffer_update_buffer instead of MI to copy data Rework (Lionel) - Remove flush after builds, and add flush in copy before dispatch - Handle the flushes in CmdWriteAccelerationStructuresPropertiesKHR properly Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	9002e52037	anv: Implement cmd_dispatch_unaligned callback Rework: (Kevin) - Calculate correct number of threads in GPGPU thread group based on SIMD size. - Instead of round up, just use the simple division and let the remainder part handle groupCount < local_size_x. - Drop indirect_unroll_off and fix the bug that we're not using is_unaligned_size_x Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	0cab02ca9b	anv: Implement flush_buffer_write_cp callbck Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	b2cffdb1ed	anv: Implement write_buffer_cp callback Rework: (Kevin) - Fix pointer arithmatic calculation. - Add assertion for a 4B alignment Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	8817ff26fc	anv: Move update buffer code in helper Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	0edf208ab9	anv: Implement cmd_fill_buffer_addr callback Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Kevin Chuang	2fe57947e3	anv: Implement encode shader to fit in ANV BVH This shader gets called and will construct ANV BVH from IR BVH. More specifically, each invocation will take care of one internal node. The internal nodes get processed starting from root node all the way to the bottom leaves. During processing, we keep track of the destination of where the internal node should be encoded (tracked in vk_ir_box.bvh_offset), and where its leaves should be encoded (tracked in vk_ir_header.dst_node_offset). The processed bvh is in contiguous memory, which starts with header, followed by interleaving internal nodes and leaves. The nodes information are also populated. Rework: (Sagar) - Return out of bounds threads early - Mimic GRL internal node encoding - Handle node mask - Fix block_incr_and_start_prim - Fix shader_index_and_geom_mask for instance node - Fix instance flag - Fix block_incr and instance_contribution_and_geom_flags initialized to be zero - Fix lower_x and upper_x to be properly flipped for invalid child - For invalid node, clear blockIncr and set startPrim to INVALID - Calculated things upfront and assign, cutting down more than ~200 instructions Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	692b5fa9f2	anv: Add shader to copy acceleration structures Rework (Kevin) - encode the address of anv_instance_leaf after header in order to handle serialization and deserialization part. - draw serialized data layout and explanation Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	a6b1a1fce1	anv: Add shader to build BVH header Rework: (Kevin) - Calculate the compacted_size properly - Update instance count and self pointer - The alignment of serialization size is not needed Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	ef94b7097b	anv: Add header to track BVH data structures This commit adds build interface and helper header for ANV BVH. Rework: (Kevin) - Use block_size macro to represent bvh node/leaf size - Rename BVH-related node/leaf size macros for clarity - Updated comments for some fields for bvh and nodes. - move bvh_layout to anv_bvh.h - Draw anv_bvh layout - rename child_offset to child_block_offset Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	617b7602ea	anv: Split GRL code path in separate file Rework (Kevin) - Remove genX_acceleration_structure.c from meson option to avoid linking error Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:44 +00:00
Sagar Ghuge	b002b2589c	anv: Update include dir for anv_tests Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:44 +00:00
Lionel Landwerlin	69edf4144a	brw: use transpose unspill messages when possible This simplifies the unspill messages quite a bit. A/B testing on DG2 : BlackOps3 : +0.96% TotalWarPharaoh: +0.31% DG2 shader changes : Assassin's Creed Valhalla: Totals from 19 (0.89% of 2131) affected shaders: Instrs: 70542 -> 64369 (-8.75%) Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06% Black Ops 3: Totals from 55 (3.41% of 1612) affected shaders: Instrs: 389549 -> 350646 (-9.99%) Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15% Control: Totals from 1 (0.11% of 878) affected shaders: Instrs: 3409 -> 3212 (-5.78%) Cycle count: 255991 -> 250411 (-2.18%) Cyberpunk 2077: Totals from 1 (0.08% of 1264) affected shaders: Instrs: 2363 -> 2337 (-1.10%) Cycle count: 69283 -> 69186 (-0.14%) Fallout 4: Totals from 1 (0.06% of 1601) affected shaders: Instrs: 27946 -> 20056 (-28.23%) Cycle count: 2391398 -> 2153658 (-9.94%) Fortnite: Totals from 273 (3.65% of 7470) affected shaders: Instrs: 634377 -> 601519 (-5.18%) Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01% Hogwarts Legacy: Totals from 50 (3.02% of 1656) affected shaders: Instrs: 110455 -> 103339 (-6.44%) Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03% Metro Exodus: Totals from 70 (0.16% of 43076) affected shaders: Instrs: 253847 -> 245321 (-3.36%) Cycle count: 13269473 -> 13209131 (-0.45%) Spill count: 1111 -> 1108 (-0.27%) Fill count: 2868 -> 2865 (-0.10%) Red Dead Redemption 2: Totals from 139 (2.38% of 5847) affected shaders: Instrs: 496551 -> 450180 (-9.34%) Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04% Spill count: 6322 -> 6326 (+0.06%) Fill count: 15558 -> 15568 (+0.06%) Rise Of The Tomb Raider: Totals from 1 (0.56% of 178) affected shaders: Instrs: 1682 -> 1437 (-14.57%) Cycle count: 603670 -> 586766 (-2.80%) Spiderman Remastered: Totals from 820 (11.77% of 6965) affected shaders: Instrs: 4622877 -> 3984893 (-13.80%) Cycle count: 235094963186 -> 234483925430 (-0.26%); split: -0.42%, +0.16% Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25% Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27% Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35% Some of stats show spilling changes which is telling of how our spill code is not adequate. Some of the spilled values are probably being respilled which shouldn't be the case. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>	2024-12-04 08:59:07 +00:00
Pavel Ondračka	dcfa8851bd	ci: bring back some i915g testing Only single g33 as part of r300 ci-tron-based farm. Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32376>	2024-12-04 08:18:43 +00:00
Kenneth Graunke	2ade3ec2a9	brw: Allow SIMD32 math instructions on Xe2 There's no restriction here AFAICT - only when HF types are involved. fossil-db results on Lunar Lake: Totals: Instrs: 143665291 -> 142654109 (-0.70%) Cycle count: 22516049016 -> 22514172014 (-0.01%); split: -0.02%, +0.01% Max live registers: 49038116 -> 49017687 (-0.04%); split: -0.04%, +0.00% Totals from 117623 (21.07% of 558370) affected shaders: Instrs: 25098642 -> 24087460 (-4.03%) Cycle count: 1038884570 -> 1037007568 (-0.18%); split: -0.48%, +0.29% Max live registers: 12423219 -> 12402790 (-0.16%); split: -0.16%, +0.00% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>	2024-12-04 02:42:34 +00:00
Kenneth Graunke	815236b417	brw: Fix register unit calculation in SIMD32 LOAD_PAYLOAD lowering We were wanting to check if the destination region spanned multiple registers. But we were checking against REG_SIZE, when the register size is actually REG_SIZE * reg_unit(devinfo) now. This meant that SIMD32 LOAD_PAYLOAD was always getting SIMD-split on Xe2 platforms, generating a lot of unnecessary mess for compute shaders. fossil-db results on Lunar Lake: Totals: Instrs: 146178614 -> 143291988 (-1.97%); split: -1.98%, +0.00% Subgroup size: 11089632 -> 11089376 (-0.00%); split: +0.00%, -0.00% Cycle count: 22528892444 -> 22507551650 (-0.09%); split: -0.12%, +0.03% Max live registers: 48834202 -> 48886685 (+0.11%); split: -0.09%, +0.20% Totals from 134306 (24.10% of 557327) affected shaders: Instrs: 28806335 -> 25919709 (-10.02%); split: -10.02%, +0.00% Subgroup size: 4297680 -> 4297424 (-0.01%); split: +0.00%, -0.01% Cycle count: 956867650 -> 935526856 (-2.23%); split: -2.84%, +0.61% Max live registers: 13085711 -> 13138194 (+0.40%); split: -0.33%, +0.73% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>	2024-12-04 02:42:34 +00:00

1 2 3 4 5 ...

13152 commits