fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 07:18:06 +02:00

Author	SHA1	Message	Date
José Roberto de Souza	04bdbeec31	intel/dev/xe: Fix access to eu_per_dss_mask DRM_XE_TOPO_EU_PER_DSS and DRM_XE_TOPO_SIMD16_EU_PER_DSS can be any number of bytes long, but the code assumed they were always 4 bytes long. That was not an issue because the Xe KMD returns 4 bytes even if it only needs 1 or 2 bytes, but it is a problem with our HW simulator, which was returning 2 bytes. Fixes: `a24d93aa89` ("intel/dev: Query and compute hardware topology for Xe") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32307>	2024-12-05 20:30:44 +00:00
Lionel Landwerlin	371b7a9b0d	anv: set pipeline flags correctly for imported libs Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3d49cdb71e` ("anv: implement VK_EXT_graphics_pipeline_library") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>	2024-12-05 19:53:34 +00:00
Lionel Landwerlin	6e396b400a	anv: fix missing bindings valid dynamic state change check Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ddd296cd3` ("anv: implement VK_EXT_vertex_input_dynamic_state") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>	2024-12-05 19:53:34 +00:00
Lionel Landwerlin	80c0d2718c	anv: report formats supported by the common bvh framework Enables DXR 1.1 with vkd3d-proton Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32487>	2024-12-05 15:54:10 +00:00
Ian Romanick	0754a18621	brw/copy: Allow copy prop into src1 of broadcast This is the selector, and it must always be a uniform UD, so there's no reason to not propagate into it. No shader-db change on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 220507131 -> 220507127 (-0.00%) Cycle count: 31607052398 -> 31607053364 (+0.00%); split: -0.00%, +0.00% Totals from 5 (0.00% of 702410) affected shaders: Instrs: 995 -> 991 (-0.40%) Cycle count: 86392 -> 87358 (+1.12%); split: -0.07%, +1.19% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	662339a2ff	brw/build: Use SIMD8 temporaries in emit_uniformize The fossil-db results are very different from v1. This is now mostly helpful on older platforms. v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV, adjust the exec_size to match the size allocated for the destination register. Fixes EU validation failures in some piglit OpenCL tests (e.g., atomic_add-global-return.cl). v3: Use component_size() in emit_uniformize and BROADCAST to properly account for UQ vs UD destination. This doesn't matter for emit_uniformize because the type is always UD, but it is technically more correct. v4: Update trace checksums. Now amly expects the same checksum as several other platforms. v5: Use xbld.dispatch_width() in the builder for when scalar_group() eventually becomes SIMD1. Suggested by Lionel. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18091701 -> 18091586 (<.01%) instructions in affected programs: 29616 -> 29501 (-0.39%) helped: 28 / HURT: 18 total cycles in shared programs: 919250494 -> 919123828 (-0.01%) cycles in affected programs: 12201102 -> 12074436 (-1.04%) helped: 124 / HURT: 108 LOST: 0 GAINED: 1 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20480808 -> 20480624 (<.01%) instructions in affected programs: 58465 -> 58281 (-0.31%) helped: 61 / HURT: 20 total cycles in shared programs: 874860168 -> 874960312 (0.01%) cycles in affected programs: 18240986 -> 18341130 (0.55%) helped: 113 / HURT: 158 total spills in shared programs: 4557 -> 4555 (-0.04%) spills in affected programs: 93 -> 91 (-2.15%) helped: 1 / HURT: 0 total fills in shared programs: 5247 -> 5243 (-0.08%) fills in affected programs: 224 -> 220 (-1.79%) helped: 1 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00% Subgroup size: 14102592 -> 14102624 (+0.00%) Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02% Max live registers: 65371025 -> 65355084 (-0.02%) Totals from 12130 (1.73% of 702392) affected shaders: Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08% Subgroup size: 388128 -> 388160 (+0.01%) Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81% Max live registers: 1538550 -> 1522609 (-1.04%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00% Subgroup size: 9631168 -> 9631216 (+0.00%) Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01% Max live registers: 41540611 -> 41514296 (-0.06%) Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05% Totals from 16852 (2.11% of 796880) affected shaders: Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07% Subgroup size: 323592 -> 323640 (+0.01%) Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58% Max live registers: 1072491 -> 1046176 (-2.45%) Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30% Tiger Lake Totals: Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00% Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01% Max live registers: 41644106 -> 41620052 (-0.06%) Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02% Totals from 15102 (1.90% of 793371) affected shaders: Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11% Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52% Max live registers: 989858 -> 965804 (-2.43%) Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98% Ice Lake and Skylake had similar results. 
(Ice Lake shown) Totals: Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00% Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00% Spill count: 567071 -> 567049 (-0.00%) Fill count: 701323 -> 701273 (-0.01%) Max live registers: 41914047 -> 41913281 (-0.00%) Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00% Totals from 3904 (0.49% of 798473) affected shaders: Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03% Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25% Spill count: 1696 -> 1674 (-1.30%) Fill count: 2523 -> 2473 (-1.98%) Max live registers: 341695 -> 340929 (-0.22%) Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	d2b266187d	brw: Use resize_sources several more places Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Ian Romanick	12d1886b87	brw/lower: Don't "fix" regioning of broadcast The next two commits modify the destination regioning in a way that, while still correct, triggers assertion failures if we try to fix the regioning here. Broadcast gets lowered in brw_eu_emit. For the purposes of region restrictions, let's assume that the final code emission will do the right thing. Doing a bunch of shuffling here is only going to make a mess of things. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>	2024-12-05 00:15:27 +00:00
Caio Oliveira	cbc45ac99e	intel/brw: Enable EU validation and compaction tests for PTL Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32195>	2024-12-04 23:03:11 +00:00
Sagar Ghuge	9afb0480c4	intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>	2024-12-04 19:20:51 +00:00
Michael Cheng	ed620bcd41	anv: Add tracepoint for as_build Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Kevin Chuang	5098c0c5df	anv: Add INTEL_DEBUG for bvh dump and visualization tools This commit allows you to dump different regions of memory related to bvh building. An additional script to decode the memory dump is also added, and you're able to view the built bvh in 3D view in html. See the included README.md for usage. Rework: - you can now view the actual child_coord in internalNode in html - change exponent to be int8_t in the interpreter - fix the actual coordinates using an updated formula - now you can have 3D view of the bvh - blockIncr could be 2 and vk_aabb should be first - Now, if any bvh dump is enabled, we will zero out tlas, to prevent gpu hang caused by incorrect tlas traversal - rootNodeOffset is back to the beginning - Add INTEL_DEBUG=bvh_no_build. - Fix type of dump_size - add assertion for a 4B alignment - when clearing out bvh, only clear out everything after (header+bvh_offset) - TODO: instead of dumping on destroy, track in the command buffer Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	5561db68c3	anv: Add helper to copy data from src to dest anv_address Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	41baeb3810	anv: Implement acceleration structure API Rework: (Kevin) - Properly setup bvh_layout Our bvh resides in contiguous memory and can be divided into two sections: 1. anv_accel_struct_header, tightly followed by 2. actual bvh, which starts with root node, followed by interleaving leaves or internal nodes. - Update comments for some fields for BVH and nodes. - Properly populate the UUIDs in serialization header - separate header func into completely two paths based on compaction bit - Encode rt_uuid at second VK_UUID_SIZE. - Write query result at correct slot - add assertion for a 4B alignment - move bvh_layout to anv_bvh - Use meson option to decide which files to compile - The alignment of serialization size is not needed - Change static_assert to STATIC_ASSERT and move them inside functions Rework (Sagar) - Use anv_cmd_buffer_update_buffer instead of MI to copy data Rework (Lionel) - Remove flush after builds, and add flush in copy before dispatch - Handle the flushes in CmdWriteAccelerationStructuresPropertiesKHR properly Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	9002e52037	anv: Implement cmd_dispatch_unaligned callback Rework: (Kevin) - Calculate correct number of threads in GPGPU thread group based on SIMD size. - Instead of round up, just use the simple division and let the remainder part handle groupCount < local_size_x. - Drop indirect_unroll_off and fix the bug that we're not using is_unaligned_size_x Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	0cab02ca9b	anv: Implement flush_buffer_write_cp callback Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	b2cffdb1ed	anv: Implement write_buffer_cp callback Rework: (Kevin) - Fix pointer arithmetic calculation. - Add assertion for a 4B alignment Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	8817ff26fc	anv: Move update buffer code in helper Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	0edf208ab9	anv: Implement cmd_fill_buffer_addr callback Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Kevin Chuang	2fe57947e3	anv: Implement encode shader to fit in ANV BVH This shader gets called and will construct ANV BVH from IR BVH. More specifically, each invocation will take care of one internal node. The internal nodes get processed starting from root node all the way to the bottom leaves. During processing, we keep track of the destination of where the internal node should be encoded (tracked in vk_ir_box.bvh_offset), and where its leaves should be encoded (tracked in vk_ir_header.dst_node_offset). The processed bvh is in contiguous memory, which starts with header, followed by interleaving internal nodes and leaves. The nodes information are also populated. Rework: (Sagar) - Return out of bounds threads early - Mimic GRL internal node encoding - Handle node mask - Fix block_incr_and_start_prim - Fix shader_index_and_geom_mask for instance node - Fix instance flag - Fix block_incr and instance_contribution_and_geom_flags initialized to be zero - Fix lower_x and upper_x to be properly flipped for invalid child - For invalid node, clear blockIncr and set startPrim to INVALID - Calculated things upfront and assign, cutting down more than ~200 instructions Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	692b5fa9f2	anv: Add shader to copy acceleration structures Rework (Kevin) - encode the address of anv_instance_leaf after header in order to handle serialization and deserialization part. - draw serialized data layout and explanation Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	a6b1a1fce1	anv: Add shader to build BVH header Rework: (Kevin) - Calculate the compacted_size properly - Update instance count and self pointer - The alignment of serialization size is not needed Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	ef94b7097b	anv: Add header to track BVH data structures This commit adds build interface and helper header for ANV BVH. Rework: (Kevin) - Use block_size macro to represent bvh node/leaf size - Rename BVH-related node/leaf size macros for clarity - Updated comments for some fields for bvh and nodes. - move bvh_layout to anv_bvh.h - Draw anv_bvh layout - rename child_offset to child_block_offset Co-authored-by: Kevin Chuang <kaiwenjon23@gmail.com> Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:45 +00:00
Sagar Ghuge	617b7602ea	anv: Split GRL code path in separate file Rework (Kevin) - Remove genX_acceleration_structure.c from meson option to avoid linking error Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:44 +00:00
Sagar Ghuge	b002b2589c	anv: Update include dir for anv_tests Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31588>	2024-12-04 10:41:44 +00:00
Lionel Landwerlin	69edf4144a	brw: use transpose unspill messages when possible This simplifies the unspill messages quite a bit. A/B testing on DG2 : BlackOps3 : +0.96% TotalWarPharaoh: +0.31% DG2 shader changes : Assassin's Creed Valhalla: Totals from 19 (0.89% of 2131) affected shaders: Instrs: 70542 -> 64369 (-8.75%) Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06% Black Ops 3: Totals from 55 (3.41% of 1612) affected shaders: Instrs: 389549 -> 350646 (-9.99%) Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15% Control: Totals from 1 (0.11% of 878) affected shaders: Instrs: 3409 -> 3212 (-5.78%) Cycle count: 255991 -> 250411 (-2.18%) Cyberpunk 2077: Totals from 1 (0.08% of 1264) affected shaders: Instrs: 2363 -> 2337 (-1.10%) Cycle count: 69283 -> 69186 (-0.14%) Fallout 4: Totals from 1 (0.06% of 1601) affected shaders: Instrs: 27946 -> 20056 (-28.23%) Cycle count: 2391398 -> 2153658 (-9.94%) Fortnite: Totals from 273 (3.65% of 7470) affected shaders: Instrs: 634377 -> 601519 (-5.18%) Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01% Hogwarts Legacy: Totals from 50 (3.02% of 1656) affected shaders: Instrs: 110455 -> 103339 (-6.44%) Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03% Metro Exodus: Totals from 70 (0.16% of 43076) affected shaders: Instrs: 253847 -> 245321 (-3.36%) Cycle count: 13269473 -> 13209131 (-0.45%) Spill count: 1111 -> 1108 (-0.27%) Fill count: 2868 -> 2865 (-0.10%) Red Dead Redemption 2: Totals from 139 (2.38% of 5847) affected shaders: Instrs: 496551 -> 450180 (-9.34%) Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04% Spill count: 6322 -> 6326 (+0.06%) Fill count: 15558 -> 15568 (+0.06%) Rise Of The Tomb Raider: Totals from 1 (0.56% of 178) affected shaders: Instrs: 1682 -> 1437 (-14.57%) Cycle count: 603670 -> 586766 (-2.80%) Spiderman Remastered: Totals from 820 (11.77% of 6965) affected shaders: Instrs: 4622877 -> 3984893 (-13.80%) Cycle count: 235094963186 -> 
234483925430 (-0.26%); split: -0.42%, +0.16% Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25% Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27% Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35% Some of the stats show spilling changes, which indicates that our spill code is not adequate. Some of the spilled values are probably being respilled, which shouldn't be the case. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>	2024-12-04 08:59:07 +00:00
Pavel Ondračka	dcfa8851bd	ci: bring back some i915g testing Only single g33 as part of r300 ci-tron-based farm. Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32376>	2024-12-04 08:18:43 +00:00
Kenneth Graunke	2ade3ec2a9	brw: Allow SIMD32 math instructions on Xe2 There's no restriction here AFAICT - only when HF types are involved. fossil-db results on Lunar Lake: Totals: Instrs: 143665291 -> 142654109 (-0.70%) Cycle count: 22516049016 -> 22514172014 (-0.01%); split: -0.02%, +0.01% Max live registers: 49038116 -> 49017687 (-0.04%); split: -0.04%, +0.00% Totals from 117623 (21.07% of 558370) affected shaders: Instrs: 25098642 -> 24087460 (-4.03%) Cycle count: 1038884570 -> 1037007568 (-0.18%); split: -0.48%, +0.29% Max live registers: 12423219 -> 12402790 (-0.16%); split: -0.16%, +0.00% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>	2024-12-04 02:42:34 +00:00
Kenneth Graunke	815236b417	brw: Fix register unit calculation in SIMD32 LOAD_PAYLOAD lowering We were wanting to check if the destination region spanned multiple registers. But we were checking against REG_SIZE, when the register size is actually REG_SIZE * reg_unit(devinfo) now. This meant that SIMD32 LOAD_PAYLOAD was always getting SIMD-split on Xe2 platforms, generating a lot of unnecessary mess for compute shaders. fossil-db results on Lunar Lake: Totals: Instrs: 146178614 -> 143291988 (-1.97%); split: -1.98%, +0.00% Subgroup size: 11089632 -> 11089376 (-0.00%); split: +0.00%, -0.00% Cycle count: 22528892444 -> 22507551650 (-0.09%); split: -0.12%, +0.03% Max live registers: 48834202 -> 48886685 (+0.11%); split: -0.09%, +0.20% Totals from 134306 (24.10% of 557327) affected shaders: Instrs: 28806335 -> 25919709 (-10.02%); split: -10.02%, +0.00% Subgroup size: 4297680 -> 4297424 (-0.01%); split: +0.00%, -0.01% Cycle count: 956867650 -> 935526856 (-2.23%); split: -2.84%, +0.61% Max live registers: 13085711 -> 13138194 (+0.40%); split: -0.33%, +0.73% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>	2024-12-04 02:42:34 +00:00
Caio Oliveira	dfa4c55a4f	intel/brw: Add is_control_source for the new subgroup ops Fixes: `019770f026` ("intel/brw: Add SHADER_OPCODE_VOTE_") Fixes: `9537b62759` ("intel/brw: Add SHADER_OPCODE_REDUCE") Fixes: `0ba1159b0a` ("intel/brw: Add SHADER_OPCODE__SCAN") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32411>	2024-12-04 01:19:37 +00:00
Nanley Chery	428a970511	anv: Only consider R32 image formats as supporting atomics Only consider R32 image formats as supporting atomics because we only expose VK_FORMAT_FEATURE_2_STORAGE_IMAGE_ATOMIC_BIT for those formats. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32192>	2024-12-03 22:54:35 +00:00
Nanley Chery	122c01a496	anv: Enable more storage compression on gfx12+ On gfx12.0, allow storage compression unless the image may be used with atomics. On gfx20, use the CCS_E aux-usage for storage compression. This causes ISL to create surface states with more appropriate render compression formats. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5657 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32192>	2024-12-03 22:54:35 +00:00
Nanley Chery	01c4ea771c	anv: Enable storage accesses with modifiers on gfx12+ I tested this patch with an ACM card. It enables "Halo: The Master Chief Collection" to use the clear color modifier instead of falling back to the uncompressed Tile4 modifier. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32192>	2024-12-03 22:54:35 +00:00
Nanley Chery	2dedd8dbb2	intel/isl: Fix DecompressInL3 assignment on gfx12.5 * In the ACM PRMs, the programming notes under RENDER_SURFACE_STATE::MemoryCompressionEnable state that the DecompressInL3 bit must be set for media compression. * Unlike TGL, ACM seems to handle format reinterpretation just fine without using the bit. Update the assignment accordingly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32192>	2024-12-03 22:54:34 +00:00
Marek Olšák	7f4e36ff7d	gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options This is a prerequisite for enabling nir_opt_varyings for all gallium drivers. nir_lower_io_passes (called by the GLSL linker) only uses NIR options to lower indirect IO access before lowering IO and calling nir_opt_varyings. Most drivers report full support for indirect IO and lower it themselves, which prevents compaction of lowered indirectly accessed varyings because nir_opt_varyings doesn't touch indirect varyings. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (Rb for asahi) Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> (for r300) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32423>	2024-12-03 12:57:36 +00:00
Kenneth Graunke	6fd10a6620	brw: Tune vectorizer conditions to allow overfetching with holes Notably, our convergent block loads were already overfetching - we rounded up to block sizes of 8, 16, 32, or 64(LSC-only). But we did so in the backend, rather than NIR. With recent changes, nir_opt_load_store_vectorizer allows holes of up to 28 bytes (7 components at 4 bytes each). This allows us to detect cases where we did a convergent block load for 1 component (but loaded a whole vec8), then another load for the next vec8, and combine them into a single V16 load. Single component loads aren't the most common, but convergent loads of a vec2 in one group and a vec3 in another are quite common, and it makes no sense to do V8+V8 loads instead of V16. For non-block loads, we allow a max hole of 4 bytes. This allows the common case of XYZ_ + XYZ_ loads (where the last component is unread) to combine into a single larger load. fossil-db results on Lunarlake: Totals: Instrs: 146692608 -> 146246432 (-0.30%); split: -0.33%, +0.02% Subgroup size: 11100528 -> 11100512 (-0.00%) Send messages: 7003425 -> 6862529 (-2.01%); split: -2.01%, +0.00% Cycle count: 22396273274 -> 22523048654 (+0.57%); split: -1.08%, +1.64% Spill count: 67671 -> 67594 (-0.11%); split: -1.59%, +1.48% Fill count: 128999 -> 130223 (+0.95%); split: -1.73%, +2.68% Scratch Memory Size: 5986304 -> 6042624 (+0.94%); split: -1.40%, +2.34% Max live registers: 48898858 -> 48881655 (-0.04%); split: -0.05%, +0.01% Non SSA regs after NIR: 172397792 -> 167577380 (-2.80%); split: -2.80%, +0.00% Totals from 451003 (80.87% of 557667) affected shaders: Instrs: 134111754 -> 133665578 (-0.33%); split: -0.36%, +0.03% Subgroup size: 9039104 -> 9039088 (-0.00%) Send messages: 6127775 -> 5986879 (-2.30%); split: -2.30%, +0.00% Cycle count: 20306336726 -> 20433112106 (+0.62%); split: -1.19%, +1.81% Spill count: 56230 -> 56153 (-0.14%); split: -1.92%, +1.78% Fill count: 112920 -> 114144 (+1.08%); split: -1.97%, +3.06% Scratch Memory 
Size: 3769344 -> 3825664 (+1.49%); split: -2.23%, +3.72% Max live registers: 43750259 -> 43733056 (-0.04%); split: -0.05%, +0.01% Non SSA regs after NIR: 158449343 -> 153628931 (-3.04%); split: -3.04%, +0.00% In particular, sends get cut by 20.85% for Borderlands 3 DX12, 13.82% on Cyberpunk 2077, 10.75% on Strange Brigade, and 10.20% on Red Dead Redemption 2. Yet, spill/fills remain about the same. fossil-db results on Alchemist are similar though not quite as good. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	f88eb48ff2	anv: Don't consider nir_var_mem_global for vectorizer robustness checks nir_opt_load_store_vectorize checks for potential address wrapping when vectorizing two loads ("low" and "high"). It looks for cases where "low" might have a large address, and "high" has a positive offset which, when added together, could trigger integer wraparound. The issue here is that if the large address of "low" was considered out-of-bounds, adding offset could wrap around to a small address, which might actually be in-bounds. Thus, when loaded separately, "low" will fail and trigger robustness out-of-bound-read behavior, but "high" would read correctly. When vectorized, the entire load would fail. This is explicitly tested for with 32-bit SSBO addresses in the Vulkan CTS. However, anv's 64-bit global addresses and VMA handling effectively prevent this case. Addresses 0-4095 are a reserved page so that if people try to use 0 as a NULL pointer, it never maps to a valid BO. That alone guarantees that the above case where "high" gets a small address would never be in-bounds, so we don't need to check for it. In fact, we allocate most user allocations out of high addresses, and have specialized allocation heaps for certain types of GPU data structures in the lower GB of memory. For a load to wrap around and successfully land in the right heap, it would have to load gigabytes. Disabling this allows load vectorization and overfetching in more cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	01680a66a9	brw: Simplify choose_oword_block_size_dwords() Just calculate the block size using util_logbase2() - it's simpler. Also drop the name "oword" as this refers to legacy HDC messages, rather than the newer LSC "vector size" field. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	e8c85f8476	brw: Only consider components read for UBO push analysis Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	e703ff5e02	brw: Only consider components read for UBO loads This will matter more with overfetching, where we may suggest loading additional data that we don't actually need for vectorization purposes. We want to make sure that push ranges have the data we actually need; any extra padding is irrelevant. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	da93b13f8b	brw: Use nir_combined_align in brw_nir_should_vectorize_mem Better than open-coding this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Kenneth Graunke	8c795af0b8	brw: Drop a few crocus references in comments crocus no longer uses brw. It uses elk. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Kenneth Graunke	46af23649c	brw: Drop "regular uniform" concept from UBO push analysis i965 used to upload its own regular GL uniforms and push those in addition to UBO ranges. st/mesa instead uploads regular uniforms and presents those to us as UBO 0. So this really isn't a thing anymore. nir_intrinsic_load_uniform is still used today but it represents Vulkan push constants. anv_nir_compute_push_layout already takes care of ensuring too many ranges aren't present, so it doesn't need the pass to do so. iris doesn't use this intrinsic at all. We can also drop the compute shader check, because neither iris nor anv use UBO push analysis for compute shaders - except for anv's internal kernels, which already have well specified push layouts. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Kenneth Graunke	586a470a00	brw: Drop image deref handling from brw_analyze_ubo_ranges This was for pre-Skylake image load/store handling with image params. We don't support that in brw anymore. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Dylan Baker	5a6531b5d6	anv: bump conformance version to 1.4 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32441>	2024-12-02 21:56:40 +00:00
Dylan Baker	212565f42e	anv: Add new Vulkan 1.4 features and properties Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32441>	2024-12-02 21:56:39 +00:00
Dylan Baker	953d8a61f8	anv: bump max number of push constants to 256 As is required by Vulkan 1.4 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32441>	2024-12-02 21:56:39 +00:00
Dylan Baker	8105f80244	anv: advertise Vulkan 1.4 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32441>	2024-12-02 21:56:39 +00:00
Lionel Landwerlin	888f63cf1b	anv/iris: leave 4k alignments for clear colors with modifiers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `17f97a69c1` ("iris: Reduce clear color state alignment to 64B") Fixes: `063715ed45` ("anv: Reduce clear color state alignment to 64B") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12195 Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13057 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32422>	2024-12-02 12:51:45 +00:00
Pierre-Eric Pelloux-Prayer	9f4ab06842	glx: return BadMatch for invalid reset notification strategy The specification doesn't say which error should be reported, but piglit expects BadMatch: /* The GLX_ARB_create_context_robustness spec does not say what error * code should be generated. However, similar cases (e.g., valid GL * versions) specify BadMatch. This is also the behavior of NVIDIA's * closed-source driver. */ Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32281>	2024-11-27 19:00:20 +00:00

13131 commits