fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-26 23:18:12 +02:00

Author	SHA1	Message	Date
Caio Oliveira	2c64e12462	intel/executor: Add performance counter support Add optional OA performance counter collection around each execute() call. Examples: ``` # List all profiles and counters, with descriptions. $ executor --oa list # Collect all counters from a profile. $ executor --oa ComputeBasic file.lua # Collect a subset of counters from a profile, separated by comma. $ executor --oa ComputeBasic:GpuTime,AvgGpuCoreFrequency file.lua # By default use ComputeBasic profile, so counter names only also work. $ executor --oa GpuTime file.lua ``` The selected counters are printed to stdout after the script finishes, or written to a file specified by --oa-csv FILENAME. Assisted-by: Pi coding agent (GPT-5.5) Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>	2026-05-21 16:46:35 -07:00
Caio Oliveira	8d237b5408	intel/executor: Add an overflow check for alloc function Assisted-by: Pi coding agent (GPT-5.5) Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>	2026-05-21 16:46:35 -07:00
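The log entry for 8d237b5408 does not say which computation is being checked. As a minimal sketch only, assuming the overflow in question is a size multiplication, a guarded allocation could look like the following. `checked_alloc_array` is a hypothetical name for illustration, not the executor's actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from intel/executor): allocate
 * count * size bytes, returning NULL when the product would not
 * fit in a size_t instead of silently wrapping around. */
static void *
checked_alloc_array(size_t count, size_t size)
{
   /* Dividing SIZE_MAX by size avoids performing the overflowing
    * multiplication just to detect it. */
   if (size != 0 && count > SIZE_MAX / size)
      return NULL;

   return malloc(count * size);
}
```

The check is done with a division so that the test itself cannot overflow; callers treat a NULL return the same way as an out-of-memory failure.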
Caio Oliveira	0dda43819e	intel/compiler: Move bison command to shared meson.build It is used by both brw and elk. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41738>	2026-05-21 22:15:00 +00:00
Sagar Ghuge	7f1defa5ef	brw/rt: Commit hit even if we are skipping closest hit shader It's not about the memory traffic but updating the Tmax value/distance so that on next intersection, we would be comparing the updated Tmax value/distance instead of original distance. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41709>	2026-05-21 20:45:39 +00:00
Sagar Ghuge	17f7e7f96b	anv: Set execution mask based on SIMD size Execution mask gets applied to last thread in the threadgroup to mask off simd lanes, But with BTD enabled, we are seeing only last 4 components has valid stack ID's and upper 4 components of the register are zero. Changing execution mask somehow populates the stack IDs properly. This is on simulator, before changing the execution mask: 00000000 00000000 00000000 00000000 000F000E 000D000C 000B000A 00090008 00000000 00000000 00000000 00000000 000F000E 000D000C 000B000A 00090008 r1 After changing execution mask: 000F000E 000D000C 000B000A 00090008 00070006 00050004 00030002 00010000 000F000E 000D000C 000B000A 00090008 00070006 00050004 00030002 00010000 r1 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41409>	2026-05-21 20:25:46 +00:00
Caio Oliveira	e2402f6a07	brw: Bound register coalesce rewrites by live range When updating a register after successfully finding a pair to coalesce, use the live range of the source register to walk only the instructions that might use it. Depending on the shader this allows skipping a bunch of blocks -- and also terminating early. Below are fossil compilation times in a MTL machine compiling shaders for a BMG GPU, the big win here was for Cyberpunk 2077. ``` // Differences at 95.0% confidence. // Rise of the Tomb Raider (n=20) -0.0095 +/- 0.00706877 -1.90572% +/- 1.40609% // Alan Wake (n=20) -0.031 +/- 0.0172806 -0.93599% +/- 0.51952% // Borderlands 3 (n=15) -0.353333 +/- 0.118679 -2.44307% +/- 0.80787% // Oblivion Remastered (n=15) -0.134 +/- 0.026008 -2.76898% +/- 0.531637% // Baldur's Gate 3 (n=15) -0.954286 +/- 0.163625 -2.21713% +/- 0.377562% // Cyberpunk 2077 (n=20) -2.8665 +/- 0.228489 -8.08661% +/- 0.621779% ``` Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41495>	2026-05-21 18:32:36 +00:00
Caio Oliveira	821a812c7d	brw: Don't directly use regs_read/regs_written/size_read as bound for non-trivial loops Instead save to a local variable and use that. In various cases the compiler is not able to pull it out of the loop, since there are other not inlined function calls as part of the loop's body, resulting in repeated unnecessary calls to either size_read() or its pieces that get inlined. Below are fossil compilation times in a MTL machine compiling shaders for a BMG GPU: ``` // Differences at 95.0% confidence. // Rise of the Tomb Raider (n=20) -0.017 +/- 0.00724575 -3.45177665% +/- 1.45084% // Alan Wake (n=20) -0.153 +/- 0.00960067 -4.99265786% +/- 0.303695% // Borderlands 3 (n=14) -0.486428571 +/- 0.15354 -3.51248195% +/- 1.0835% // Oblivion Remastered (n=14) -0.143571429 +/- 0.0357991 -3.05749924% +/- 0.747872% // Baldur's Gate 3 (n=14) -1.68928571 +/- 0.151598 -4.12128605% +/- 0.364259% ``` Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>	2026-05-21 18:04:14 +00:00
Caio Oliveira	3f71aab327	brw: Pass VGRF numbers to liveness helpers Compute var_from_reg() once in setup_def_use() and pass the variable number to setup_one_read() and setup_one_write(). This lets the loops walk consecutive variable numbers directly instead of mutating a brw_reg offset. Also: setup_one_write() is only called for VGRFs, so remove the check for VGRF there. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>	2026-05-21 18:04:14 +00:00
Caio Oliveira	9975a35f43	brw: Avoid unnecessary calls to size_read() in flags_read() Only ARF sources are relevant in this case, so check the file before calling size_read(). Below are fossil compilation times in a MTL machine compiling shaders for a BMG GPU: ``` // Differences at 95.0% confidence. // Rise of the Tomb Raider (n=20) No difference proven // Alan Wake (n=20) -0.0725 +/- 0.0139437 -2.30965276% +/- 0.438787% // Borderlands 3 (n=14) -0.248571429 +/- 0.135107 -1.76946153% +/- 0.954171% // Oblivion Remastered (n=14) -0.0735714286 +/- 0.0235712 -1.54770849% +/- 0.492117% // Baldur's Gate 3 (n=14) -0.832142857 +/- 0.23095 -1.98028217% +/- 0.545648% ``` Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>	2026-05-21 18:04:13 +00:00
Caio Oliveira	bb8d8a2141	brw: Call size_read() once in regs_read() regs_read() itself gets inlined, but size_read() does not. In GCC release builds this results in three calls to size_read() at each site, one of them due to how MIN2 is expanded. Use a local variable to store the result. Below are fossil compilation times in a MTL machine compiling shaders for a BMG GPU: ``` // Differences at 95.0% confidence. // Rise of the Tomb Raider (n=20) -0.013 +/- 0.00596452 -2.56410256% +/- 1.15623% // Alan Wake (n=20) -0.1755 +/- 0.0144896 -5.29491628% +/- 0.425556% // Borderlands 3 (n=14) -0.562142857 +/- 0.129678 -3.84765816% +/- 0.870239% // Oblivion Remastered (n=14) -0.0821428571 +/- 0.0262485 -1.69867061% +/- 0.537247% // Baldur's Gate 3 (n=14) -1.61357143 +/- 0.21693 -3.69788342% +/- 0.486462% ``` Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>	2026-05-21 18:04:13 +00:00
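The MIN2 effect described in bb8d8a2141 can be reproduced in isolation. The sketch below assumes a ternary-style minimum macro (the usual shape of such macros) and uses a hypothetical `expensive_size()` as a stand-in for a non-inlined query such as size_read(); the call counter exists only to make the repeated evaluation visible.

```c
#include <assert.h>

/* Ternary-style minimum: whichever argument wins the comparison is
 * evaluated a second time to produce the result. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

static int calls;

/* Hypothetical stand-in for a non-inlined, relatively costly query. */
static unsigned
expensive_size(unsigned v)
{
   calls++;
   return v * 4;
}

/* Passing the call straight into the macro can evaluate it twice. */
static unsigned
bounded_direct(unsigned v, unsigned limit)
{
   return MIN2(expensive_size(v), limit);
}

/* Storing the result in a local first guarantees a single call. */
static unsigned
bounded_cached(unsigned v, unsigned limit)
{
   const unsigned size = expensive_size(v);
   return MIN2(size, limit);
}
```

Both functions return the same value; only the number of calls to the helper differs, which is the cost the commit removes.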
Caio Oliveira	3850922b78	brw: Save original regs_written() value in register coalesce The instruction may get transformed, modifying the destination before the loop index gets incremented. So save the original regs_written value to be used in the loop increment. While we are here, assert that all the slots in mov[] are filled at this point in the code. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41496>	2026-05-21 18:04:13 +00:00
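The hazard fixed in 3850922b78 is a loop whose step is read from an object that the loop body may modify. A minimal sketch of the pattern, using a hypothetical `fake_inst` type and `transform()` function rather than the real brw data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction: occupies `width` consecutive slots. */
struct fake_inst {
   unsigned width;
};

/* Hypothetical transform that shrinks the instruction as a side effect. */
static void
transform(struct fake_inst *inst)
{
   inst->width = 1;
}

/* by_slot[i] points at the instruction starting at slot i, or is NULL
 * for slots in the middle of an instruction. Returns how many
 * instructions were visited. */
static unsigned
visit_all(struct fake_inst **by_slot, unsigned total_slots)
{
   unsigned visited = 0;

   for (unsigned i = 0; i < total_slots;) {
      struct fake_inst *inst = by_slot[i];
      assert(inst != NULL);

      /* Save the step before the body can change it. */
      const unsigned original_width = inst->width;

      transform(inst);
      visited++;

      /* Using inst->width here would advance by 1 and land on a
       * slot in the middle of the old instruction. */
      i += original_width;
   }

   return visited;
}
```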
Michael Cheng	ec778a297f	brw: Fix ordered dependency exec_all handling on Xe2+ On Xe2+ the Wa_1407528679 NoMask workaround is disabled, so baked_ordered_dependency_mode() should treat all instructions as exec_all, matching the logic in gather_inst_dependencies() and emit_inst_dependencies(). Without this, ordered RegDist dependencies from uniform/WE_all producers (e.g. 'mov s0, imm') are not found during baking and fall through as separate WE_all SYNC NOPs. Real shaders pile up dozens of these in front of masked sends. v2(Caio): Fix existing scalar_register test expectations Signed-off-by: Michael Cheng <michael.cheng@intel.com> Fixes: `47a6ef3fef` ("brw/scoreboard: Use a predicate helper for the nomask workaround") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>	2026-05-21 16:50:50 +00:00
Caio Oliveira	26e832d069	brw/scoreboard: Add disabled tests for RegDist baking on Xe2+ Add two tests verifying that ordered RegDist dependencies from uniform/WE_all producers are baked into the consumer's SWSB on Xe2+. Disabled for now since they fail on current main. Reviewed-by: Michael Cheng <michael.cheng@intel.com> Assisted-by: Pi coding agent (Opus-4.7) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41713>	2026-05-21 16:50:50 +00:00
Alyssa Rosenzweig	3a447b4065	jay: use new fs payload variable more blow up harder if we try to load stuff in the wrong stage Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	ababf12b04	jay: add a hack until we munge barycentrics dynamically Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	a56aa9547b	jay: Call constant folding before collecting FS outputs Fixes "multiple stores to the same location" assertions in tests like dEQP-VK.pipeline.monolithic.color_write_enable_maxa.cwe_after_bind.attachments3_more0 In that case, the stores were actually to different locations, but some constant additions hadn't been folded into the location field yet. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	23884ee02c	jay: Prohibit JAY_STRIDE_8 for EXPAND_QUAD No idea why we're getting a stride 8 here, but we can't handle it. Fixes baldurs_gate_3.vk.foz --graphics-pipeline-range 2248 2249. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	a9525f4b44	jay: hack for sample position Adding this to the list of design constraints for the next RA rework. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	1e31be0e52	jay: fix omask on single sample dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.singlesample_rbo Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	6a02e228bc	jay: Implement load_fs_config_intel We could lower this in to load_push_data_intel in NIR, but it's trivial, and probably less code just to implement it directly. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	3d91cb9d1e	jay: Implement coverage mask This is the actual MSAA coverage mask. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	35622f165f	jay, nir: Make a dispatch_mask_intel intrinsic jay is trying to use the fragment shader dispatch mask for helper invocation lowering, but it was using load_sample_mask_in for that (now load_coverage_mask_intel). But this isn't the MSAA coverage mask, the two are different payload fields. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	0f3a311591	jay: Implement sample position Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	a590500802	jay: Add a GPR_FROM_UGPRS opcode Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	4555cd23c6	jay: Set Dispatch GRF Start Register in jay_setup_payload() We want it to be set to wherever the push constants ended up. Setting it close to the setup_payload_push() call makes this easier. We'll also be adding some extra UGPRs for the fragment shader payload soon, and the partitioning code will just have one big UGPR partition for payload fields, push constants, and general purpose UGPRs, so it really won't know how to do this very well without duplicating a bunch of information. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	0670b40013	jay: Add comments summarizing the PS thread payload layout The documentation is large and hard to follow due to all the optional fields and the SIMD16 vs. SIMD32 split for barycentrics. This quick summary helps clarify what fields exist, which are split for SIMD32 or kept together, and which pairs of registers are involved for splits. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	6c142f7edc	jay: Implement sample mask writes Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	49299050ea	jay: Implement fragment shader stencil writes Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	b01d286083	jay: Move render target store payload/descriptor construction to backend Constructing the render target store payload is more complex than we can reasonably handle at the NIR level. The main reason is that samplemask and stencil are packed 16-bit and 8-bit parameters, respectively, which are intermixed with other values that are 32-bit. In SIMD32 mode, the packed sub-32-bit values take up fewer registers than normal values. Currently we also don't specialize the NIR for each FS dispatch width, and we can't construct the message descriptor without knowing it. So, we alter nir_intrinsic_store_render_target_intel to take each of the expected parameters - colour, depth, stencil, samplemask, src0_alpha, and discard predicate. We construct the payloads and descriptors in the backend. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	bc22a37d98	jay: schedule for pressure Implement a simple pre-RA bottom-up list scheduler with the goal of decreasing register pressure. On Xe2, this significantly reduces spilling. SSA form allows us to estimate register demand cheaply and accurately, which theoretically [1] gives this algorithm the two Hippocratic properties: 1. Shaders with low register pressure are unaffected. 2. Register pressure can only be decreased, never increased. In other words: first, do no harm. The heuristic itself is very simple: greedily choose instructions that decrease liveness using a backwards list scheduler. This is far from optimal! But thanks to the above properties, even a heuristic that picked random instructions would be a win overall - by construction, we can only ever win. In other words: this scheduler is your older brother powering off the game console any time he's about to lose a game, maintaining a 100% win rate. [1] In reality, neither property is strictly satisfied due to the messy details of mapping our clean logical model onto Intel's many weird physical register files. Nevertheless, the algorithm is well-motivated and the empirical results on Xe2 are excellent. 
SIMD16: Totals: Instrs: 2754194 -> 2753957 (-0.01%); split: -0.23%, +0.22% CodeSize: 41094768 -> 41092768 (-0.00%); split: -0.23%, +0.23% Number of spill instructions: 1724 -> 1129 (-34.51%) Number of fill instructions: 1912 -> 1119 (-41.47%) Totals from 168 (6.35% of 2647) affected shaders: Instrs: 850994 -> 850757 (-0.03%); split: -0.75%, +0.73% CodeSize: 12825680 -> 12823680 (-0.02%); split: -0.74%, +0.73% Number of spill instructions: 1724 -> 1129 (-34.51%) Number of fill instructions: 1912 -> 1119 (-41.47%) SIMD32: Totals: Instrs: 4688858 -> 4557800 (-2.80%); split: -3.53%, +0.74% CodeSize: 70177200 -> 68214816 (-2.80%); split: -3.53%, +0.74% Number of spill instructions: 50316 -> 45795 (-8.99%); split: -9.56%, +0.57% Number of fill instructions: 51526 -> 45075 (-12.52%); split: -13.23%, +0.71% Totals from 819 (30.94% of 2647) affected shaders: Instrs: 3810182 -> 3679124 (-3.44%); split: -4.35%, +0.91% CodeSize: 57044000 -> 55081616 (-3.44%); split: -4.35%, +0.91% Number of spill instructions: 49264 -> 44743 (-9.18%); split: -9.76%, +0.58% Number of fill instructions: 50182 -> 43731 (-12.86%); split: -13.58%, +0.73% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
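The heuristic described in bc22a37d98 (walk the block backwards and greedily pick the ready instruction that most reduces liveness) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not jay's implementation: `toy_inst`, `is_ready`, `pressure_delta` and `schedule_bottom_up` are hypothetical names, every value counts as one register, only def-use dependencies are tracked, and side effects and physical register files are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTS 32
#define MAX_SRCS 2
#define NO_VALUE (-1)

/* Toy SSA instruction: one optional destination, up to two sources. */
struct toy_inst {
   int dest;
   int srcs[MAX_SRCS];
};

/* Walking backwards, an instruction is ready once every instruction
 * that reads its destination has already been placed below it. */
static bool
is_ready(const struct toy_inst *insts, int n, const bool *done, int i)
{
   if (insts[i].dest == NO_VALUE)
      return true;
   for (int j = 0; j < n; j++) {
      if (j == i || done[j])
         continue;
      for (int s = 0; s < MAX_SRCS; s++) {
         if (insts[j].srcs[s] == insts[i].dest)
            return false;
      }
   }
   return true;
}

/* Change in the number of live values if `inst` is placed next:
 * its destination dies, and sources that were not live become live. */
static int
pressure_delta(const struct toy_inst *inst, const bool *live)
{
   int delta = 0;
   if (inst->dest != NO_VALUE && live[inst->dest])
      delta--;
   for (int s = 0; s < MAX_SRCS; s++) {
      const int v = inst->srcs[s];
      bool repeated = false;
      for (int t = 0; t < s; t++)
         repeated |= inst->srcs[t] == v;
      if (v != NO_VALUE && !repeated && !live[v])
         delta++;
   }
   return delta;
}

/* `live` holds the live-out set on entry and the live-in set on return.
 * `order` receives instruction indices in final top-to-bottom order. */
static void
schedule_bottom_up(const struct toy_inst *insts, int n, bool *live, int *order)
{
   bool done[MAX_INSTS] = { false };
   assert(n <= MAX_INSTS);

   for (int slot = n - 1; slot >= 0; slot--) {
      int best = -1, best_delta = 0;
      for (int i = 0; i < n; i++) {
         if (done[i] || !is_ready(insts, n, done, i))
            continue;
         const int delta = pressure_delta(&insts[i], live);
         if (best < 0 || delta < best_delta) {
            best = i;
            best_delta = delta;
         }
      }
      assert(best >= 0);

      if (insts[best].dest != NO_VALUE)
         live[insts[best].dest] = false;
      for (int s = 0; s < MAX_SRCS; s++) {
         if (insts[best].srcs[s] != NO_VALUE)
            live[insts[best].srcs[s]] = true;
      }
      done[best] = true;
      order[slot] = best;
   }
}
```

Because the block is in SSA form, the live set after each backwards step is known exactly, which is what makes the per-candidate cost cheap to compute in this model.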
Alyssa Rosenzweig	81e21a8756	jay: factor jay_op_(starts,ends)_block queries Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	e72ffb0046	jay: annotate pure sends for scheduling, CSE, etc Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	c069b7e47c	jay/opt_propagate: avoid branching on poison logically it doesn't matter because we'll bail on a later check, but this is still UB and therefore releases nasal demons. i am jealous of Faith's Rust compilers. there, I said it. ==107281== Conditional jump or move depends on uninitialised value(s) ==107281== at 0x7069768: propagate_backwards (jay_opt_propagate.c:327) ==107281== by 0x7069768: jay_opt_propagate_backwards (jay_opt_propagate.c:367) ==107281== by 0x7058960: jay_compile (jay_from_nir.c:2677) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	4b0c3f5c32	jay/lower_scoreboard: add asserts on key bounds if these are botched you get UB (-: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	4c97493b69	jay/lower_scoreboard: handle accumulator hazard Challenging to hit but fixes dEQP-GLES3.functional.shaders.swizzle_math_operations.vector_multiply.mediump_ivec4_wzyx_zyxw_fragment with scheduling changes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	9a68101bc2	jay/liveness: drop redundant source filtering Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	9b68b4e7a1	jay/liveness: speed up physical CFG merging on top of scheduler changes, compile-time of shaders/blender/1017.shader_test: Difference at 95.0% confidence -0.00173202 +/- 0.00116931 -0.791537% +/- 0.532384% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	1b50d3eed2	jay/liveness: remove pointless bitset init dup initializes it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	5da3b57605	jay: insert simd32 deswizzle in a dedicated pass we don't actually need the DESWIZZLE pseudo instruction, and the pseudo op complicates pre-RA scheduling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig	47c6601d5e	jay: relax fragment payload layout this isn't optimal but it should unblock bring up. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Co-authored-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:46 +00:00
Kenneth Graunke	cb75c9f962	brw: Lower sample_pos for non-per-sample shaders in NIR We generalize the sample_mask_in lowering to handle this too. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>	2026-05-21 15:34:45 +00:00
Collabora's Gfx CI Team	18ba81e5b6	Uprev Piglit to 6fd29fe44f8857b876a67bee962919635f22ecc8 `11ce9eb56e...6fd29fe44f` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>	2026-05-20 21:37:44 +00:00
Christoph Neuhauser	7eba054c5b	anv: Add compute only divergent atomics fusion optimization for Blender Blender uses atomic operations as part of its virtual shadow mapping implementation. Virtual shadow mapping page tagging in compute shaders benefits from divergent atomics fusion, while fragment shaders doing the atomic raster step in general have worse performance with this optimization turned on. Thus, an option is added to only apply divergent atomics fusion to compute shaders in ANV, and this option is enabled for Blender. Initial support for divergent atomics fusion optimization in ANV was added in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631. Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>	2026-05-20 19:29:15 +00:00
Jordan Justen	28f6a442c6	brw/compact: Precompact using 2src fields on 3src instructions In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not compact an instruction because precompact overwrites portions of the instruction. (Treating the three source instruction as a two source when accessing instruction fields.) This instruction could be compacted: mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q }; But, since precompact erroneously sets bits, the instruction isn't compacted. Fossil testing: * Tested with `0a3f3fd193` ("brw: drop unused color_outputs_valid key") reverted, as fossils are currently producing inconsistent results otherwise. * Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change. SKL: Totals: CodeSize: 8335219296 -> 8320248992 (-0.18%) Totals from 359508 (14.42% of 2492689) affected shaders: CodeSize: 2838254352 -> 2823284048 (-0.53%) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>	2026-05-20 11:52:52 -07:00
Iván Briano	d0253e25c4	intel/dev: ARL-H supports EXECUTE_INDIRECT_* Signed-off-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>	2026-05-19 22:41:53 +00:00
Iván Briano	b420958166	anv, iris: fix MOCS Index setting of EXECUTE_INDIRECT_* commands Unlike most other things where the MOCS setting combines the MOCS Index and the protected memory bit, the EXECUTE_INDIRECT_DRAW/DISPATCH commands take only the MOCS Index, and it's limited to only 4 bits. Enabling the feature on ARL-H caused some tests to hit an assert when the MOCS selected ended up out of range. Rename the field to avoid confusion (and match documentation) and set it through a helper function that calls the same old function and shifts it down to fit. Fixes: `d1109f67bb` ("iris: Emit EXECUTE_INDIRECT_DRAW when available") Fixes: `d161e3c2e2` ("iris: Emit a EXECUTE_INDIRECT_DISPATCH when available") Fixes: `580728564e` ("anv: Emit a EXECUTE_INDIRECT_DISPATCH when available") Fixes: `6d4f43f0d6` ("anv: Emit EXECUTE_INDIRECT_DRAW when available") Fixes: `7a9e82e82f` ("genxml/12.5: Add the EXECUTE_INDIRECT_DISPATCH instruction") Fixes: `4229757309` ("genxml/12.5: Add the EXECUTE_INDIRECT_DRAW instruction") Signed-off-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>	2026-05-19 22:41:53 +00:00
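The helper described in b420958166 (reuse the existing MOCS query, then shift the result down so only the index remains) might be sketched as below. The bit layout is an assumption made for illustration: bit 0 is taken to be the protected-memory flag and the bits above it the MOCS table index. `combined_mocs` and `mocs_index_only` are hypothetical names, not the anv or iris functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only:
 *   bit 0     protected-memory flag
 *   bits 1+   MOCS table index */
static uint32_t
combined_mocs(uint32_t index, bool is_protected)
{
   return (index << 1) | (is_protected ? 1u : 0u);
}

/* Hypothetical helper for commands that take only the index: drop the
 * flag bit and check the result fits in a 4-bit field. */
static uint32_t
mocs_index_only(uint32_t combined)
{
   const uint32_t index = combined >> 1;
   assert(index < 16);
   return index;
}
```

Keeping the shift in one helper means every EXECUTE_INDIRECT_* emitter goes through the same range check rather than repeating it.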
Iván Briano	7b26ff692b	anv: fix return of cmd_buffer_set_indirect_stride() function Unless the tristate is unset, which it is not, it will be true when cast to bool, as the return of this function expects. Fixes: `2741ddd75a` ("anv: fix issues found with indirect data stride") Signed-off-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>	2026-05-19 22:41:53 +00:00
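The pitfall in 7b26ff692b is that any non-zero enum value converts to true. A minimal sketch with a hypothetical three-state enum (the names and numeric values are assumptions, chosen so that "unset" is zero), showing the implicit conversion next to one plausible explicit comparison:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tristate; only TRISTATE_UNSET is zero. */
enum tristate {
   TRISTATE_UNSET,
   TRISTATE_NO,
   TRISTATE_YES,
};

/* Implicit conversion: both NO and YES are non-zero, so both
 * come out as true. */
static bool
as_bool_implicit(enum tristate t)
{
   return t;
}

/* Explicit comparison: only YES maps to true. */
static bool
as_bool_explicit(enum tristate t)
{
   return t == TRISTATE_YES;
}
```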
Konstantin Seurer	690d9b0d00	util/u_trace: Rework resource management Stops allocating events in chunks. u_trace_event is allocated using a linear allocator which has minimal overhead. Buffers for timestamps are allocated using a custom allocator. As a side effect, it is possible to deduplicate consecutive tracepoints. Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41271>	2026-05-19 20:27:59 +00:00
Samuel Pitoiset	54b71e9e77	util: pass a struct to driParseConfigFiles() It would be easier to add more functionalities like shader hashes etc. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41657>	2026-05-19 19:51:45 +00:00
José Roberto de Souza	180d8cb544	intel/brw: Fix nir_intrinsic_load_inline_data_intel register offset calculation In case of nir_intrinsic_load_inline_data_intel it was not using base_offset to create the uniform, instead it was using only the special BRW_INLINE_PARAM_REG value that later will be replaced by the inline_data fixed register. So here using base_offset for both intrinsics, adding BRW_INLINE_PARAM_REG if nir_intrinsic_load_inline_data_intel and then in brw_shader::assign_curb_setup checking for inst->src[i].nr >= BRW_INLINE_PARAM_REG and adjusting brw_reg by the remaining of the subtraction with BRW_INLINE_PARAM_REG. Fixes: `7f19814414` ("brw/nir: handle inline_data_intel more like push_data_intel") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41607>	2026-05-19 19:30:18 +00:00


16178 commits