fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 20:28:04 +02:00

Author	SHA1	Message	Date
Francisco Jerez	760437c4c4	intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism. This causes the graph coloring allocator to use the optimistic coloring codepath for all nodes whose total Q value exceeds the threshold of 96 GRFs, in order to do a better job at minimizing the register requirement of programs even when they are trivially colorable. At the threshold of 96 GRFs the number of threads available per EU starts decreasing as the number of register blocks requested by the program increases, so decreasing the number of registers can increase performance. That showed up in some test cases as a performance inversion caused by enabling VRT. Extending the register set to 256 GRFs has the side effect of making some non-trivially colorable programs trivially colorable. The register allocator then skips the optimistic coloring path and does a worse job at ordering the (trivial) allocations, which leads to increased register use and reduced performance. The following Traci test cases improve significantly as a result of this change (4 iterations, 5% significance): MetroExodus-trace-dx11-2160p-ultra: 1.90% ±0.85% BaldursGate3-trace-dx11-1440p-ultra: 1.47% ±0.38% Palworld-trace-dx11-1080p-med: 1.01% ±0.09% TerminatorResistance-trace-dx11-2160p-ultra: 0.95% ±0.29% Control-trace-dx11-1440p-high: 0.87% ±0.50% Lowering the P value threshold is theoretically expected to cost compile time, because it increases use of the slower optimistic path of the graph coloring allocator. This doesn't show up in practice: my shader-db and fossil-db compile-time numbers don't show any statistically significant change (13 iterations, 5% significance). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:55 +00:00
Francisco Jerez	74168a601e	util/ra: Allow driver to override class P value. This gives the driver the option to provide a custom threshold for the PQ test performed by the graph coloring algorithm. A threshold lower than the physical number of registers is helpful on platforms where the number of registers used can limit the thread parallelism of the program. On such platforms, a passing PQ test guarantees that the node can be pushed onto the stack and neglected while coloring the remaining nodes. Even so, the order in which this happens can have a dramatic effect on the register pressure of the resulting shader, and therefore also on the thread parallelism of the program. Setting a P value threshold lower than the real P value causes nodes with a Q value above the threshold to use the existing optimistic coloring heuristic. That heuristic orders nodes in the stack by Q value and so does a better job at minimizing the total register requirement of the program. Even though this causes us to hit the optimistic codepaths for trivially colorable nodes, the interference graph is still guaranteed to be trivially colorable if it was trivially colorable without the override. Using a threshold lower than the real P value comes at a compile-time performance cost. The driver can adjust the specific trade-off between compile time and run time based on the number of registers available to each thread without causing a hit to thread parallelism. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:55 +00:00
Francisco Jerez	35ac517780	intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. This defines a new pre-RA scheduling mode similar to BRW_SCHEDULE_PRE but more aggressive at optimizing for minimum latency rather than minimum register usage. The main motivation is a change in register allocation on recent xe3 platforms. Because VRT imposes a parallelism penalty when a program uses more GRF registers than necessary, we now use a heuristic that packs variables more tightly at the bottom of the register file instead of the round-robin heuristic we used on previous platforms. Unfortunately the xe3 tight-packing heuristic severely constrains the work of the post-RA scheduler due to the false dependencies introduced during register allocation. We can do a better job by making the scheduler aware of instruction latencies before the register allocator introduces any false dependencies. This can lead to higher register pressure, but only when the scheduler decides it could save cycles by extending a live range. It makes sense to preserve the preexisting BRW_SCHEDULE_PRE as a separate mode, because some workloads can still benefit from neglecting latencies pre-RA due to the trade-off between parallelism and GRF use mentioned above. A future commit will introduce a more accurate estimate of the expected relative performance of BRW_SCHEDULE_PRE vs. BRW_SCHEDULE_PRE_LATENCY that takes this trade-off into account. In theory this could also be helpful on pre-xe3 platforms, but the benefit should be significantly smaller due to the different RA heuristic, so it hasn't been tested extensively there. The following Traci tests are improved significantly by this change on PTL (nearly all tests that run on my system are affected positively): Ghostrunner2-trace-dx11-1440p-ultra: 7.12% ±0.36% SpaceEngineers-trace-dx11-2160p-high: 5.77% ±0.43% HogwartsLegacy-trace-dx12-1080p-ultra: 4.40% ±0.03% Naraka-trace-dx11-1440p-highest: 3.06% ±0.43% MetroExodus-trace-dx11-2160p-ultra: 2.26% ±0.60% Fortnite-trace-dx11-2160p-epix: 2.12% ±0.53% Nba2K23-trace-dx11-2160p-ultra: 1.98% ±0.30% Control-trace-dx11-1440p-high: 1.93% ±0.36% GodOfWar-trace-dx11-2160p-ultra: 1.62% ±0.47% TotalWarPharaoh-trace-dx11-1440p-ultra: 1.55% ±0.18% MountAndBlade2-trace-dx11-1440p-veryhigh: 1.51% ±0.37% Destiny2-trace-dx11-1440p-highest: 1.44% ±0.34% GtaV-trace-dx11-2160p-ultra: 1.26% ±0.27% ShadowTombRaider-trace-dx11-2160p-ultra: 1.10% ±0.58% Borderlands3-trace-dx11-2160p-ultra: 0.95% ±0.43% TerminatorResistance-trace-dx11-2160p-ultra: 0.87% ±0.22% BaldursGate3-trace-dx11-1440p-ultra: 0.84% ±0.28% CitiesSkylines2-trace-dx11-1440p-high: 0.82% ±0.22% PubG-trace-dx11-1440p-ultra: 0.72% ±0.37% Palworld-trace-dx11-1080p-med: 0.71% ±0.26% Superposition-trace-dx11-2160p-extreme: 0.69% ±0.19% The compile-time cost of shader-db increases significantly by 1.85% after this commit (14 iterations, 5% significance). The compile time of fossil-db doesn't change significantly in my setup. v2: Addressed interaction with `81594d0db1`, since the code that calculates deps, delays and exits is no longer mode-independent after this change. Reverting that commit would be non-trivial and would have a greater compile-time hit. Instead, simply reconstruct the scheduler object during the transition between BRW_SCHEDULE_PRE_LATENCY and any other PRE mode that doesn't require instruction latencies. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:55 +00:00
Francisco Jerez	501b1cbc2c	intel/brw: Fix behavior of scheduler around flag register writes. We were treating explicit flag writes and reads as a full scheduler barrier. This is unnecessary, since the tracking we already do handles explicit flag access correctly, so there is no reason to take a possibly large performance hit from add_barrier_deps(). Found by inspection while trying to understand the poor scheduling of some fragment shaders. In combination with a subsequent commit that makes the pre-RA scheduler sensitive to instruction latencies, this improves performance by a small but statistically significant amount (4 iterations, 5% significance) for the following Traci tests: SpaceEngineers-trace-dx11-2160p-high: 0.66% ±0.30% MountAndBlade2-trace-dx11-1440p-veryhigh: 0.62% ±0.23% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:55 +00:00
Francisco Jerez	17b068ed1c	intel/brw/xe3+: Handle SENDG in instruction scheduler. We weren't handling the SHADER_OPCODE_SEND_GATHER instruction in the instruction scheduler and this was leading to reduced performance in many programs since SEND instructions have the longest latency and tend to be among the most critical to schedule efficiently. Handle SENDG similarly to SEND since the timings of both instructions are mostly bound by the shared function which doesn't care if the message was sent by SEND or SENDG. Improves performance significantly in the following Traci traces (4 iterations, 5% significance), most of them regressions from SENDG being enabled: MetroExodus-trace-dx11-2160p-ultra: 1.99% ±0.88% HogwartsLegacy-trace-dx12-1080p-ultra: 1.33% ±0.20% GtaV-trace-dx11-2160p-ultra: 1.12% ±0.19% Borderlands3-trace-dx11-2160p-ultra: 1.00% ±0.58% TerminatorResistance-trace-dx11-2160p-ultra: 0.98% ±0.27% Control-trace-dx11-1440p-high: 0.91% ±0.36% Naraka-trace-dx11-1440p-highest: 0.90% ±0.30% Ghostrunner2-trace-dx11-1440p-ultra: 0.87% ±0.38% Palworld-trace-dx11-1080p-med: 0.71% ±0.17% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:54 +00:00
Lionel Landwerlin	d6ee5b7177	anv: remove divergence requirement Not required since we've disabled maintenance8 support. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d39e443ef8` ("anv: add infrastructure for common vk_pipeline") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37242>	2025-09-09 21:25:06 +00:00
Mike Blumenkrantz	dee9600ac7	zink: eliminate buffer refcounting to improve performance Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36296>	2025-09-09 20:47:38 +00:00
Mike Blumenkrantz	b3133e250e	gallium: add pipe_context::resource_release to eliminate buffer refcounting refcounting uses atomics, which are a significant source of CPU overhead in many applications. by adding a method to inform the driver that the frontend has released ownership of a buffer, all other refcounting for the buffer can be eliminated see MR for more details Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36296>	2025-09-09 20:47:38 +00:00
Mike Blumenkrantz	7c1c2f8fce	zink: ensure transient surface is created when doing msaa expand forgetting this can lead to res->transient being NULL Fixes: `ef3f798957` ("zink: prune zink_surface down to the imageview and create/fetch on demand") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37257>	2025-09-09 19:10:25 +00:00
Caio Oliveira	67fcfed67b	brw: Add `FILE *` parameter to dump_assembly Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37259>	2025-09-09 10:40:42 -07:00
Anna Maniscalco	19f32df4ff	mailmap: Update my name Egg-crAcked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37230>	2025-09-09 16:44:38 +00:00
Mike Blumenkrantz	8eca6ee134	zink: just reference compute progs to batch on delete eliminates a bunch of pointless refcounting Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>	2025-09-09 16:14:16 +00:00
Mike Blumenkrantz	1e07f58c62	zink: do bindless init when binding a bindless shader, not on create this avoids doing bindless init in multi-context scenarios where one context is used solely for shader compilation Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>	2025-09-09 16:14:16 +00:00
Mike Blumenkrantz	e21438192a	zink: set current compute prog after comparing against current compute prog not sure how this was never caught before now? cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>	2025-09-09 16:14:16 +00:00
Mike Blumenkrantz	3fecf68784	zink: only set compute module info on dispatch (after compile fence) this otherwise can attempt to access thread data cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>	2025-09-09 16:14:14 +00:00
Lionel Landwerlin	febe90e109	vulkan: remove incorrect assert You can have a group with 0 shaders in it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `69a04151db` ("vulkan/runtime: add ray tracing pipeline support") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13858 Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37249>	2025-09-09 13:34:05 +00:00
Rhys Perry	e2181744c2	aco/tests: add barrier-to-waitcnt tests Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	0f32b573a4	aco/gfx10: skip waitcnts or use vm_vsrc(0) for workgroup lds barriers fossil-db (navi21): Totals from 36594 (45.84% of 79825) affected shaders: Instrs: 19922581 -> 19922563 (-0.00%) CodeSize: 103616980 -> 103616956 (-0.00%) Latency: 69862064 -> 69053273 (-1.16%) InvThroughput: 14607708 -> 14606308 (-0.01%); split: -0.01%, +0.00% fossil-db (navi31): Totals from 1641 (2.06% of 79825) affected shaders: Instrs: 1247591 -> 1247875 (+0.02%); split: -0.00%, +0.03% CodeSize: 6259516 -> 6260612 (+0.02%); split: -0.00%, +0.02% Latency: 7657224 -> 7577299 (-1.04%); split: -1.05%, +0.00% InvThroughput: 1150669 -> 1148171 (-0.22%); split: -0.22%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	ac882985c0	aco/gfx10: skip waitcnts or use vm_vsrc(0) for workgroup vmem barriers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	145b178de2	aco: fix workgroup-scope barrier between vmem and lds A barrier between two lds/vmem instructions needs to ensure that the second starts after the first finishes, which means that we can't just skip workgroup-scope vmem barriers if there is a lds instruction later. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	02718fd4c5	aco: use a separate event for sendmsg_rtn Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	5812c2ea89	aco: update waitcnt events for exports Include primitive, dual source blend and POS4 exports. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	711c023b55	aco: remove waitcnt code for POPS We now insert barriers around these instead. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	005694fe1f	aco: remove waitcnt code for SMEM stores These were removed in GFX10.3 and we haven't used them in a while. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	20cd5cf5f7	aco: delay barrier waitcnt until they are needed fossil-db (navi21): Totals from 44 (0.06% of 79825) affected shaders: Instrs: 16001 -> 15932 (-0.43%); split: -0.46%, +0.02% CodeSize: 85800 -> 85548 (-0.29%); split: -0.30%, +0.01% Latency: 190124 -> 173458 (-8.77%) InvThroughput: 23605 -> 22756 (-3.60%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	843acfa50b	aco: add a separate barrier_info for release/acquire barriers These can wait for different sets of accesses. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	6c446c2f83	aco: refactor waitcnt pass to use barrier_info Currently there's just barrier_info_all, but more will be added later. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	21332609b9	aco: don't move acquire barriers before interlock begin Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	0ee1c137f9	aco: don't move release barriers after interlock end Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	7c056dd473	aco: add is_atomic_or_control_instr helper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Rhys Perry	df6a3b7619	aco: reduce cost of using values defined in predecessors For code like: if (cond) { val = load() } use(val) The "use(val)" now has a similar cost to a use inside the IF. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>	2025-09-09 12:34:40 +00:00
Gert Wollny	b7ac5d8453	r600/sfn: Optimize pred(not X != 0) to pred(X == 0) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:54 +00:00
Gert Wollny	125ce0f909	r600/sfh: Handle 64 bit comparisons in predicate optimization Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:54 +00:00
Gert Wollny	abe9b61212	r600/sfn: relax restrictions when optimizing predicate evaluation with a register If the comparison comes right before the predicate evaluation it still can be contracted. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:53 +00:00
Gert Wollny	bbbb2be123	r600/sfn: emit 64 bit predicates like normal ALU ops Also clean up the scheduler changes we did to deal with one-slot and two-slot predicate ops at the same time. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:53 +00:00
Gert Wollny	51d8ca2dff	r600/sfn: optimize comparison results * optimize not(compare(a,b)), nir_opt_algebraic does this only if the comparison result is used only once, but on a vector arch we still get an advantage when doing this, because it reduces dependencies. * optimize b2f32(compare(a,b)), this is r600 specific Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:53 +00:00
Gert Wollny	82dffae611	r600/sfn: don't use dummy regs in alu ops when no dest register is needed Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:53 +00:00
Gert Wollny	4f1f5aa02d	r600/sfn: Add handling of channels for dest-less ALU ops This will be used to get rid of some dummy register handling. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:52 +00:00
Gert Wollny	90b2fbbab4	r600/sfn: Pass chan and dest_clamp to alu op if no dest register is given v2: move common code Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:52 +00:00
Gert Wollny	4dd3951323	r600/sfn: fix op2_pred_sete_64 opcode Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>	2025-09-09 12:11:52 +00:00
Georg Lehmann	08b58c3fac	nir/lower_subgroups: remove lower_fp64 option This was incorrect (it also lowered int64 reductions/scans), and the only user can just use the general callback to precisely only lower what it wants. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>	2025-09-09 11:09:22 +00:00
Georg Lehmann	687510495f	nir: remove subgroup size related nir_shader_compiler_options members This was added with the goal to eventually replace the per pass subgroup/ballot size options, but that won't work because some backends don't have a fixed subgroup size across the compilation process. It was also mostly added to hack around mesa state tracker behavior, and we have a better solution there now. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>	2025-09-09 11:09:22 +00:00
Georg Lehmann	c7d5108373	mesa/st: make double subgroup lowering more precise Really don't touch anything else. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>	2025-09-09 11:09:21 +00:00
Georg Lehmann	9bc14a0047	nir/lower_subgroup: optimize reduce/scans with unknown subgroup size We skip iterations with ifs. These can be optimized away after the subgroup size is known. Every driver should do that because applications depend on it anyway. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>	2025-09-09 11:09:21 +00:00
Rhys Perry	c59a85d406	nir/load_store_vectorize: remove offset check in try_vectorize_shared2 This doesn't seem useful anymore. fossil-db (gfx1201): Totals from 111 (0.14% of 79839) affected shaders: Instrs: 152356 -> 151883 (-0.31%); split: -0.35%, +0.04% CodeSize: 808484 -> 805584 (-0.36%); split: -0.39%, +0.04% VGPRs: 7880 -> 7844 (-0.46%); split: -0.91%, +0.46% Latency: 4121366 -> 4120648 (-0.02%); split: -0.04%, +0.02% InvThroughput: 814622 -> 815362 (+0.09%); split: -0.02%, +0.11% VClause: 3066 -> 3065 (-0.03%); split: -0.10%, +0.07% SClause: 2594 -> 2593 (-0.04%) Copies: 9412 -> 9447 (+0.37%); split: -0.47%, +0.84% PreSGPRs: 4012 -> 4026 (+0.35%) PreVGPRs: 4025 -> 4070 (+1.12%); split: -0.22%, +1.34% VALU: 80457 -> 81039 (+0.72%); split: -0.08%, +0.80% SALU: 16542 -> 16528 (-0.08%); split: -0.10%, +0.02% VOPD: 39 -> 44 (+12.82%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>	2025-09-09 10:11:52 +00:00
Rhys Perry	0f364aded3	nir/opt_offsets: improve shared2 optimization Combine additions too, instead of just constant offsets. fossil-db (gfx1201): Totals from 97 (0.12% of 79839) affected shaders: Instrs: 145269 -> 144886 (-0.26%); split: -0.27%, +0.01% CodeSize: 762184 -> 759556 (-0.34%); split: -0.36%, +0.01% VGPRs: 5812 -> 5764 (-0.83%) Latency: 4050681 -> 4050528 (-0.00%); split: -0.01%, +0.00% InvThroughput: 617458 -> 617181 (-0.04%); split: -0.05%, +0.00% Copies: 8719 -> 8672 (-0.54%); split: -0.70%, +0.16% PreVGPRs: 3558 -> 3543 (-0.42%); split: -0.59%, +0.17% VALU: 77793 -> 77462 (-0.43%); split: -0.44%, +0.01% SALU: 17028 -> 17009 (-0.11%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>	2025-09-09 10:11:51 +00:00
Rhys Perry	c10e495182	nir/opt_offsets: fix progress determination with offsets that add to zero If the offset is iadd(iadd(iadd(a, 1), b), -1), try_extract_const_addition will create a dead iadd(a, b) and claim that it didn't modify the shader. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>	2025-09-09 10:11:50 +00:00
Rhys Perry	9aad852af8	nir/opt_offsets: report progress if NUW is set Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>	2025-09-09 10:11:50 +00:00
Tomeu Vizoso	5eab4f06d5	teflon/tests: Remove dependency on xtensor Upstream has been moving headers around and breaking users. Because we don't use it for much right now, drop the dependency altogether by open coding some rand() helpers. Issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13681 Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37220>	2025-09-09 11:07:19 +02:00
Sviatoslav Peleshko	b148d47c3e	anv: Always disable Color Blending for unused Render Targets Commit `d2f7b6d5` changed the BLEND_STATE update process so that only the used render targets will be updated. This mostly works fine, but in cases when the Dual Source Blending was used previously, we still must turn it off to avoid the undefined behavior that leads to hangs. Fixes: `d2f7b6d5` ("anv: implement VK_KHR_dynamic_rendering_local_read") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13675 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37246>	2025-09-09 07:38:50 +00:00
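Commits 74168a601e (util/ra) and 760437c4c4 (intel/brw/xe3+) lower the P threshold used by the graph coloring allocator's PQ test. The C sketch below models only that threshold decision, so the effect is concrete. It is a simplified illustration under stated assumptions, not the util/ra code. The struct and function names (`sketch_reg_class`, `sketch_effective_p`, `sketch_pq_test`) are hypothetical. The sketch also assumes the test passes when the node's accumulated neighbor Q value is strictly below P, as in the generalized Briggs-style test. The values 256 and 96 are taken from the 760437c4c4 message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register class; NOT the real util/ra struct. */
struct sketch_reg_class {
   unsigned p;          /* physical number of registers in the class */
   unsigned p_override; /* 0 means "no override", else a driver threshold */
};

/* P value actually used by the (hypothetical) PQ test. */
static unsigned
sketch_effective_p(const struct sketch_reg_class *c)
{
   /* The override is assumed to only ever lower the threshold. */
   assert(c->p_override <= c->p);
   return c->p_override ? c->p_override : c->p;
}

/* q_total: sum of the Q contributions of the node's remaining neighbors.
 * true  -> node is pushed onto the stack as trivially colorable;
 * false -> node is left for the optimistic path, which orders by Q value. */
static bool
sketch_pq_test(const struct sketch_reg_class *c, unsigned q_total)
{
   return q_total < sketch_effective_p(c);
}
```

With P = 256 and an override of 96, a node whose total Q is 120 passes the unmodified test but fails the overridden one, so it is handled by the optimistic path. Because the override never exceeds P, a node that passes the lowered test always passes the original one as well. The override can therefore only move nodes onto the optimistic path, never off it.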


211861 commits