fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-25 08:18:11 +02:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	2ed6ff728a	brw: explicitly pad tgl_swsb This lets us treat it as a packed data structure without worrying about garbage. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40829>	2026-04-07 19:32:15 +00:00
Sagar Ghuge	f0ae58df12	intel/compiler: Handle TerminateOnFirstHit in ray query execution Once committed and an AABB or triangle intersection is found, terminate the traversal if the TerminateOnFirstHit ray flag is present. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>	2026-04-06 10:00:05 -07:00
Arkady Shlykov	7f7ba20cca	brw: Implement divergent atomics fusion optimization (single message approach) For an atomic with a divergent address, generate a CFG that groups lanes with the same address value together and emit a single atomic with fused data covering the subgroup. Lanes with other address values perform a default atomic. Co-authored-by: Jhanani Thiagarajan <jhanani.thiagarajan@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631>	2026-04-03 12:17:01 +00:00
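The fusion idea in the commit message above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the pass itself: it assumes an atomic add whose return value is unused, picks the first lane's address as the fused address, and counts per-lane atomic operations rather than hardware messages. The function names and the dictionary-based memory model are hypothetical.

```python
from collections import defaultdict


def unfused_atomic_add(memory, addrs, data):
    """Reference behaviour: every lane performs its own atomic add."""
    for addr, value in zip(addrs, data):
        memory[addr] += value
    return len(addrs)  # one per-lane atomic operation for each lane


def fused_atomic_add(memory, addrs, data):
    """Fuse all lanes that share one address into a single atomic add.

    Assumption: the fused address is taken from the first lane. Lanes with
    any other address fall back to an ordinary per-lane atomic.
    """
    if not addrs:
        return 0
    fused_addr = addrs[0]
    # Combine the data of every lane that targets the fused address.
    fused_value = sum(v for a, v in zip(addrs, data) if a == fused_addr)
    memory[fused_addr] += fused_value
    ops = 1
    # Remaining lanes perform the default (unfused) atomic.
    for addr, value in zip(addrs, data):
        if addr != fused_addr:
            memory[addr] += value
            ops += 1
    return ops


addrs = [16, 16, 16, 32]  # lane addresses: three lanes collide on 16
data = [1, 2, 3, 4]       # per-lane operands
reference, fused = defaultdict(int), defaultdict(int)
unfused_ops = unfused_atomic_add(reference, addrs, data)
fused_ops = fused_atomic_add(fused, addrs, data)
print(dict(fused) == dict(reference), unfused_ops, fused_ops)
```

In this model the final memory contents match the unfused reference while the colliding lanes cost one atomic operation instead of three.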
Lionel Landwerlin	fab6f84126	brw: make the program key available on pass_tracker Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631>	2026-04-03 12:17:01 +00:00
Caio Oliveira	0bf3aaedb1	brw: Always use split send in generator Instead of generating a special single-source send in some cases, always use the split send (called SENDS pre-Xe, and the only option in Xe). Having a code path for single-source send was relevant for old Gfx versions, but for Gfx9+ split send is always available. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40755>	2026-04-02 18:31:02 +00:00
Kenneth Graunke	ca3cabd2f8	brw: Use nir_texop_resinfo_intel for query_levels and txs This eliminates the need to special case query_levels. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40451>	2026-03-29 12:53:10 +00:00
Lionel Landwerlin	fa523aedd0	brw: fence SLM writes between workgroups On LSC platforms the SLM writes are unfenced between workgroups. This means a workgroup W1 finishing might have uncompleted SLM writes. Another workgroup W2 dispatched after W1 which gets allocated an overlapping SLM location might have writes that race with the previous W1 operations. The solution is to fence all write operations (store & atomics) of a workgroup before ending the threads. We do this by emitting a single SLM fence either at the end of the shader or, if there is only a single unfenced write, at the end of that block. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13924 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40430>	2026-03-26 22:38:55 +00:00
Georg Lehmann	eef0fa22e0	brw: preserve fp_math_ctrl when lowering cmat alu Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40630>	2026-03-26 13:15:50 +00:00
Kenneth Graunke	204af7e09f	intel/nir: Replace tg4 with txl/txb/tex when splitting texture residency textureGather() returns the four taps that would have been filtered together to produce the value that ordinary texturing operations would return. As such, it should access the same data, so we can use either interchangeably when we're only checking for residency and not returning the actual data. This allows us to mask out some unneeded registers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>	2026-03-24 16:06:29 +00:00
Kenneth Graunke	605ef577b3	intel/nir: Generalize lower_tex_compare to split_tex_residency This splits a single texture-with-residency operation into two halves, one which returns texture data, and another which queries residency. We're currently using this only for a shadow sampling workaround, but the technique is more broadly applicable, if we ever wanted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>	2026-03-24 16:06:29 +00:00
Kenneth Graunke	dc760104ba	intel/nir: Set new image intrinsic parameters via builder helpers A bit less code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>	2026-03-24 16:06:28 +00:00
Kenneth Graunke	9d07e85287	intel/nir: Use txf builder in intel_nir_lower_sparse Newer helpers make NIR easier to write. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40590>	2026-03-24 16:06:28 +00:00
Tapani Pälli	c75256b2ab	intel/compiler: move validation assert after brw_shader_debug_log When validation fails we print instructions to use INTEL_DEBUG=shaders but that will not help if we assert before dumping shader debug log. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40529>	2026-03-24 04:54:31 +00:00
Ian Romanick	b5e023777c	brw: Change the flags written by some CMP One frustrating thing about the CMP and CMPN instructions is that they always write the flags. Sometimes, however, it is desirable to generate the comparison result without modifying the flags. This would, theoretically, reduce false dependencies that restrict the scheduler's ability to rearrange code, create more opportunities for cmod propagation, save a kitten from a tree, and make a rainbow. Consider this sequence: cmp.ge.f0.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D (+f0.0) if(8) JIP: LABEL19 UIP: LABEL19 It would be advantageous to put the first CMP between the second CMP and the IF, but this cannot be done since the IF depends on the flags generated by the second CMP. This pass enables this rescheduling by changing the first CMP to write to a different flags register. cmp.ge.f1.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D (+f0.0) if(8) JIP: LABEL19 UIP: LABEL19 Sometimes this is also possible by using a different instruction. For example, consider cmp.l.f0.0(8) g103<1>D g101<8,8,1>D 0D This produces 0xffffffff when g101 is negative and zero otherwise. This instruction, which does not modify the flag, also produces these results: asr(8) g103<1>D g101<8,8,1>D 31D Gfx9 platforms take a hit on instructions due to the instruction added at the end of short shaders by brw_workaround_source_arf_before_eot. shader-db: Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. 
(Lunar Lake shown) total instructions in shared programs: 17089451 -> 17088766 (<.01%) instructions in affected programs: 766613 -> 765928 (-0.09%) helped: 653 / HURT: 0 total cycles in shared programs: 888832986 -> 887873068 (-0.11%) cycles in affected programs: 549441852 -> 548481934 (-0.17%) helped: 10474 / HURT: 130 LOST: 9 GAINED: 0 Skylake total instructions in shared programs: 19037976 -> 19049719 (0.06%) instructions in affected programs: 3979914 -> 3991657 (0.30%) helped: 503 / HURT: 12303 total cycles in shared programs: 867918242 -> 866930801 (-0.11%) cycles in affected programs: 512773919 -> 511786478 (-0.19%) helped: 13858 / HURT: 66 LOST: 32 GAINED: 0 fossil-db: Lunar Lake Totals: Instrs: 925023504 -> 924950382 (-0.01%); split: -0.01%, +0.00% Cycle count: 106348432916 -> 106116809009 (-0.22%); split: -0.22%, +0.00% Spill count: 3423988 -> 3423930 (-0.00%); split: -0.00%, +0.00% Fill count: 4877087 -> 4876960 (-0.00%); split: -0.01%, +0.00% Max dispatch width: 49087552 -> 49078448 (-0.02%); split: +0.00%, -0.02% Totals from 1099332 (54.44% of 2019443) affected shaders: Instrs: 742670473 -> 742597351 (-0.01%); split: -0.01%, +0.00% Cycle count: 100455549635 -> 100223925728 (-0.23%); split: -0.23%, +0.00% Spill count: 3384366 -> 3384308 (-0.00%); split: -0.00%, +0.00% Fill count: 4837434 -> 4837307 (-0.00%); split: -0.01%, +0.00% Max dispatch width: 26725152 -> 26716048 (-0.03%); split: +0.00%, -0.03% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 997603774 -> 997529238 (-0.01%); split: -0.01%, +0.00% Cycle count: 93904012762 -> 93646730006 (-0.27%); split: -0.28%, +0.00% Spill count: 3710155 -> 3710125 (-0.00%); split: -0.00%, +0.00% Fill count: 5032908 -> 5032819 (-0.00%); split: -0.01%, +0.00% Max dispatch width: 37929640 -> 37811560 (-0.31%) Totals from 1334920 (58.52% of 2281134) affected shaders: Instrs: 817377787 -> 817303251 (-0.01%); split: -0.01%, +0.00% Cycle count: 88468851658 -> 88211568902 (-0.29%); split: -0.29%, +0.00% Spill count: 3663353 -> 3663323 (-0.00%); split: -0.00%, +0.00% Fill count: 4991629 -> 4991540 (-0.00%); split: -0.01%, +0.00% Max dispatch width: 20245832 -> 20127752 (-0.58%) Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 1013433769 -> 1013363273 (-0.01%); split: -0.01%, +0.00% Cycle count: 85766921182 -> 85509316620 (-0.30%); split: -0.31%, +0.00% Spill count: 3903923 -> 3903944 (+0.00%); split: -0.00%, +0.00% Fill count: 6801983 -> 6801948 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 37896320 -> 37805320 (-0.24%); split: +0.00%, -0.24% Totals from 1333814 (58.54% of 2278396) affected shaders: Instrs: 830200531 -> 830130035 (-0.01%); split: -0.01%, +0.00% Cycle count: 80746184101 -> 80488579539 (-0.32%); split: -0.32%, +0.01% Spill count: 3855771 -> 3855792 (+0.00%); split: -0.00%, +0.00% Fill count: 6755513 -> 6755478 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 20301456 -> 20210456 (-0.45%); split: +0.00%, -0.45% Skylake Totals: Instrs: 519389758 -> 519874108 (+0.09%); split: -0.00%, +0.10% Cycle count: 57932316132 -> 57789433956 (-0.25%); split: -0.25%, +0.00% Spill count: 636741 -> 636715 (-0.00%); split: -0.01%, +0.00% Fill count: 860470 -> 860357 (-0.01%); split: -0.02%, +0.00% Max dispatch width: 32527800 -> 32481792 (-0.14%); split: +0.00%, -0.14% Totals from 1080380 (62.25% of 1735462) affected shaders: Instrs: 411976399 -> 412460749 (+0.12%); split: -0.00%, +0.12% Cycle count: 
54291447615 -> 54148565439 (-0.26%); split: -0.27%, +0.00% Spill count: 602993 -> 602967 (-0.00%); split: -0.01%, +0.00% Fill count: 734459 -> 734346 (-0.02%); split: -0.02%, +0.00% Max dispatch width: 18626096 -> 18580088 (-0.25%); split: +0.00%, -0.25% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>	2026-03-24 01:31:26 +00:00
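The claim in the "brw: Change the flags written by some CMP" message above, that a signed less-than-zero compare and an arithmetic shift right by 31 produce the same 32-bit result, can be checked with a short sketch. It models the two instructions on Python integers restricted to the signed 32-bit range; the helper names are hypothetical and nothing here is taken from the compiler source.

```python
MASK32 = 0xFFFFFFFF


def cmp_l_zero(x):
    """Model of cmp.l against 0 on a signed 32-bit value: all ones if x < 0."""
    return MASK32 if x < 0 else 0


def asr31(x):
    """Model of asr by 31 on a signed 32-bit value.

    Python's >> on negative integers is an arithmetic shift, so the sign bit
    is replicated; masking turns the result into its 32-bit bit pattern.
    """
    return (x >> 31) & MASK32


# Boundary and typical values of the signed 32-bit range.
samples = [-(2**31), -12345, -1, 0, 1, 12345, 2**31 - 1]
results_match = all(cmp_l_zero(x) == asr31(x) for x in samples)
print(results_match)
```

Unlike the compare, the shift has no flag output to model, which is the property the commit relies on.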
Ian Romanick	31de96d321	brw/lower_regioning: Allow integer conversions in SEL The Bspec says that SEL sources and destination can be any mix of B, W, and *D. We should allow those. Specifically, without this change, this instruction sel.sat.l(8) v548:UD, v899:D, 255d gets unnecessarily split into two instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>	2026-03-24 01:31:26 +00:00
Ian Romanick	dff1e8ae28	brw: Handle scalars and swizzles correctly in is_const_zero v2: Massive simplification based on feedback from Ken. Fixes: `96cde9cc01` ("intel/fs: Emit better code for bfi(..., 0)") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>	2026-03-24 01:31:25 +00:00
Ian Romanick	985ace332b	brw/algebraic: Allow mixed types in saturate constant folding Prevents assertion failures in func.shader-ballot.basic.q0 and other tests starting with "nir/algebraic: Optimize some b2f of integer comparison". Vector immediates, bfloat, and 8-bit floats are still not supported. v2: Almost complete re-write based on suggestions from Ken. v3: Don't retype() on a brw_imm_f value. Fixes: `f8e54d02f7` ("intel/compiler: Relax mixed type restriction for saturating immediates") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>	2026-03-24 01:31:25 +00:00
Marek Olšák	fa5175023b	Final rename of sha1 names to blake3 Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:28 +00:00
Marek Olšák	ae9ea27e0d	Rename _sha1 names to _blake3 Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:28 +00:00
Marek Olšák	102d41799b	Rename more sha and sha1 names to blake3 Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:28 +00:00
Marek Olšák	d4831aaf5f	Rename sha1_* and sha_* names to blake3_* Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:28 +00:00
Marek Olšák	c0ac992a2a	Remove mesa-sha1.h Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Marek Olšák	53c64973e8	Inline _mesa_sha1_compute/format, remove the other unused ones _mesa_sha1_format has a few remaining uses, so it's moved to build_id.c, which is its last user. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Marek Olšák	699f9d7066	Inline _mesa_sha1_init/update/final functions Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Marek Olšák	a965ada6ee	Inline mesa_sha1, SHA1_CTX Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Marek Olšák	0da88d237a	Inline SHA1_DIGEST_STRING_LENGTH Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Marek Olšák	110632f702	Inline SHA1_DIGEST_LENGTH Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>	2026-03-23 07:03:27 +00:00
Georg Lehmann	ec331cc48a	nir: replace lower_ldexp with has_ldexp I can't be bothered to fix all the backends that don't set lower_ldexp, and only two backends have ldexp anyway. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900>	2026-03-20 08:15:08 +00:00
Iván Briano	fd556e54f6	brw: do not omit RT writes if dual_src_blend is on Dual source blending when one of the sources is not written to leaves those values undefined, but the other should still be valid. By omitting unwritten outputs, we ended up not writing anything at all for the case that OUT1 is written to but OUT0 is undefined. Fixes new CTS tests: dEQP-VK.pipeline..blend.dual_source.undefined_output.first Cc: mesa-stable Signed-off-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40357>	2026-03-19 23:38:40 +00:00
Caio Oliveira	dcba49d7ef	intel/compiler: Handle shuffle_*_intel intrinsics in bit size lowering Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>	2026-03-17 17:21:52 +00:00
Kenneth Graunke	9f77991751	brw: Simplify mark_last_urb_write_with_eot() Just tag the last instruction, drop useless dead code elimination. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	4bfa7a602c	brw: Don't emit HALT_TARGET for VS/TCS/TES/GS This isn't needed and will allow simplifications in the next patch. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	2b6c6f8130	brw: Lower TCS single patch invocation ID calculations in NIR This is a bit less code and also drops one more TCS-specific thing from the "run" function. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	66fbfe7bf3	brw: Fix single patch thread dispatch masks in NIR Arguably a little more code but it brings us a bit closer to not needing separate per-stage "run" functions. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	4a9aa3ecc4	brw: Combine brw_assign_*_urb_setup() into one function They all do exactly the same thing, except that GS multiplies by an extra factor, and TCS has urb_read_length == 0 so it skips one line. No need for four copies. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	7d463a45f7	brw: Simplify GS load_invocation_id handling Just return the register instead of having multiple functions stash the register in an array of registers. Way too much hoopla here. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Kenneth Graunke	9933882182	brw: Purge source_depth_to_render_target This was used for Gfx4-5. Since then, we're just passing around a boolean that nobody wants. Even if someone did, a better plan is to just check nir->info directly. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>	2026-03-12 21:40:37 +00:00
Sagar Ghuge	cb423ee636	anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921 The WA states that we need to allocate the maximum number of stackIDs per DSS from RT_DISPATCH_GLOBALS to 2048. We can still throttle/control the CFE_STATE::StackID to be in the range specified by the field. Having CFE_STATE::stackIDs capped to 2K by default does impact performance. The more outstanding ray queries, the larger the working set and the greater the impact on cache hit rate. This affects performance on Xe2+ onwards: * Boundary Benchmark: 36.2% * Solar Bay extreme: 9.8% * Hitman world of assassination: 3.9% Fixes: `c1a44e8d43` ("anv: force StackIDControl value for Wa_14021821874") Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40310>	2026-03-10 22:41:54 +00:00
Lionel Landwerlin	f508c6acbb	brw/nir: improve shader_indirect_data_intel handling Use is_scalar to know if we can do transpose loading. Also enable vectorization if 2 intrinsics share the same source (it means the only difference is the base). Fixes: `e14d6b535c` ("brw/nir: add new intrinsics to load data from the indirect address") Tested-by: Felix DeGrood <felix.j.degrood@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>	2026-03-10 18:24:04 +00:00
Paulo Zanoni	85751506ab	elk: don't use instr->const_index[] directly From what I understand, use of const_index[] by the driver is dangerous and should be avoided, as commits such as `a6330ed4d0` ("nir: add ACCESS to load_uniforms") may result in the indexes changing, breaking the driver. Switch to using the parameter names in order to make the code more future-proof. For elk_fs_nir.cpp and elk_vec4_tes.cpp we can verify in the generated nir_intrinsics.c that the wanted value is actually nir_intrinsic_base(). For elk_nir.c, according to Caio Oliveira: "The code is checking for certain load/store via the is_input() and is_output() checks a few lines above. I've checked all them have BASE at 0." Thanks to Ian Romanick for his guidance regarding this patch. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39438>	2026-03-10 01:03:42 +00:00
Ian Romanick	ffd4497e48	brw/asm: Don't drop accumulator number in the assembler Previously "acc1" or "acc2" would be stored as acc0. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:39 +00:00
Ian Romanick	1ae7a82811	brw: Fix encoding of accumulator sources of 3-source instructions Previously the accumulator was always forced to be acc0. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:39 +00:00
Ian Romanick	6531c425a0	brw/emit: Src1 can be accumulator on Gfx12.5 and newer v2: Add Bspec reference number. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:39 +00:00
Ian Romanick	c3a5b62c08	brw/validate: Perform more 3-src validation in brw_validate instead of brw_eu_emit v2: s/Lake/Ice Lake/ in a comment. Noticed by Caio. Add a missing Xe2 Bspec reference number. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:39 +00:00
Ian Romanick	1f45e33072	brw/validate: Implicit read of accumulator cannot also have explicit read v2: Add Bspec reference number. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:38 +00:00
Ian Romanick	8a6de2d973	brw/validate: Eliminate duplicate integer multiply validation I think two MRs must have crossed in the mail so to speak. Keep Caio's formatting and error message, and keep my PRM quote. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40226>	2026-03-09 19:21:38 +00:00
Ian Romanick	64c60582b5	elk/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT shader-db: Broadwell total instructions in shared programs: 18607516 -> 18607530 (<.01%) instructions in affected programs: 2095 -> 2109 (0.67%) helped: 0 / HURT: 8 total cycles in shared programs: 955704436 -> 955702925 (<.01%) cycles in affected programs: 34299 -> 32788 (-4.41%) helped: 2 / HURT: 6 All Haswell and older platforms had similar results. (Haswell shown) total instructions in shared programs: 16989200 -> 16989201 (<.01%) instructions in affected programs: 461 -> 462 (0.22%) helped: 0 / HURT: 1 total cycles in shared programs: 946537070 -> 946537035 (<.01%) cycles in affected programs: 16378 -> 16343 (-0.21%) helped: 1 / HURT: 0 Test: piglit!1100 Reported-by: Georg Lehmann Fixes: `ca675b73d3` ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40284>	2026-03-09 18:41:55 +00:00
Ian Romanick	6c6c6ce054	brw/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT This optimization was added in October 2013, and the error was only just now discovered. Removing the SEL.G.SAT optimization affected zero shader-db shaders, and it affected 9 fossil-db shaders for instruction size only. I haven't checked to see if any of the hurt shaders are helped by !39987. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17093041 -> 17093055 (<.01%) instructions in affected programs: 2072 -> 2086 (0.68%) helped: 0 / HURT: 8 total cycles in shared programs: 876739578 -> 876739154 (<.01%) cycles in affected programs: 18946 -> 18522 (-2.24%) helped: 2 / HURT: 6 fossil-db: Lunar Lake Totals: Instrs: 906230557 -> 906240487 (+0.00%); split: -0.00%, +0.00% CodeSize: 14498856128 -> 14499003168 (+0.00%); split: -0.00%, +0.00% Send messages: 40667184 -> 40667205 (+0.00%); split: -0.00%, +0.00% Cycle count: 104068494103 -> 104068561943 (+0.00%); split: -0.00%, +0.00% Max live registers: 189570192 -> 189570204 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 48157648 -> 48157552 (-0.00%) Non SSA regs after NIR: 139823587 -> 139823016 (-0.00%); split: -0.00%, +0.00% Totals from 9172 (0.46% of 1985212) affected shaders: Instrs: 10774709 -> 10784639 (+0.09%); split: -0.00%, +0.09% CodeSize: 177868384 -> 178015424 (+0.08%); split: -0.08%, +0.17% Send messages: 311154 -> 311175 (+0.01%); split: -0.00%, +0.01% Cycle count: 232471392 -> 232539232 (+0.03%); split: -0.15%, +0.18% Max live registers: 1243549 -> 1243561 (+0.00%); split: -0.00%, +0.01% Max dispatch width: 196672 -> 196576 (-0.05%) Non SSA regs after NIR: 509663 -> 509092 (-0.11%); split: -0.19%, +0.08% Test: piglit!1100 Reported-by: Georg Lehmann Fixes: `ca675b73d3` ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40284>	2026-03-09 18:41:55 +00:00
Lionel Landwerlin	9f2215b480	anv/brw: remove push constant load emulation from the backend compiler Anv is responsible for much of how the data is accessed (where is the push constant base pointer located, etc...), so move the memory load there. Fossildb on LNL : Totals from 135931 (8.65% of 1572134) affected shaders: Instrs: 68518228 -> 67142101 (-2.01%); split: -2.05%, +0.05% CodeSize: 1123507040 -> 1092022560 (-2.80%); split: -2.88%, +0.08% Subgroup size: 32 -> 16 (-50.00%) Send messages: 4401584 -> 4402565 (+0.02%); split: -0.02%, +0.04% Cycle count: 4626573038 -> 4619434858 (-0.15%); split: -0.89%, +0.74% Spill count: 451759 -> 452407 (+0.14%); split: -0.43%, +0.57% Fill count: 374513 -> 377440 (+0.78%); split: -0.76%, +1.54% Max live registers: 15788042 -> 15791399 (+0.02%); split: -0.05%, +0.08% Max dispatch width: 3349408 -> 3346192 (-0.10%); split: +0.09%, -0.19% Non SSA regs after NIR: 9477038 -> 9498328 (+0.22%); split: -0.27%, +0.50% Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>	2026-03-06 06:34:43 +00:00
Lionel Landwerlin	e14d6b535c	brw/nir: add new intrinsics to load data from the indirect address This address is delivered on Gfx12.5+ in compute/mesh/task shaders from the command stream instruction. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>	2026-03-06 06:34:43 +00:00

5040 commits