fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-04-24 15:50:37 +02:00

Author	SHA1	Message	Date
Sushma Venkatesh Reddy	d1d4e3d530	brw: Add EU assembler support for float8 Decode logic in Gfx12+ has become complex with the new types, so Caio suggested that we move to the table like other gens. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>	2025-12-19 00:09:53 +00:00
Jordan Justen	0088aae481	intel/brw: Add new encode/decode for use with brw_data_type_float/int Rework: * Sushma: Add BF in brw_data_type_encode, brw_data_type_decode Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>	2025-12-19 00:09:53 +00:00
Jordan Justen	46e843f76e	intel/brw: Add brw_data_type_float/brw_data_type_int These type encodings were first used in dpas instructions, but continue to be used in more places. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>	2025-12-19 00:09:52 +00:00
Sushma Venkatesh Reddy	54accefed2	brw: Add BRW_TYPE_BF8 and BRW_TYPE_HF8 for float8 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>	2025-12-19 00:09:52 +00:00
Ian Romanick	b967942b64	brw: Do cmod prop again after scheduling After selecting the scheduling mode, do cmod prop again. It's possible that doing cmod prop between performing a schedule and trying to register allocate would cause a different scheduling mode to be selected. However, this would require fully restoring the pre-schedule set of instructions (via cloning). I have tried to implement this, and it's harder than it looks. :( v2: Delete unused variable `progress`. Noticed by Marge. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19967018 -> 19967006 (<.01%) instructions in affected programs: 10652 -> 10640 (-0.11%) helped: 4 / HURT: 0 total cycles in shared programs: 884129990 -> 884139590 (<.01%) cycles in affected programs: 20334512 -> 20344112 (0.05%) helped: 0 / HURT: 4 fossil-db: Lunar Lake Totals: Instrs: 924967191 -> 924963460 (-0.00%); split: -0.00%, +0.00% Cycle count: 105962414958 -> 105961925594 (-0.00%); split: -0.00%, +0.00% Spill count: 3423582 -> 3423564 (-0.00%); split: -0.00%, +0.00% Fill count: 4877121 -> 4876955 (-0.00%); split: -0.00%, +0.00% Totals from 2511 (0.12% of 2018786) affected shaders: Instrs: 12541707 -> 12537976 (-0.03%); split: -0.03%, +0.00% Cycle count: 4816359238 -> 4815869874 (-0.01%); split: -0.01%, +0.00% Spill count: 179536 -> 179518 (-0.01%); split: -0.03%, +0.02% Fill count: 279407 -> 279241 (-0.06%); split: -0.07%, +0.01% Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. 
(Meteor Lake shown) Totals: Instrs: 980252404 -> 980237686 (-0.00%); split: -0.00%, +0.00% Cycle count: 91758669556 -> 91764028404 (+0.01%); split: -0.00%, +0.01% Spill count: 3664771 -> 3664744 (-0.00%); split: -0.00%, +0.00% Fill count: 4962078 -> 4960482 (-0.03%); split: -0.04%, +0.01% Totals from 8472 (0.38% of 2251522) affected shaders: Instrs: 34977623 -> 34962905 (-0.04%); split: -0.04%, +0.00% Cycle count: 6251857553 -> 6257216401 (+0.09%); split: -0.04%, +0.13% Spill count: 480251 -> 480224 (-0.01%); split: -0.01%, +0.00% Fill count: 676539 -> 674943 (-0.24%); split: -0.28%, +0.05% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	09450faf6a	brw: Do cmod prop again after post-RA scheduling shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19968728 -> 19963825 (-0.02%) instructions in affected programs: 788014 -> 783111 (-0.62%) helped: 2503 / HURT: 0 total cycles in shared programs: 884112912 -> 884093268 (<.01%) cycles in affected programs: 20017168 -> 19997524 (-0.10%) helped: 1830 / HURT: 52 LOST: 0 GAINED: 6 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 980768016 -> 980172179 (-0.06%) Cycle count: 91762351767 -> 91757280093 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 37602592 -> 37608768 (+0.02%) Totals from 157150 (6.98% of 2251329) affected shaders: Instrs: 107323207 -> 106727370 (-0.56%) Cycle count: 12696754006 -> 12691682332 (-0.04%); split: -0.04%, +0.00% Max dispatch width: 3708584 -> 3714760 (+0.17%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	08d71730ca	brw/cmod: Propagate to an instruction with same source Detect cases like mov.nz.f0.0(8) null<1>D g66<8,8,1>D (+f0.0) sel(8) g123<1>UD g87<8,8,1>UD g84<8,8,1>UD mov.nz.f0.0(8) null<1>D g66<8,8,1>D (+f0.0) sel(8) g124<1>UD g88<8,8,1>UD g85<8,8,1>UD Either MOV instruction could also be an equivalent CMP. v2: Require no predicate, groups match, and flags written match. v3: Add some more unit tests. Suggested by Caio. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 17203627 -> 17203590 (<.01%) instructions in affected programs: 51432 -> 51395 (-0.07%) helped: 37 / HURT: 0 total cycles in shared programs: 879884982 -> 879884670 (<.01%) cycles in affected programs: 6014730 -> 6014418 (<.01%) helped: 25 / HURT: 4 fossil-db: Lunar Lake Totals: Instrs: 925092938 -> 925071952 (-0.00%); split: -0.00%, +0.00% Cycle count: 105972157149 -> 105966120894 (-0.01%); split: -0.01%, +0.00% Spill count: 3423592 -> 3423582 (-0.00%) Fill count: 4876743 -> 4877121 (+0.01%); split: -0.00%, +0.01% Max live registers: 193525293 -> 193525251 (-0.00%) Max dispatch width: 49047056 -> 49047088 (+0.00%); split: +0.00%, -0.00% Totals from 17714 (0.88% of 2018791) affected shaders: Instrs: 56708169 -> 56687183 (-0.04%); split: -0.04%, +0.00% Cycle count: 4560530879 -> 4554494624 (-0.13%); split: -0.15%, +0.01% Spill count: 434846 -> 434836 (-0.00%) Fill count: 807443 -> 807821 (+0.05%); split: -0.02%, +0.07% Max live registers: 4332542 -> 4332500 (-0.00%) Max dispatch width: 295248 -> 295280 (+0.01%); split: +0.02%, -0.01% Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 995075628 -> 995051291 (-0.00%); split: -0.00%, +0.00% Cycle count: 92060967154 -> 92059311640 (-0.00%); split: -0.00%, +0.00% Spill count: 3664664 -> 3664675 (+0.00%); split: -0.00%, +0.00% Fill count: 4961929 -> 4961874 (-0.00%); split: -0.00%, +0.00% Max live registers: 121480292 -> 121480184 (-0.00%) Max dispatch width: 37947528 -> 37947496 (-0.00%) Totals from 20569 (0.90% of 2278279) affected shaders: Instrs: 57437989 -> 57413652 (-0.04%); split: -0.04%, +0.00% Cycle count: 4297505238 -> 4295849724 (-0.04%); split: -0.06%, +0.03% Spill count: 487508 -> 487519 (+0.00%); split: -0.00%, +0.00% Fill count: 869228 -> 869173 (-0.01%); split: -0.01%, +0.00% Max live registers: 2413028 -> 2412920 (-0.00%) Max dispatch width: 239280 -> 239248 (-0.01%) Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 1012570598 -> 1012546137 (-0.00%); split: -0.00%, +0.00% Cycle count: 85579989052 -> 85589116671 (+0.01%); split: -0.00%, +0.01% Spill count: 3901755 -> 3901748 (-0.00%) Fill count: 6799383 -> 6799367 (-0.00%) Max live registers: 122288761 -> 122288658 (-0.00%) Totals from 20595 (0.90% of 2280449) affected shaders: Instrs: 57764192 -> 57739731 (-0.04%); split: -0.04%, +0.00% Cycle count: 3899898675 -> 3909026294 (+0.23%); split: -0.04%, +0.27% Spill count: 481262 -> 481255 (-0.00%) Fill count: 1057996 -> 1057980 (-0.00%) Max live registers: 2412395 -> 2412292 (-0.00%) Skylake Totals: Instrs: 516619178 -> 516617390 (-0.00%) Cycle count: 57593545602 -> 57592502019 (-0.00%); split: -0.00%, +0.00% Fill count: 860403 -> 860402 (-0.00%) Max live registers: 87553761 -> 87553649 (-0.00%) Totals from 1357 (0.08% of 1730068) affected shaders: Instrs: 3575640 -> 3573852 (-0.05%) Cycle count: 1772148559 -> 1771104976 (-0.06%); split: -0.06%, +0.00% Fill count: 68917 -> 68916 (-0.00%) Max live registers: 131237 -> 131125 (-0.09%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: 
<https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	50f2cd7366	brw/dce: Don't generate more NULL destinations after brw_lower_3src_null_dest Later commits will call DCE after lowering has been performed. Creating more things that would need lowering is problematic. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	24cd8aa3b8	brw/cmod: Allow FIXED_GRF Later commits will call cmod prop after register allocation. At that time, there is only FIXED_GRF. No shader-db or fossil-db changes on any Intel platform. v2: FIXED_GRF uses subnr instead of offset. Add a unit test to demonstrate the issue. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	d7227b11a1	brw: elk: Disable can_do_cmod for MACH PRMs for G35 (Gfx4) through Ivy Bridge (Gfx7) all say that conditional modifiers are allowed for MACH. Starting with Haswell (Gfx7.5), this seems to be removed. This function doesn't have any way to know the platform, so false is returned for all platforms. No shader-db or fossil-db changes on any Intel platform. Prevents a failure in "brw: Do cmod prop again after post-RA scheduling" in piglit's builtin-uint-mad_sat-1.0.generated.cl. Cc: stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	ba30794847	brw/cmod: Don't propagate between instructions in different groups The group implicitly selects which flags the instruction can write. This was discovered while working on another set of changes that could change some logical operations into predicated MOV instructions. Prevents regressions later in the series in dEQP-VK.graphicsfuzz.cov-loop-fragcoord-identical-condition. No shader-db or fossil-db changes on any Intel platform. v2: Update the comment in the test case. Suggested by Caio. Fixes: `95ac3b1dae` ("i965/fs: don't propagate cmod when the exec sizes differ") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
Ian Romanick	c0fb93506b	brw: Add brw_reg::is_grf v2: Add a function comment. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38315>	2025-12-18 15:15:20 -08:00
hwandy	ffbe6470a2	anv: fix a memory leak in slab allocator. An example when the memory leak happens: requested_size = 4 and alignment = 65536 in anv_slab_bo_alloc: The alloc_size = 65536 and requested = 4 in this case. The group to allocate the entry is the group of size 65536 based on the entry size, while the group to reclaim the entry is the group of size 4 because bo->size is registered as the requested_size=4 and used in anv_slab_bo_free. That means the entry is allocated in group[order of size 65536]->free, moved from group[order of size 65536]->free to the user, and then moved to group[order of size 4]->reclaim, so the entries accumulate in group[order of size 4]->reclaim while group[order of size 65536] keeps allocating new entries, leading to OOM. The solution is to use `bo->actual_size` to get the group in pb_slab_bo_free using the allocation size. Fixes: `dabb012423` ("anv: Implement anv_slab_bo and enable memory pool") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14396 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: hwandy <hwandy@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38989>	2025-12-18 18:25:54 +00:00
Alyssa Rosenzweig	61dc9201a1	brw: constant fold before texture lowering This ensures we don't need dynamic stuff. Noticed when debugging weird regressions around the mcs lowering. ARL: total instructions in shared programs: 19857061 -> 19854964 (-0.01%) instructions in affected programs: 91768 -> 89671 (-2.29%) helped: 154 HURT: 0 helped stats (abs) min: 9.0 max: 33.0 x̄: 13.62 x̃: 13 helped stats (rel) min: 0.51% max: 40.91% x̄: 4.66% x̃: 3.36% 95% mean confidence interval for instructions value: -14.04 -13.19 95% mean confidence interval for instructions %-change: -5.49% -3.84% Instructions are helped. total cycles in shared programs: 884538769 -> 884485530 (<.01%) cycles in affected programs: 10508994 -> 10455755 (-0.51%) helped: 116 HURT: 38 helped stats (abs) min: 4.0 max: 15238.0 x̄: 666.22 x̃: 148 helped stats (rel) min: 0.01% max: 34.53% x̄: 2.58% x̃: 1.07% HURT stats (abs) min: 4.0 max: 4027.0 x̄: 632.68 x̃: 302 HURT stats (rel) min: 0.01% max: 32.75% x̄: 3.46% x̃: 0.59% 95% mean confidence interval for cycles value: -631.32 -60.09 95% mean confidence interval for cycles %-change: -2.06% -0.12% Cycles are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39023>	2025-12-18 17:55:29 +00:00
Kenneth Graunke	d83c699045	brw: Convert GS pulled inputs to use URB intrinsics We leave GS pushed inputs using load_per_vertex_input for now - they're relatively simple, and using load_attribute_payload doesn't work well since it's assumed to be convergent (for TES, FS inputs) while GS inputs are divergent. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>	2025-12-18 06:39:02 +00:00
Kenneth Graunke	eae3bd19d4	brw: Move GS URB Read Length limiting to brw_nir_lower_gs_inputs() We're going to be deciding on push vs. pull in the NIR lowering pass soon, so move the code to limit our register usage from brw's thread payload code to brw_nir_lower_gs_inputs(). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>	2025-12-18 06:39:02 +00:00
Kenneth Graunke	8889802271	brw: Make max_push_bytes a parameter to URB lowering data This allows us to program something other than a stage-based constant. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>	2025-12-18 06:39:02 +00:00
Kenneth Graunke	f62f7d80e2	brw: Update try_load_push_input to handle dword-unit offsets too We don't need this case today, but it's trivial to handle. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>	2025-12-18 06:39:01 +00:00
Tapani Pälli	2418c91537	anv/drirc: disable Xe2 CCS drm modifiers for GTK engine Cc: mesa-stable Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38373>	2025-12-17 17:34:09 +00:00
Sagar Ghuge	61287b00f3	anv: Stop using RCS companion for MSAA copy/clear on Xe3+ On Xe3+, we have typed MSAA load/store message support. We can use them during MSAA copies. We don't have to fallback on RCS companion queue anymore. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>	2025-12-17 05:34:02 +00:00
Sagar Ghuge	de0c547448	blorp: Handle 2D MSAA array image copies on compute shader We are passing number of layers as inline parameter register, so figure out z_pos and write to 2D MSAA array images in compute shader. We already get component X, Y and sample index, all we needed was the number of layers. Ken: - Use load/store var instead of derefs Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>	2025-12-17 05:34:02 +00:00
Sagar Ghuge	080d28a03e	blorp: Set persample_msaa_dispatch for render shader Only 3D shader gets dispatched per sample not the compute shader. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>	2025-12-17 05:34:02 +00:00
Lionel Landwerlin	d99a3d9b58	anv: remove CS-L3 coherency on Xe2 I'll try to write some crucible tests for this. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `be5f5f659f` ("anv: consider CS coherent with L3 on Xe2+") Fixes: `503355c7f8` ("anv: update pipeline barriers for Xe2+") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38966>	2025-12-16 21:35:27 +00:00
Alyssa Rosenzweig	079e9ae606	treewide: use BITSET_*_COUNT Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff as-if hand written: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>	2025-12-16 17:42:10 +00:00
Caio Oliveira	9c16bbd023	brw: Perform mark_last_urb_write_with_eot optimization after CFG Avoid using exec_node::remove() and the initial "main list of instructions", and instead use the existing helpers like other passes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37146>	2025-12-16 17:02:58 +00:00
Caio Oliveira	e53576a559	brw: Move MATH related validation Moved existing checks to EU validation and added a few more based on instruction description in the various PRMs / BSpec. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:46 +00:00
Caio Oliveira	55863c1267	brw: Add EU validation for ROR/ROL And remove asserts() in generator. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:46 +00:00
Caio Oliveira	47d8ed1177	brw: Move PLN/LINE normalization Add validation for Source 0 and move the normalization into the code producing the instruction. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:44 +00:00
Caio Oliveira	3f436bdc6e	brw: Make LINE normalization into validation Add validation for Source 0. Should not cause problems since this instruction is not used by the compiler anymore. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:43 +00:00
Caio Oliveira	75cf20f0eb	brw: Remove LINE from brw_builder and brw_generator Gfx9 only instruction that is not used anymore. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:42 +00:00
Caio Oliveira	cd3e3dd0d3	brw: Drop asserts for brw_SRND These are already covered by the EU validation. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:41 +00:00
Caio Oliveira	68190499df	brw: Move ADD related validation Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:40 +00:00
Caio Oliveira	6ae92d3372	brw: Move AVG related validation Couldn't find in the docs a reference for the types needing to match, and simulator + MTL seem fine with mixing UD and UW, so not adding a replacement for the removed assertions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:38 +00:00
Caio Oliveira	6d8d733d4d	brw: Move MUL related validation Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>	2025-12-16 01:34:34 +00:00
Kenneth Graunke	26523bedec	brw: Call nir_opt_offsets for mesh shaders Most stages call this as part of brw_nir_postprocess_opts() but mesh lowers to URB intrinsics after that since it needs bit-sizes lowered. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	d831f38d11	brw: Delete all the old backend mesh/task URB handling code This has all been replaced by NIR lowering to URB intrinsics. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	d0dc45955d	brw: Lower task shader payload access in NIR We keep this separate from the other lowering infrastructure because there's no semantic IO involved here, just byte offsets. Also, it needs to run after nir_lower_mem_access_bit_sizes, which means it needs to be run from brw_postprocess_opts. But we can't do the mesh URB lowering there because that doesn't have the MUE map. It's not that much code as a separate pass, though. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:46 +00:00
Kenneth Graunke	bd0c173595	brw: Lower mesh shader outputs in NIR With all the infrastructure in place, this is largely a matter of calling the lowering passes with the appropriate data from the MUE map. MUE initialization is now done with semantic IO instead of raw offsets. This drops another case of non-standard NIR IO usage (and no_validate). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:44 +00:00
Kenneth Graunke	6e5cc63a3a	brw: Extend URB lowering infrastructure to handle mesh shader outputs Mesh shaders introduce per-primitive outputs, and also our MUE layout has per-vertex data starting at an offset. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:43 +00:00
Lionel Landwerlin	60db7f20c9	brw: move MUE initialization out of the SIMD loop Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:42 +00:00
Lionel Landwerlin	d3053fb3d2	brw: Implement URB handle intrinsics for task/mesh stages (Split by Ken from a larger patch originally written by Lionel.) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:40 +00:00
Kenneth Graunke	d18423b116	brw: Make lower_{inputs,outputs}_to_urb_intrinsics non-static I want to reuse these in brw_compile_mesh.cpp. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:40 +00:00
Kenneth Graunke	788c49ecc6	brw: Extend load_urb/store_urb to handle 32-bit non-vec4-aligned access (Based on the original implementation by Lionel Landwerlin, but adapted to my respun URB lowering framework.) The mesh shader URB payload requires reading and writing fields at arbitrary DWord offsets. For example, the Primitive Indices array starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array, leading to very unaligned and sometimes double-parked elements. Still, most fields are still conveniently vec4-aligned. To handle this, we add a new cb_data::vec4_access flag. If set, access remains in vec4 units, with vec4 alignment. We use this for non-mesh stages. When unset, offset is in 32-bit units, allowing unaligned DWord access. This is trivial to support on Xe2, where the LSC URB messages support arbitrary byte-aligned addressing. On older platforms, we have to convert this to vec4 aligned offsets plus a component offset (either returning a subset of the channels loaded, or using component masking to store a subset of a vec4/vec8). Thankfully, since the OWord URB messages support accessing a vec8 at a time, this means we can do any vec4 access in one message, even if it's double-parked. We use mod-analysis to see if we can statically determine the sub-vec4 component offset required (we often can). If not, we use the ability to have dynamic writemasks to sort it out. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:38 +00:00
Kenneth Graunke	2b700f6bfd	brw: Delete attr_desc struct Unused since commit `18bbcf9a63`. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:37 +00:00
Kenneth Graunke	8177695403	brw: Add missed access to store_urb_lsc_intel intrinsics I forgot to copy this over in the LSC case. This meant we were missing reorderability which meant that we were missing out on CSE. fossil-db results on Battlemage: Instrs: 231471427 -> 231363032 (-0.05%) Send messages: 12077759 -> 12019628 (-0.48%) Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00% Spill count: 520387 -> 520135 (-0.05%) Fill count: 470812 -> 470722 (-0.02%) Max live registers: 72111834 -> 71873886 (-0.33%) Totals from 2898 (0.37% of 788851) affected shaders: Instrs: 1223836 -> 1115441 (-8.86%) Send messages: 148633 -> 90502 (-39.11%) Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49% Spill count: 252 -> 0 (-inf%) Fill count: 90 -> 0 (-inf%) Max live registers: 491684 -> 253736 (-48.39%) Non SSA regs after NIR: 255397 -> 255402 (+0.00%) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:36 +00:00
Kenneth Graunke	87c63b4725	brw: Rename brw_nir_lower_vue_inputs to brw_nir_lower_gs_inputs The other stages don't use this anymore. Geometry should stop too, but that's for a future MR. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:34 +00:00
Kenneth Graunke	d525d2456a	brw: Calculate tessellation URB offsets when lowering to URB intrinsics This now lowers IO intrinsics to URB intrinsics in a single step, rather than modifying IO intrinsics to have non-standard meanings temporarily. We are able to drop one "no_validate" flag. For example, remap_patch_urb_offsets had added (vertex * stride) to (offset) for per-vertex IO intrinsics, but left them as per-vertex intrinsics. Now we just have an urb_offset() function to calculate that when doing the lowering. This also provides a central location for calculating URB offsets, which we should be able to extend for other uses (per-view lowering, mesh per-primitive lowering) in future patches. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:34 +00:00
Kenneth Graunke	b02a01c636	intel: Replace signed char with int8_t Let's use modern stdint types. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>	2025-12-16 00:58:32 +00:00
Lionel Landwerlin	94d2ec975d	anv: disable crast on SKL SKL is failing the following tests (maybe more) : dEQP-VK.rasterization.conservative.overestimate.samples_1.triangles.degenerate.0_00 dEQP-VK.rasterization.conservative.overestimate.samples_16.triangles.degenerate.max dEQP-VK.rasterization.conservative.overestimate.samples_2.triangles.degenerate.max dEQP-VK.rasterization.conservative.overestimate.samples_2.triangles.degenerate.min dEQP-VK.rasterization.conservative.overestimate.samples_4.triangles.degenerate.0_00 dEQP-VK.rasterization.conservative.overestimate.samples_8.triangles.degenerate.max dEQP-VK.rasterization.conservative.overestimate.samples_8.triangles.degenerate.min Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38951>	2025-12-15 20:15:03 +00:00
José Roberto de Souza	518705a4fe	intel/brw: Split to a function the code that calculate sampler channels that should be written This block of code will be re-used in a future patch, also it reduces a bit the size and complexity of lower_sampler_logical_send(). No changes in behavior intended here. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>	2025-12-15 06:57:15 -08:00

15183 commits