fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-06 13:10:10 +01:00

Author	SHA1	Message	Date
José Roberto de Souza	b8f93bfd38	anv: Always create anv_async_submit in init_copy_video_queue_state() A next patch will emit more instructions in video and copy queues for Gfx 200 and newer but the current code only creates anv_async_submit if device has aux_map. Instead we can always create anv_async_submit and only submit it to hardware if any instruction was emitted. Fixes: `86813c60a4` ("mi-builder: add read/write memory fencing support on Gfx20+") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32680>	2024-12-18 17:16:05 +00:00
José Roberto de Souza	edb33b47ab	intel/genxml/xe2: Add STATE_SYSTEM_MEM_FENCE_ADDRESS instruction Fixes: `86813c60a4` ("mi-builder: add read/write memory fencing support on Gfx20+") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32680>	2024-12-18 17:16:05 +00:00
Valentine Burley	61d9c47944	ci/lava: Use CI_JOB_TIMEOUT instead of separate variable The CI_JOB_TIMEOUT variable is the GitLab-defined job timeout in seconds. Use this variable in LAVA instead of the separate JOB_TIMEOUT, which was intended to represent the test phase timeout (job timeout minus 5 minutes), but was often overlooked. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32609>	2024-12-18 09:23:27 +00:00
Valentine Burley	3d1dd22bb4	anv/ci: Update expectations Remove bogus failures caused by wrong GPU_VERSION configuration, delete tests that no longer exist in current CTS versions, and update expectations. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	526ec3e7dd	anv/ci: Remove fails that are in .gitlab-ci/all-skips.txt These tests are always skipped in Mesa. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	f42d670ea6	anv/ci: Re-enable TGL and JSL manual jobs Thanks to the speedup achieved by increasing tests_per_group, nightly jobs are now within reasonable time limits, allowing them to be re-enabled. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	eb7fb2e919	anv/ci: Bump the number of tests per group for TGL Due to the slow startup time of deqp-vk, the previous default of 500 tests per group caused the jobs to run up to twice as slowly compared to using a higher number of tests per group. Increase the number of tests per group for all subsets of the deqp-runner suites, which allows decreasing the fractions. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	629b19a59f	anv/ci: Bump the number of tests per group for JSL Due to the slow startup time of deqp-vk, the previous default of 500 tests per group caused the jobs to run up to twice as slowly compared to using a higher number of tests per group. Increase the number of tests per group for all subsets of the deqp-runner suites, which allows decreasing the fractions. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	e68f9bb856	anv/ci: Bump the number of tests per group for ADL Due to the slow startup time of deqp-vk, the previous default of 500 tests per group caused the jobs to run up to twice as slowly compared to using a higher number of tests per group. Increase the number of tests per group for all subsets of the deqp-runner suites, which allows decreasing the fractions. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Valentine Burley	e7e9ceceb3	anv/ci: Fix GPU_VERSION configuration for anv-jsl and anv-jsl-full The GPU_VERSION was incorrectly set to iris-jsl for these ANV jobs, causing mismatched expectations. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>	2024-12-18 07:13:44 +00:00
Ian Romanick	b4d472cd67	brw/emit: Fix BROADCAST when value is uniform and index is immediate Fixes: `c74511f5dc` ("i965: Introduce the BROADCAST pseudo-opcode.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Tried-to-help-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32668>	2024-12-17 21:57:26 +00:00
Kevin Chuang	1b55f10105	anv/bvh: Dump BVH synchronously upon command buffer completion Modified the BVH dumping mechanism to synchronously wait for the command buffer to complete before saving BVH data to files. This approach is more robust compared to the previous method of dumping during acceleration structure destruction. Note: if DEBUG_BVH_ANY is enabled but intel-rt is disabled, we will wait for nothing. Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32585>	2024-12-16 23:01:11 +00:00
Caio Oliveira	93dfe504f2	intel/brw: Add SHADER_OPCODE_READ_FROM_CHANNEL and LIVE_CHANNEL Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32412>	2024-12-14 11:38:14 -08:00
Caio Oliveira	d325de316d	intel/brw: Add some tests for new Xe2 register regioning restrictions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>	2024-12-14 02:15:18 +00:00
Caio Oliveira	f308be16a0	intel/brw: Add validation for some Xe2 register regioning restrictions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>	2024-12-14 02:15:18 +00:00
Caio Oliveira	6a5a316312	intel/brw: Extract format enum in EU validation code Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>	2024-12-14 02:15:18 +00:00
Caio Oliveira	57b703cec3	intel/brw: Skip some regioning EU validation for Vx1 and VxH modes Skip the ones that check the VertStride -- which is set to a special value in those modes. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>	2024-12-14 02:15:18 +00:00
Felix DeGrood	0f46c53b0c	anv: Use vfg distribution mode = RR_STRICT for Xe2+ Performance tuning. Round Robin strict faster on Xe2 for some workloads. Speedup: - Borderlands3-dx11-trace: +4% - WolfensteinYoungblood-vk.g6: +1.5% - Cyberpunk2077-dx12vk-2160p-ultra: +0.5% Acked-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32566>	2024-12-13 19:15:48 +00:00
Deborah Brouwer	b6435207ab	ci: python-test rename artifacts The current python-test job creates and compresses python related artifacts for use by future jobs. The artifacts are currently named `mesa-python-test` which is somewhat misleading because they are not needed for testing python scripts or libraries. Rename the artifacts generated by the python-test job to be more descriptive of their purpose. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32340>	2024-12-13 10:04:03 -08:00
Sagar Ghuge	d3f9139e49	intel: Use Morton compute walk order According to HSD 14016252163, if a compute shader uses the sample operation, using Morton walk order and setting the thread group batch size to 4 is expected to increase sampler cache hit rates by increasing sample address locality within a subslice. Rework: * Caio: "||" => "&&" for type checking in instr_uses_sampler() * Jordan: Use nir's foreach macros rather than nir_shader_lower_instructions() Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>	2024-12-12 19:56:47 -08:00
Sagar Ghuge	4bd958243d	intel/genxml: Update COMPUTE_WALKER_BODY For PTL, we can have one more additional walk order along with the "Thread Group Batch Size" field. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>	2024-12-12 19:56:47 -08:00
Sagar Ghuge	41eda955af	intel/genxml: Drop morton walk field from Xe2 Looks like this one got added accidentally for Xe2. Xe2 doesn't support Morton dispatch walk order. Thanks to Rohan for bringing this up during review. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>	2024-12-12 19:56:47 -08:00
Caio Oliveira	0af8133f09	intel/executor: Add example using scalar register and send gather Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>	2024-12-13 02:18:15 +00:00
Caio Oliveira	5420c027e6	intel/brw: Add validation for ARF scalar register Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>	2024-12-13 02:18:15 +00:00
Caio Oliveira	f8c7348468	intel/brw: Add assembly support for ARF scalar register And the SEND gather variant that uses a scalar register as its only source. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>	2024-12-13 02:18:15 +00:00
Caio Oliveira	46e9fe6981	intel/brw: Add TGL_PIPE_SCALAR value Add the enum value for the (in-order) scalar pipe. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>	2024-12-13 02:18:15 +00:00
Caio Oliveira	7acd84da51	intel/brw: Consider if SEND is gather variant when setting ex_desc SEND instructions of gather variant will use the upcoming ARF scalar register. They use only Src0 and reuse the bits of Src1.Length (part of ex_desc). Src1.Length is (implicitly) defined as 0. Adapt the helper functions to take the new variant into account when manipulating ex_desc. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32236>	2024-12-13 02:18:15 +00:00
Ian Romanick	1b1003ca6f	brw/algebraic: Pull brw_constant_fold_instruction out of the switch statement Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	f0bf68dd25	brw/const: Remove TODO that isn't allowed by the hardware There are a lot of restrictions for bfloat16. The one that prevents this very useful optimization from being possible is, "Broadcast of bfloat16 scalar is not supported." Part of the reason this MR exists is to build up to implementing BF support, and there are a couple more commits that implement this. However, it fails on both real hardware and simulation: Instruction is: mad (8|M0) r6.0<1>:f 0xBF80:bf r2.0<8;1>:f r64.0<0>:f In bfloat/float mixed mode, bfloat src must be packed. Alas. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	99d3755bdd	brw/const: Allow HF constants in MAD on Gfx11 These can't mix with F values, but if the non-constant sources are already HF, this is allowed in src0. No shader-db changes on any Intel platform. fossil-db: Ice Lake Totals: Instrs: 236027458 -> 236027442 (-0.00%) Cycle count: 24515944704 -> 24515945379 (+0.00%) Totals from 8 (0.00% of 798454) affected shaders: Instrs: 10226 -> 10210 (-0.16%) Cycle count: 58567 -> 59242 (+1.15%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	4c462b6b32	brw/const: Allow constants in integer MAD Nothing can generate this currently, but a future commit will. The Bspec and experimentation support the following limitations: - Gfx11: Either src0 or src2 can be W or UW. - Gfx12: Either src0 or src2 can be W or UW. - Gfx12.5: Both src0 and src2 can be W or UW. - Gfx20: Both src0 and src2 can be W or UW. v2: Add missing break statement. v3: Leave the MAD handling in the case with the other 3 source instructions. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	9fa6b68f9e	brw/const: Refactor checking whether an immediate source is allowed Should be no functional change here. This simplifies some later changes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	69d74739fd	brw/algebraic: Don't restrict MAD(a, b, 1) optimization to float32 This is very unlikely for floating point MAD. At some point I intend to add internal integer MAD uses, and this could occur there. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	b605f76b2a	brw/algebraic: Constant fold multiplicands of MAD v2: Move the full constant folding part to brw_constant_fold_instruction. Suggested by Caio. I did this by extracting the core part of the folding to a helper function. v3: Delete stale comment. Noticed by Caio. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 18090847 -> 18090843 (<.01%) instructions in affected programs: 150 -> 146 (-2.67%) helped: 1 / HURT: 0 total cycles in shared programs: 919664648 -> 919663210 (<.01%) cycles in affected programs: 3426 -> 1988 (-41.97%) helped: 1 / HURT: 0 LOST: 1 GAINED: 0 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 220496486 -> 220496403 (-0.00%) Cycle count: 31610880908 -> 31610879044 (-0.00%); split: -0.00%, +0.00% Totals from 70 (0.01% of 702439) affected shaders: Instrs: 47018 -> 46935 (-0.18%) Cycle count: 6335504 -> 6333640 (-0.03%); split: -0.11%, +0.09% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	3a16ad71b7	brw/copy: Commute immediates for MAD multiplicands This enables constant combining to do its job. v2: Restore accidentally deleted line from a comment. Noticed by Caio. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 919668392 -> 919669310 (<.01%) cycles in affected programs: 10125264 -> 10126182 (<.01%) helped: 348 / HURT: 194 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Cycle count: 31610720660 -> 31610692748 (-0.00%); split: -0.00%, +0.00% Totals from 9066 (1.29% of 702433) affected shaders: Cycle count: 810411934 -> 810384022 (-0.00%); split: -0.01%, +0.00% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	e3e58d6f48	brw: Emit immediate value for MAD in canonical position No shader-db changes on any Intel platform. fossil-db: Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Cycle count: 25096109024 -> 25096108722 (-0.00%); split: -0.00%, +0.00% Totals from 4106 (0.51% of 797610) affected shaders: Cycle count: 63266176 -> 63265874 (-0.00%); split: -0.01%, +0.01% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	d9b019b683	brw/copy: Don't try to be clever about ADD3 constant propagation Always propagate into any source. Let commute_immediates and constant combining sort out the mess. It's literally their job. No shader-db changes on any Intel platform. The fossil-db changes just appear to be subtle changes in register allocation if the immediate source changes from src0 to src2. v2: Update the comment in commute_immediates. Suggested by Caio. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Cycle count: 31610720510 -> 31610720660 (+0.00%); split: -0.00%, +0.00% Totals from 8 (0.00% of 702433) affected shaders: Cycle count: 5522382 -> 5522532 (+0.00%); split: -0.00%, +0.00% Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	a84e3a0f55	brw/const: Allow mixing signed and unsigned immediate sources No shader-db or fossil-db changes on any Intel platform. This commit just prevents issues with a later commit, "brw/copy: Don't try to be clever about ADD3 constant propagation." v2: Use 'can_promote = true; break;' instead of 'return true;'. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	a738c55d7b	brw/algebraic: Partial constant folding of ADD3 Fold the cases where one of the sources is zero or two of the sources are constants. Both cases will result in a regular ADD. No shader-db or fossil-db changes on any Intel platform. This commit just prevents issues with a later commit, "brw/copy: Don't try to be clever about ADD3 constant propagation." v2: Move the full constant folding part to brw_constant_fold_instruction. Suggested by Caio. v3: Eliminate the impossible src.file == BAD_FILE case in brw_fs_opt_algebraic. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	c52ce6157f	brw/emit: Fix typo in recently added ADD3 assertion The current assertion fails as soon as a MAD with src0 and src2 being immediate is detected. The assertion was supposed to catch, "If it's ADD3, only one of src0 and src2 can be immediate." To detect this, the opcode test should have been !=. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `c1c09e3c4a` ("brw/emit: Add correct 3-source instruction assertions for each platform") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	25de9dcd76	brw/algebraic: Fix MUL constant folding Some callers of brw_constant_fold_instruction depend on the result being a MOV of immediate when progress is made. Previously `MUL dst:D src0:D 1:D` would be converted to `MOV dst:D src0:D`. There was also no handling for `MUL dst:D imm0:D imm1:D`. This could cause problems if one of the immediate values was -1. The existing code would convert this to a `MOV dst:D imm0:D` and set the negate flag on src0. That is not correct. v2: Fix the is_negative_one case handling of the non-negative-one source. Add a comment explaining the assertion. Both suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `2cc1575a31` ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Ian Romanick	086e83ccd9	brw/algebraic: Fix ADD constant folding Some callers of brw_constant_fold_instruction depend on the result being a MOV of immediate when progress is made. Previously `ADD dst:D src0:D 0:D` would be converted to `MOV dst:D src0:D`. There was also no handling for `ADD dst:D imm0:D imm1:D`. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `2cc1575a31` ("brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32436>	2024-12-13 01:24:26 +00:00
Caio Oliveira	c8f6d8154f	intel/brw: Remove overloads for brw_print_instruction/s functions Almost all cases now handled with default arguments. The only real extra work that was being done was pushed to the client code in debug_optimizer(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>	2024-12-12 22:01:48 +00:00
Paulo Zanoni	d4a54d4f92	brw: don't read past the end of old_src buffer in resize_sources() In this case, num_sources is bigger than this->sources, so if we loop up to num_sources (instead of this->sources) we'll end up reading past the end of old_src[]. Only copy up to what we originally had. This was found by code inspection, I'm not aware of any applications failing due to the lack of this patch. Fixes: `d9e737212d` ("intel/brw: Add a src array for the common case in fs_inst") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32600>	2024-12-12 20:33:13 +00:00
Lionel Landwerlin	e0b5179869	blorp: use 2D dimension for 1D tiled images Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `31eeb72e45` ("blorp: Add support for blorp_copy via XY_BLOCK_COPY_BLT") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32608>	2024-12-12 17:10:45 +00:00
Lionel Landwerlin	2bb98a8f99	anv: document UBO descriptor range alignments Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Lionel Landwerlin	99bb2a087a	intel/decoder: fix COMPUTE_WALKER handling Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `17096f87` ("intel: Switch to COMPUTE_WALKER_BODY") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32347>	2024-12-12 07:35:18 +00:00
Kenneth Graunke	6341b3cd87	brw: Combine convergent texture buffer fetches into fewer loads Borderlands 3 (both DX11 and DX12 renderers) have a common pattern across many shaders: con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture) ... con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture) con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture) A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for...a single uint32_t scalar value. Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs)...this means we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure, and also in back-and-forth sending and receiving memory messages. The good news is we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e <10, 11, 12, ..., 26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.) We can pick the SIMD size of the load independently of the native shader width as well. 
On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine. In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>	2024-12-12 00:05:42 +00:00
Caio Oliveira	abe41b1d2c	intel/compiler: Use #pragma once instead of header guards Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32534>	2024-12-11 19:47:44 +00:00
Tapani Pälli	97fc987497	intel/dev: update mesa_defs.json from internal database This updates entry for 14017823839 which fixes issues on BMG with: dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32550>	2024-12-11 17:32:52 +00:00
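
The batching described in commit 6341b3cd87 ("brw: Combine convergent texture buffer fetches into fewer loads") can be illustrated with a small model. This is only a sketch in Python, not the Mesa pass itself: the function name `plan_combined_fetches`, the candidate widths (8, 16, 32), and the rule "pick the smallest width that holds the chunk" are all assumptions made for illustration. The commit message only states that 28 convergent loads become one SIMD32 message on Xe2, two SIMD16 messages on earlier hardware, and that a smaller size can be used when there are few fetches to combine.

```python
def plan_combined_fetches(coords, max_width, widths=(8, 16, 32)):
    """Group constant texel-fetch coordinates into combined messages.

    Hypothetical model: each returned entry is (simd_width, coords_in_message).
    `max_width` is the widest message the hardware is assumed to accept and
    must be one of `widths`.
    """
    if max_width not in widths:
        raise ValueError("max_width must be one of the candidate widths")
    allowed = [w for w in widths if w <= max_width]
    plan = []
    # Split the fetches into chunks no larger than the widest message.
    for start in range(0, len(coords), max_width):
        chunk = coords[start:start + max_width]
        # Assumed policy: smallest candidate width that fits the chunk.
        width = next(w for w in allowed if w >= len(chunk))
        plan.append((width, chunk))
    return plan


# 28 hypothetical consecutive constant coordinates, as in the commit's count.
coords = list(range(0x10, 0x10 + 28))
xe2_plan = plan_combined_fetches(coords, max_width=32)
older_plan = plan_combined_fetches(coords, max_width=16)
print([w for w, _ in xe2_plan])    # widths of the messages with a SIMD32 limit
print([w for w, _ in older_plan])  # widths of the messages with a SIMD16 limit
```

In this model each original scalar result would then be recovered by broadcasting one channel of the combined load, which corresponds to the expansion MOVs mentioned in the commit message.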


13197 commits