fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-31 18:08:18 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	ea423aba1b	intel/brw: Split out 64-bit lowering from algebraic optimizations We don't necessarily want to split up MOVs for 64-bit addresses into 2x 32-bit MOVs right away, as this makes things like copy propagating the whole address around harder. We should do this late, once, while still doing other algebraic optimizations earlier. fossil-db results for Alchemist show tiny improvements: Totals: Instrs: 161310502 -> 161310436 (-0.00%); split: -0.00%, +0.00% Cycles: 14370605606 -> 14370605159 (-0.00%); split: -0.00%, +0.00% Totals from 33 (0.01% of 652298) affected shaders: Instrs: 15053 -> 14987 (-0.44%); split: -0.64%, +0.20% Cycles: 196947 -> 196500 (-0.23%); split: -0.25%, +0.02% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>	2024-03-20 01:04:17 -07:00
Mark Janes	345c918a76	intel/dev: remove pci revision from shader cache key Pci revision was included in the shader cache key because it can enable platform workarounds. While some platform workarounds exist in the compiler, none are dependent on the silicon stepping. Many platforms differ only in the pci revision id, causing needless duplication in cache entries between platforms. When a platform ships publicly with stepping-specific compiler workarounds, pci id must be incorporated into the shader cache key. Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28085>	2024-03-19 15:11:19 -07:00
Kenneth Graunke	d473004576	intel/fs: Avoid generating useless UNDEFs for every SSA def Emitting UNDEF is only necessary when the instructions we generate to produce the NIR def are considered partial writes. By adding a simple check (adapted from fs_inst::is_partial_write()), we can avoid creating loads of unnecessary UNDEFs that we have to clean up later. Our first dead code elimination pass does get rid of them pretty quickly, but this should save memory and time during our first split_virtual_grfs and dead_code_elimination passes. This generates roughly 30% fewer instructions at the beginning. Improves compilation time of shaders: - Rise of the Tomb Raider: -3.51563% +/- 0.103951% (n=7) - Borderlands 3: -3.64422% +/- 0.300951% (n=7). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28169>	2024-03-19 19:32:18 +00:00
Caio Oliveira	b58b6d2d32	anv: Enable VK_KHR_shader_quad_control Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>	2024-03-19 18:41:15 +00:00
Caio Oliveira	b22879e753	intel/brw: Use predicates for quad_vote_any and quad_vote_all when available Up until Xe2, we can use the predicates ANY4H and ALL4H to achieve the same result with fewer instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>	2024-03-19 18:41:15 +00:00
Caio Oliveira	857e62e6ac	intel/brw: Implement quad_vote_any and quad_vote_all Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>	2024-03-19 18:41:15 +00:00
Ian Romanick	671745b616	intel/fs: Don't allow 0 stride on MOV destination Outside SIMD1 instructions, a destination stride of zero doesn't make any sense. When such strides exist, they would be fixed by the FS generator. Currently the only place that intentionally generates such a stride is setup_barrier_message_payload_gfx125, and this commit changes that. The existence of a zero stride that won't really be a zero stride causes a variety of problems with other optimization passes. Those passes don't know that 0 actually means 1, and they make incorrect assumptions about sizes written, etc. The assertion helped catch many bugs in some other work in progress that tries to store convergent values in SIMD8 registers regardless of the dispatch width. That code would accidentally generate destination strides of zero. v2: Check stride differently depending on register file. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>	2024-03-19 18:17:59 +00:00
Iván Briano	446f652cde	intel/cmat: fix stride calculation in cmat load/store The stride given in the shader is in number of elements of the type pointed to by the given pointer, which may not match the matrix's own element type. Since we cast the pointer to match the element type, the stride needs to be adjusted accordingly. v2: - Fix mismatching bit-width in matrix element type and pointer type (Caio) - Do the stride calculation in one place Fixes dEQP-VK.compute.pipeline.cooperative_matrix.khr_.multicomponent. Fixes: `3a35f8b29b` ("intel/cmat: Lower cmat_load and cmat_store") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10820 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27903>	2024-03-15 20:34:43 +00:00
Rohan Garg	656f590bf5	iris,anv: WA 1509820217 is no impact for Xe2+ Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28201>	2024-03-15 12:02:45 +00:00
Caio Oliveira	bfdcddfda9	intel/tools: Make intel_stub_gpu work when using meson devenv When `meson devenv` is used, the shim library that is meant to be preloaded is not necessarily available at the installation dir. So when running in that mode both the script and the shim library will be in the same (build) directory, so adjust the ld_preload to pick that. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10798 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28134>	2024-03-15 03:25:46 +00:00
Jordan Justen	6922f421f4	intel/compiler: nib_ctrl no longer exists on Xe2+ Ref: `cfb34dc695` ("intel/eu/validate: Validate that the ExecSize is a factor of chosen ChanOff") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28191>	2024-03-15 03:01:53 +00:00
Jordan Justen	72d289b8d1	intel/compiler/fs: Restore SIMD32 restriction for ray_queries on Xe2 In `96e0d979a7`, the restriction was dropped because we don't compile a SIMD8 program on Xe2. This change moves it to run_fs() so the restriction will be added when compiling SIMD16 on Xe2. Fixes: `96e0d979a7` ("intel/fs: Check fs_visitor instance before using it") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28191>	2024-03-15 03:01:53 +00:00
Marcin Ślusarz	2ad4d5f8dd	intel/compiler/xe2: fix decoding of sampler simd mode Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28191>	2024-03-15 03:01:53 +00:00
Lionel Landwerlin	4df58ef503	intel/fs: bump max simd size of some messages for xe2 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28191>	2024-03-15 03:01:53 +00:00
Caio Oliveira	e5bc5bba7c	anv: Enable VK_KHR_shader_maximal_reconvergence Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27278>	2024-03-15 02:10:21 +00:00
Lionel Landwerlin	20df1d2b1f	anv: ignore descriptor alignment for inline uniforms For this particular case only it doesn't matter. Fixes some new CTS tests with small inline uniform sizes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28040>	2024-03-14 16:54:30 +00:00
José Roberto de Souza	27ab5fcf9f	anv: Set VM control to true in Xe KMD Xe KMD needs VMs to be created to work. Setting this on the Xe KMD code path allows us to simplify a feature check in init_queue_families(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>	2024-03-14 15:53:22 +00:00
José Roberto de Souza	c20388d617	anv: Set VK_QUEUE_PROTECTED_BIT during queue families initialization Don't make sense to only set it in VkGetPhysicalDeviceQueueFamilyProperties2(). Not setting it to the code path without pdevice->engine_info because the protected support landed on i915 after DRM_I915_QUERY_ENGINE_INFO. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>	2024-03-14 15:53:22 +00:00
José Roberto de Souza	9102cb972a	anv: Replace the 2 sparse booleans by 1 enum Having just one place to check the Sparse type is less error prone. For example in i915 it was always setting sparse_uses_trtt to true even if running in gfx 9 that don't support sparse. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>	2024-03-14 15:53:22 +00:00
Yiwei Zhang	e0da118ab1	anv/hasvk: default image_read_without_format to true The spv cap has the correct requirements to be satisfied before an app can use it, so we can drop the redundant check here to be more robust. Either of below is needed: - VkPhysicalDeviceFeatures::shaderStorageImageReadWithoutFormat - VK_VERSION_1_3 - VK_KHR_format_feature_flags2 v2: dropped unused variable Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28117>	2024-03-13 19:29:04 +00:00
Lionel Landwerlin	b7719a9ed8	intel/fs: remove some unused send helpers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28152>	2024-03-13 14:37:48 +00:00
Lionel Landwerlin	2a77a46837	anv: return unsupported for FSR images on Gfx12.0 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28131>	2024-03-13 08:52:24 +02:00
Caio Oliveira	e324fbbe68	intel/brw: Fix validation of accumulator register The `stride` and `offset` attributes are meaningful for the "virtual" register files (VGRFs, UNIFORMs and ATTRs). Accumulator is an ARF so validation should check `hstride` (part of the <V,W,H> triple) and `subnr` instead. Fixes: `12d7aaf2b8` ("intel/compiler: add more validation for acc register usage") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>	2024-03-13 03:23:30 +00:00
Caio Oliveira	db8022dc4d	intel/brw: Use helper to create accumulator register This ensures the region triple <V,W,H> is set correctly, in this case the desired region is a sequential like <8,8,1>. Without the helper the sequence we get is <0,1,0> -- which the generator currently partially adjusts when emitting code, but is not sufficient when doing validation earlier. The generated code is slightly modified. From crucible test func.shader.subtractSaturate.uint in the fragment shader for SIMD8, the diff looks like ``` mov(8) acc0<1>UD g21<8,8,1>UD { align1 1Q $0.dst }; -add.sat(8) g22<1>UD -acc0<0,1,0>UD g16<8,8,1>UD { align1 1Q @1 $0.dst }; +add.sat(8) g22<1>UD -acc0<8,8,1>UD g16<8,8,1>UD { align1 1Q @1 $0.dst }; ``` Note that without the patch, the generator adjusted the hstride for acc0 used as destination (see brw_set_dest), but kept the src region as is. For the source, it is not clear to me why the <0,1,0> would work correctly here since it is a scalar, but using <8,8,1> it is correct. Fixes: `58907568ec` ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>	2024-03-13 03:23:30 +00:00
Paulo Zanoni	18df1a81a8	anv/trtt: update GFX_TRTT_VA_RANGE for LNL This register has changed a little bit for LNL. While this fixes sparse with TR-TT, it is worth remembering that LNL is using sparse with vm_bind by default. v2: Use the proper value instead of hardcoding 0xF (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27316>	2024-03-13 01:50:35 +00:00
Jordan Justen	f0769f5d8a	intel/compiler: Adjust fs_visitor::emit_cs_terminate() for Xe2 Fixes: `97bf3d3b2d` ("intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28110>	2024-03-13 00:25:55 +00:00
José Roberto de Souza	31920cb60c	intel: Enable Xe KMD support by default Xe KMD landed on drm-next, uAPI is now stable and we can remove the build time parameter to enable support to it but platforms older than Lunar lake will have experimental support with Xe KMD. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418>	2024-03-12 22:22:50 +00:00
Kenneth Graunke	97aec40111	intel/brw: Emit better code for read_invocation(x, constant) For something as basic as read_invocation(x, 0), we were emitting: mov(8) vgrf67:D, 0d find_live_channel(8) vgrf236:UD, NoMask broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W This is way overcomplicated - if the invocation is a constant, we can simply emit a single MOV which reads the desired channel index. Not only that, but it's difficult to clean up: 1. If this expression appears multiple times, CSE will find all the redundant emit_uniformize(invocation) and get rid of the duplicate (find_live_channel+broadcast) on future instructions. 2. Copy propagation will put the 0d directly in the first broadcast. 3. Dead code elimination will get rid of the vgrf67 temp holding 0. 4. Algebraic will replace the first broadcast(x, 0) with a MOV. 5. Copy propagation will put the 0d directly in the second broadcast. 6. Dead code elimination will get rid of the vgrf237 temp. 7. Algebraic will replace the second broadcast(x, 0) with a MOV. 8. Copy propagation will finally combine the two MOVs. That's at least 7-8 optimization passes and several loops through the same passes just to clean up something we can do trivially. Cuts 25% of the optimizer steps in pipeline 22200210259a2c9c of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23). Shortens compilation time of the google-meet-clvk/Relight pipeline by -2.87717% +/- 0.509162% (n=150). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>	2024-03-12 21:58:27 +00:00
Ian Romanick	e87881f616	intel/brw: Avoid a silly add with zero in assign_curb_setup No shader-db changes. fossil-db: DG2 Totals: Instrs: 161008251 -> 161004452 (-0.00%) Cycles: 13894249509 -> 13893050101 (-0.01%); split: -0.01%, +0.00% Totals from 3804 (0.58% of 652145) affected shaders: Instrs: 2232984 -> 2229185 (-0.17%) Cycles: 7124966553 -> 7123767145 (-0.02%); split: -0.02%, +0.00% No fossil-db changes on any other platform. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	d9674cbe7d	intel/brw: Combine constants for src0 of POW instructions too I tried this when I was working on MR !7698, and it didn't have much effect back then. Maybe I've added more stuff to my fossil-db? Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW instruction was removed. shader-db: Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20301933 -> 20301900 (<.01%) instructions in affected programs: 9077 -> 9044 (-0.36%) helped: 33 / HURT: 0 total cycles in shared programs: 842797624 -> 842799471 (<.01%) cycles in affected programs: 1361911 -> 1363758 (0.14%) helped: 35 / HURT: 111 LOST: 0 GAINED: 9 fossil-db: Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 165510222 -> 165510163 (-0.00%) Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00% Spill count: 45204 -> 45196 (-0.02%) Fill count: 74157 -> 74149 (-0.01%) Totals from 65 (0.01% of 656118) affected shaders: Instrs: 57426 -> 57367 (-0.10%) Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03% Spill count: 137 -> 129 (-5.84%) Fill count: 515 -> 507 (-1.55%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	e7480f94c1	intel/brw: Combine constants for src0 of integer multiply too The majority of cases that would have been affected by this actually had both sources as integer constants. The earlier commit "intel/rt: Don't directly generate umul_32x16" allowed those to be constant folded. v2: Move the a-1 block to be near the existing a-1 block. No shader-db changes on any Intel platform. fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165510246 -> 165510222 (-0.00%) Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00% Totals from 46 (0.01% of 656118) affected shaders: Instrs: 36010 -> 35986 (-0.07%) Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	dd3bed1d92	intel/brw: Integer multiply w/ DW and W sources is not commutative The DW source must be first on all platforms since Gfx7. On previous platforms it's the other way around. Unsurprisingly, no shader-db or fossil-db changes. This change is necessary for the next commit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	93478c095e	intel/compiler: Enforce 64-bit RepCtrl restriction in eu_validate For some reason, this wasn't always caught in fs_visitor::validate. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	31f640bc5f	intel/brw: Correctly dump subnr for FIXED_GRF in INTEL_DEBUG=optimizer v2: Also update printing FIXED_GRF as destination. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Ian Romanick	f89d9cc53d	intel/brw: Silence "statement may fall through" warning src/intel/compiler/brw_lower_logical_sends.cpp: In member function ‘bool fs_visitor::lower_logical_sends()’: src/intel/compiler/brw_lower_logical_sends.cpp:3170:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 3170 | if (devinfo->has_lsc) { | ^~ src/intel/compiler/brw_lower_logical_sends.cpp:3174:7: note: here 3174 | case SHADER_OPCODE_DWORD_SCATTERED_READ_LOGICAL: | ^~~~ Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>	2024-03-12 21:31:30 +00:00
Alyssa Rosenzweig	a6123a80da	nir/opt_shrink_vectors: shrink some intrinsics from start If the backend supports it, intrinsics with a component() are straightforward to shrink from the start. Notably helps vectorized I/O. v2: add an option for this and enable only on grown up backends, because some backends ignore the component() parameter. RADV GFX11: Totals from 921 (1.16% of 79439) affected shaders: Instrs: 616558 -> 615529 (-0.17%); split: -0.30%, +0.14% CodeSize: 3099864 -> 3095632 (-0.14%); split: -0.25%, +0.11% Latency: 2177075 -> 2160966 (-0.74%); split: -0.79%, +0.05% InvThroughput: 299997 -> 298664 (-0.44%); split: -0.47%, +0.02% VClause: 16343 -> 16395 (+0.32%); split: -0.01%, +0.32% SClause: 10715 -> 10714 (-0.01%) Copies: 24736 -> 24701 (-0.14%); split: -0.37%, +0.23% PreVGPRs: 30179 -> 30173 (-0.02%) VALU: 353472 -> 353439 (-0.01%); split: -0.03%, +0.02% SALU: 40323 -> 40322 (-0.00%) VMEM: 25353 -> 25352 (-0.00%) AGX: total instructions in shared programs: 2038217 -> 2038049 (<.01%) instructions in affected programs: 10249 -> 10081 (-1.64%) total alu in shared programs: 1593094 -> 1592939 (<.01%) alu in affected programs: 7145 -> 6990 (-2.17%) total fscib in shared programs: 1589254 -> 1589102 (<.01%) fscib in affected programs: 7217 -> 7065 (-2.11%) total bytes in shared programs: 13975666 -> 13974722 (<.01%) bytes in affected programs: 65942 -> 64998 (-1.43%) total regs in shared programs: 592758 -> 591187 (-0.27%) regs in affected programs: 6936 -> 5365 (-22.65%) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28004>	2024-03-12 18:17:17 +00:00
José Roberto de Souza	d1916432ab	intel/dev: Nuke display_ver It is not used. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28128>	2024-03-12 17:44:46 +00:00
José Roberto de Souza	b09ffe48f2	intel/dev: Nuke 'ver == 10' check There is no intel_device_info with ver 10 anymore. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28128>	2024-03-12 17:44:46 +00:00
Tapani Pälli	275bcbd7a7	anv: setup distribution granularity with Wa_14019166699 Workaround describes that we need to set instance level distribution granularity when primitive id is used by the draw. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27955>	2024-03-12 09:25:32 +00:00
Lionel Landwerlin	75c6ad9907	intel/fs: fixup sampler header message If you look at the sampler message header on Gfx9+, you'll see that we mostly only use 2 dwords (dw2 & dw3). DW2 has a bunch of sampler parameters, DW3 is the sampler handle. On Gfx9 we can micro optimize by copying r0 into the header because the HW mostly doesn't care about other DWs. We just have to clear dw2 on non VS/FS stages. On Gfx11+, we always have to do a careful copy of the r0.3 bits to mask out the bottom unrelated bits. So there, just clearing the entire header makes more sense. On Xe2+, the dw4 of the header references the sampler feedback surface handle and bit0 is a boolean to know whether to use that surface or not. So it REALLY matters to have that as 0. If we copy r0, we'll get random bits in dw4, leading to enable that surface. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28082>	2024-03-12 07:25:45 +00:00
Hyunjun Ko	db8eaa3620	anv/video: fix scan order for scaling lists on H265 decoding. The default scan order of scaling lists is up-right-diagonal according to the spec. But the device requires raster order, so we need to convert from the passed scaling lists. Fixes: `8d519eb` ("anv: add initial video decode support for h265") Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28063>	2024-03-12 03:33:49 +00:00
José Roberto de Souza	9227d63c19	anv: Fix Xe KMD userptr unbind Userptr don't have a valid gem fd so it can't use DRM_XE_VM_BIND_OP_UNMAP_ALL. Current code was unbinding workaround_bo or returning error when workaround_bo size was smaller than userptr address. So here doing a regular DRM_XE_VM_BIND_OP_UNMAP, without setting xe_bind->obj and setting xe_bind->range and xe_bind->addr. Fixes: `19439624` ("anv: Use DRM_XE_VM_BIND_OP_UNMAP_ALL to unbind whole bos") Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28114>	2024-03-11 22:00:51 +00:00
Paulo Zanoni	4c92084ed9	anv/trtt: invalidate the TLB after writing TR-TT entries We're changing the memory address translation tables, we should invalidate their cache. It seems i915.ko is already doing this for us in between batches. The xe.ko driver only adds invalidates to the ring before submissions if scratch page is enabled in the VM (which it is today, but may change in the future), and after some vm_bind and all vm_unbind ioctls, but we don't use vm_bind for TR-TT. Still, it won't hurt to have it here right now. v2: Use PIPE_CONTROL_length (José). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1) Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27928>	2024-03-11 19:17:20 +00:00
Paulo Zanoni	3e5dfd668d	anv: add an anv_pipe_bits bit to allow invalidating the TLB Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27928>	2024-03-11 19:17:20 +00:00
José Roberto de Souza	52ced4008c	intel: Drop pre-production steppings Workaround tool was already updated with MTL production stepping so no need to return any stepping value for MTL. For TGL it was also updated a long time ago, so no need to check for revision 0. Reviewed-by: Mark Janes <markjanes@swizzler.org> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27399>	2024-03-11 18:52:44 +00:00
Tapani Pälli	e592ab466f	anv: use workaround framework for Wa_16013000631 This should drop it from MTL as there it should apply only for a0 stepping. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28047>	2024-03-11 08:18:26 +00:00
Caio Oliveira	e1afffe7fa	intel/brw: Use hstride instead of stride for accumulator The `stride` field is not meant to be used by ARFs (like the accumulator), and is always 1. Use the `hstride` instead. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28064>	2024-03-09 18:26:24 +00:00
Paulo Zanoni	a8f7d26c2b	anv: change the vm_bind-related kmd_backend vfuncs to return VkResult All these vfuncs funnel down to either stubs or the xe_vm_bind_op() function. By returning int we're shifting VkResult generation to the callers, which are simply not doing the correct job. If they get VkResult they can simply throw the errors up the stack without having to erroneously try to figure out what really happened. Today the callers are returning either VK_ERROR_UNKNOWN or VK_ERROR_OUT_OF_DEVICE_MEMORY, but after the patch we're returning either VK_ERROR_OUT_OF_HOST_MEMORY or VK_ERROR_DEVICE_LOST. Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>	2024-03-08 23:14:09 +00:00
Paulo Zanoni	4863e12679	anv/sparse: don't use the bind_timeline when doing sparse binding The bind_timeline is used to guarantee that non-sparse objects will be bound when batches use them (although any batch will wait on the most recent bind, even if that's not necessary). For sparse binding resources, it's up to the user to guarantee synchronization: do not force every single batch buffer to wait on the latest sparse binding operation, as that adds unnecessary synchronization points. v2: Document how each of the vfuncs interacts with bind_timeline (José). Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>	2024-03-08 23:14:09 +00:00
Paulo Zanoni	8051919b3c	anv/sparse: leave the semaphore waits and signals to the vm_bind ioctl We can now finally leave the semaphore waits and signals to the vm_bind ioctl, making vm_bind operations truly asynchronous. This was previously done for TR-TT in `18bd00c024` ("anv/trtt: don't wait/signal syncobjs using the CPU anymore"). Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>	2024-03-08 23:14:09 +00:00
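
The H.265 scaling-list fix (`db8eaa3620`) converts lists passed in up-right diagonal order into the raster order the device expects. A minimal C sketch of that conversion follows. It assumes raster order means row-major (row `y`, column `x` at index `y * n + x`) and builds the scan with the up-right diagonal construction from ITU-T H.265 clause 6.5.3. Both function names are hypothetical and are not taken from the anv source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: record the (x, y) position of every scan index i
 * for the H.265 up-right diagonal scan of an n x n block. */
static void
diag_scan_positions(unsigned n, uint8_t scan_x[], uint8_t scan_y[])
{
   unsigned i = 0;
   int x = 0, y = 0;

   while (i < n * n) {
      /* Walk one anti-diagonal from bottom-left to top-right,
       * keeping only positions that fall inside the block. */
      while (y >= 0) {
         if (x < (int)n && y < (int)n) {
            scan_x[i] = (uint8_t)x;
            scan_y[i] = (uint8_t)y;
            i++;
         }
         y--;
         x++;
      }
      /* The next anti-diagonal starts in column 0, one row lower. */
      y = x;
      x = 0;
   }
}

/* Hypothetical helper: reorder an n x n scaling list from up-right
 * diagonal order (diag) into row-major raster order (raster). */
static void
scaling_list_diag_to_raster(unsigned n, const uint8_t *diag, uint8_t *raster)
{
   uint8_t scan_x[64], scan_y[64];

   assert(n == 4 || n == 8);
   diag_scan_positions(n, scan_x, scan_y);

   for (unsigned i = 0; i < n * n; i++)
      raster[scan_y[i] * n + scan_x[i]] = diag[i];
}
```

For 16x16 and 32x32 transforms H.265 signals an 8x8 list plus a separate DC coefficient, so only the 4x4 and 8x8 reorderings are needed; whether the hardware wants row-major or column-major raster order is an assumption here and would have to be checked against the actual anv code.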

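Commit `ea423aba1b` defers splitting 64-bit MOVs into two 32-bit MOVs until late in the optimization pipeline. The sketch below shows what such a split does on a deliberately simplified, hypothetical scalar IR: `struct toy_mov` is not brw's `fs_inst`, and the per-channel strides that real SIMD registers need are omitted. One 8-byte copy becomes a low-half and a high-half 4-byte copy at adjacent byte offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR, not brw's fs_inst: a register is a number plus a
 * byte offset, and a MOV copies `bytes` bytes from src to dst. */
struct toy_reg { unsigned nr; unsigned offset; };
struct toy_mov { struct toy_reg dst, src; unsigned bytes; };

/* Split one 64-bit MOV into two 32-bit MOVs (low half first).
 * 32-bit MOVs are passed through unchanged.
 * Returns the number of MOVs written to out[]. */
static size_t
lower_64bit_mov(const struct toy_mov *mov, struct toy_mov out[2])
{
   assert(mov->bytes == 4 || mov->bytes == 8);

   if (mov->bytes == 4) {
      out[0] = *mov;
      return 1;
   }

   for (unsigned half = 0; half < 2; half++) {
      out[half] = (struct toy_mov) {
         .dst = { mov->dst.nr, mov->dst.offset + 4 * half },
         .src = { mov->src.nr, mov->src.offset + 4 * half },
         .bytes = 4,
      };
   }
   return 2;
}
```

Running a split like this once, late, means earlier passes such as copy propagation still see a single 64-bit MOV of the whole address, which is the motivation the commit message gives.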

15202 commits