fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 12:08:24 +02:00

Author	SHA1	Message	Date
Sagar Ghuge	108f880986	anv: Handle video/copy engine queue initialization We usually don't need to set up any state for video/copy, but on platforms that support the aux map we need to init the aux map by programming the equivalent registers. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26409>	2023-12-14 00:53:15 +00:00
Lionel Landwerlin	7c76125db2	anv: use 2 different buffers for surfaces/samplers in descriptor sets We had the unfortunate finding on a recent platform to learn that the bindless sampler heap is not functioning as expected. Nowhere in the documentation is the size of the heap written down. So most people assumed that's the max number that we can program (4Gb). The reality is that it's only 64Mb. Though it appears to work properly across the whole 4Gb range for most apps, this is only because the HW bounds checking applied is broken. Instead of clamping anything beyond 64Mb, it's only clamping the last 4Kb of each 64Mb region. So this heap is useless for us to make a 4Gb region of both sampler & surface states... This change essentially turns off the bindless sampler heap on DG2+. The only location where we can put SAMPLER_STATE elements is the dynamic state heap. Unfortunately we cannot align the dynamic state heap with the bindless surface state heap. So the solution is to allocate sampler & surface states separately, each from its own heap in the descriptor pool. We now have to provide 2 sets of offsets for surfaces & samplers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25897>	2023-12-04 23:06:05 +00:00
Lionel Landwerlin	18a1234541	anv: add a sampler state pool Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25897>	2023-12-04 23:06:05 +00:00
Francisco Jerez	6a810b0ba8	intel: Improve N-way pixel hashing computation to handle pixel pipes with asymmetric processing power. This reworks the intel_compute_pixel_hash_table_nway() pixel pipe hashing table computation helper to handle cases where some pixel pipes have processing power different from the others, this is helpful for Gfx12.7+ platforms where there are pixel pipes with 1 DSS as well as pixel pipes with 2 DSSes, which currently can lead to a serious performance bottleneck in the pixel pipes with lower processing power. In order to avoid such a load imbalance the intel_compute_pixel_hash_table_nway() function will now take two pixel pipe bitsets instead of one: Pixel pipes enabled on both bitsets will appear with twice the frequency on the table as pixel pipes which only appear on one bitset. See the comments below for more details on the algorithm used to construct a pixel hashing table with the desired properties. With this change rendering performance improves by about 25% on a fused MTL platform -- The list of specific configs this is expected to show an improvement on is not included here since the list is rather long and some of the configs may still be embargoed or may never be productized, but in order to find out whether your Gfx12.7+ device could be affected by this you can check the output of the intel_dev_info tool from the Mesa tree and see if there are multiple "pixel pipe" entries with different DSS count. That isn't expected to occur on any DG2 configuration, only on MTL+ platforms, so this change should have no effect at all on DG2 (it's easy to convince oneself that it won't since for DG2 mask1 should equal mask2 so mask2 will be set to zero at the beginning of intel_compute_pixel_hash_table_nway() and the new swzx[] permutation will be set to the identity). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26266>	2023-11-20 23:48:34 +00:00
Paulo Zanoni	04bfe828db	anv/sparse: allow sparse resources to use TR-TT as its backend TR-TT is a hardware feature supported by both i915.ko and xe.ko, which means we can now finally have Sparse Resources on i915.ko and we also have 2 options for xe.ko (and whatever is the best should be the default). In this patch we use batch commands to write the page tables and forever keep them in device memory. We maintain a mirror of both the L3 and L2 tables because that means we never have to read the tables that are in device memory. We still have some things to improve, but with this commit, workloads that didn't work at all due to the lack of sparse resources should at least run. This is still all disabled by default in i915.ko, you can turn it on by exporting ANV_SPARSE=1 before launching the applications. For xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>	2023-11-17 17:58:28 +00:00
Tapani Pälli	01046cd6ad	anv/drirc: add option to disable FCV optimization There are rendering issues with FCV on DG2 and Unreal engine 5.1, patch adds option to disable fcv in drirc. Cc: mesa-stable Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26169>	2023-11-15 18:16:56 +00:00
Sagar Ghuge	ee48b12a8f	anv: Avoid emitting PIPE_CONTROL command for copy/video queue Avoid emitting PIPE_CONTROL instruction since Copy/Video doesn't support it. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26121>	2023-11-13 23:43:27 +00:00
Lionel Landwerlin	ed83d1415c	anv: rename internal heaps Some of the names are a bit confusing. The main change is to introduce the "indirect_" prefix. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>	2023-10-30 14:47:18 +00:00
Jordan Justen	9bd47aabaf	anv: Add more space for init_render_queue_state() batch (MTL regression) It may be some MTL specific code paths, but `7cdacaf493` is triggering anvil to run out of space when initializing the render batch. Fixes: `7cdacaf493` ("intel/xehp: Adjust TBIMR performance chicken bits.") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>	2023-10-30 10:05:10 +00:00
Francisco Jerez	f0d24b155b	intel/xehp+: Adjust TBIMR batch size based on slice count. This programs a TBIMR batch size equal to 128 polygons per slice in order to match the hardware spec recommendation (BSpec 68436). This has been confirmed to improve performance slightly relative to the hardware default batch size of 256 polygons. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>	2023-10-27 14:50:42 -07:00
Francisco Jerez	7cdacaf493	intel/xehp: Adjust TBIMR performance chicken bits. This enables a couple of TBIMR performance tunables in CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to help slightly with some geometry-bound workloads. TBIMR open batch allows the rasterizer to start working immediately on the first tile of the framebuffer, even before the batch has been closed, which helps reduce the latency cost of the tile walk. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>	2023-10-27 14:50:42 -07:00
Francisco Jerez	6b9583734b	intel/l3: Set up L3FullWayAllocationEnable config if ALL partition has over 126 ways. L3 configurations with an ALL partition of 128 ways per bank or more cannot be represented with the normal L3ALLOC partitioning mechanism since the "All L3 client pool" field would overflow, instead the L3FullWayAllocationEnable bit has to be set, which causes the whole L3 to be used in a unified cache configuration. That's precisely the configuration we're currently using on recent platforms, but previously we were relying on the L3 config tables being empty and the selected L3 configuration being a NULL pointer to detect this condition. This is about to change, the L3 configuration structure will be defined for gfx12.5+ platforms since they provide useful information about the cache hierarchy to the drivers. Instead of checking whether the pointer is NULL in order to apply a unified L3 cache configuration, use it when there is a single ALL partition larger than can be represented via L3ALLOC. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>	2023-10-27 14:48:28 -07:00
Tapani Pälli	2254eaa3ae	anv: add current_pipeline for batch_emit_pipe_control This way we can implement workarounds depending on the pipeline. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25671>	2023-10-26 11:51:47 +00:00
Lionel Landwerlin	a97065adab	anv: fix uninitialized use of compute initialization batch We sometimes fail initialization. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `09d12e6727` ("anv: Add support for I915_ENGINE_CLASS_COMPUTE in init_device_state()") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25891>	2023-10-25 19:27:23 +00:00
Rohan Garg	3bf1b7deba	anv: selectively enable FCV optimization for DG2 Enabling FCV on MTL breaks a number of games and benchmarks. Let's disable it for now till we can root cause the issue. Closes: #9987 Fixes: 26c2c9 ('anv: enable FCV for Gen12.5') Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25863>	2023-10-24 19:27:14 +00:00
Rohan Garg	f85d8d908c	anv: cleanup includes Signed-off-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25766>	2023-10-24 10:33:57 +00:00
Felix DeGrood	b561bcd78c	anv: set ComputeMode.PixelAsyncComputeThreadLimit = 4 Heuristic-based optimization throttling CCS work (async compute). Without throttling, background compute work consumes all threads, diminishing performance gains by running dispatch in parallel with 3D work. Optimization is heuristics based, meaning a workload might slow down when using async compute. Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this equates to a max CCS thread occupancy of 37.5%. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>	2023-10-17 18:09:29 +00:00
Rohan Garg	26c2c96d62	anv: enable FCV for Gen12.5 Now that we have proper handling of FCV_CCS_E everywhere, we can turn this on for Gen12.5. This helps fix a performance regression where enabling fast clears to non-zero values with CCS_E caused additional partial resolves, regressing performance on certain games. Performance is helped on the following games: - F1'22: +45% - RDR2: +6% Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>	2023-10-11 12:18:15 +00:00
Kenneth Graunke	17b8b2cffd	anv: Add support for a transfer queue on Alchemist Alchemist has an improved blitter that's sufficiently powerful to implement a transfer queue. Tigerlake's blitter lacks compression handling and other features we need, unfortunately. Rework (Sagar): - Check blitter command buffer in EndCommandBuffer v2: (Lionel) - Look at image, buffer and memory barriers as well - Flush cache if there is queue ownership transfer Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18325>	2023-10-03 18:02:52 +00:00
Jordan Justen	65684b0c7f	anv: Disable Ray Tracing on xe2 until our compiler supports Xe2 RT Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25411>	2023-09-27 21:11:18 +00:00
Tapani Pälli	8d2dcd55d7	anv: refactor to fix pipe control debugging While earlier changes to pipe control emission allowed debug dump of each pipe control, they also changed debug output to almost always print same reason/function for each pc. These changes fix the output so that we print the original function name where pc is emitted. As example: pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_batch_emit_pipe_control_write pc: emit PC=( ) reason: gfx11_batch_emit_pipe_control_write changes back to: pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_emit_apply_pipe_flushes pc: emit PC=( ) reason: cmd_buffer_emit_depth_stencil Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25282>	2023-09-20 06:04:37 +00:00
Sagar Ghuge	6a89507be8	anv: Program and emit STATE_COMPUTE_MODE Don't rely on the HW to set values correctly so just emit STATE_COMPUTE_MODE with default values set to zero. Also, this change includes workaround changes:- - 14015808183 (Parent HSD 14015782607) - Need to emit pipe control with HDC flush and untyped cache flush set to 1 when CCS has non-pipelined state update with STATE_COMPUTE_MODE. - 14014427904 (Parent HSD 22013045878) - We need additional invalidate/flush when emitting non-pipelined state commands with multiple CCS enabled. v2: (Tapani) - Use lineage HSD numbers for check - Don't use poisoned WA directly - Use intel_needs_workaround helper Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24508>	2023-09-08 23:08:26 +00:00
Sagar Ghuge	a63277ec36	anv: Execute RCS init batch on companion RCS context/engine Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>	2023-09-07 06:39:06 +00:00
Sagar Ghuge	103512ef3b	anv: Move compute specific bits under compute queue init Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>	2023-09-07 06:39:06 +00:00
Lionel Landwerlin	b1614c4e22	anv: rename files to represent their usage gfx8_cmd_buffer.c does not apply to gfx8 anymore for instance, it can also be included in all builds. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>	2023-09-06 20:07:01 +00:00
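
The unified-L3 decision described in commit 6b9583734b can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the Mesa implementation: the `l3_config_sketch` struct, the `l3_partition_sketch` enum and `needs_full_way_allocation()` are hypothetical names, and the 126-way limit is taken from the commit title.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical partition indices, for illustration only. */
enum l3_partition_sketch {
   L3P_SLM,
   L3P_URB,
   L3P_ALL,
   L3P_DC,
   L3P_RO,
   L3P_COUNT
};

/* Hypothetical config: number of ways assigned to each partition. */
struct l3_config_sketch {
   unsigned ways[L3P_COUNT];
};

/* Assumed limit, from the commit title ("over 126 ways"). */
#define L3ALLOC_MAX_ALL_WAYS 126u

/* Returns true when the config is a single ALL partition too large to be
 * expressed through L3ALLOC, i.e. when the unified (full-way) mode would
 * be selected instead of a partitioned setup. */
static bool
needs_full_way_allocation(const struct l3_config_sketch *cfg)
{
   assert(cfg);

   /* Any other non-empty partition means this is not a unified config. */
   for (unsigned p = 0; p < L3P_COUNT; p++) {
      if (p != L3P_ALL && cfg->ways[p] != 0)
         return false;
   }

   return cfg->ways[L3P_ALL] > L3ALLOC_MAX_ALL_WAYS;
}
```

The point of the check is that the decision depends on the contents of the configuration rather than on whether a configuration pointer happens to be NULL.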

25 commits
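
The two-bitset weighting described in commit 6a810b0ba8 can be illustrated with a deliberately simplified one-dimensional sketch. This is not the algorithm used by intel_compute_pixel_hash_table_nway(), which builds a full hashing table; `weighted_pipe_period()` is a hypothetical helper that only shows how a pipe set in both masks ends up with twice the frequency of a pipe set in just one.

```c
#include <assert.h>
#include <stdint.h>

/* Fill `seq` with one period of a weighted pipe sequence and return its
 * length. Pipes present in either mask are emitted once; pipes present in
 * both masks are emitted a second time, so they occur twice per period.
 * `cap` is the capacity of `seq` (at most 64 entries are ever written). */
static unsigned
weighted_pipe_period(uint32_t mask1, uint32_t mask2,
                     unsigned *seq, unsigned cap)
{
   unsigned n = 0;

   for (unsigned pass = 0; pass < 2; pass++) {
      /* Pass 0: every enabled pipe. Pass 1: only pipes in both masks. */
      const uint32_t m = pass == 0 ? (mask1 | mask2) : (mask1 & mask2);

      for (unsigned pipe = 0; pipe < 32; pipe++) {
         if (m & (UINT32_C(1) << pipe)) {
            assert(n < cap);
            seq[n++] = pipe;
         }
      }
   }

   return n;
}
```

With mask1 = 0x7 and mask2 = 0x1, one period contains pipe 0 twice and pipes 1 and 2 once each. When both masks are identical every pipe is doubled, so the relative frequencies stay uniform, which matches the commit's remark that symmetric configurations are unaffected.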