It may be some MTL specific code paths, but 7cdacaf493 is triggering
anvil to run out of space when initializing the render batch.
Fixes: 7cdacaf493 ("intel/xehp: Adjust TBIMR performance chicken bits.")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>
This programs a TBIMR batch size equal to 128 polygons per slice in
order to match the hardware spec recommendation (BSpec 68436). This
has been confirmed to improve performance slightly relative to the
hardware default batch size of 256 polygons.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This enables a couple of TBIMR performance tunables in
CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to
help slightly with some geometry-bound workloads. TBIMR open batch
allows the rasterizer to start working immediately on the first tile
of the framebuffer, even before the batch has been closed, which helps
reduce the latency cost of the tile walk.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow, instead the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.
That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition. This is about change, the L3 configuration
structure will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers. Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This way we can implemented workarounds depending on the pipeline.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25671>
We sometimes fail initialization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 09d12e6727 ("anv: Add support for I915_ENGINE_CLASS_COMPUTE in init_device_state()")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25891>
Enabling FCV on MTL breaks a number of games and benchmarks. Let's
disable it for now till we can root cause the issue.
Closes: #9987
Fixes: 26c2c9 ('anv: enable FCV for Gen12.5')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25863>
Heuristic-based optimization throttling CCS work (async compute).
Without throttling, background compute work consumes all threads,
deminishing performance gains by running dispatch in parallel with
3D work.
Optimization is heuristics based, meaning a workload might slow
down when using async compute.
Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
Now that we have proper handling of FCV_CCS_E everywhere, we can turn
this on for Gen12.5.
This helps fix a performance regression where enabling fast
clears to non-zero values with CCS_E caused additional partial resolves,
regressing performance on certain games. Performance is helped on the
following games:
- F1'22: +45%
- RDR2: +6%
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Alchemist has an improved blitter that's sufficiently powerful to
implement a transfer queue. Tigerlake's blitter lacks compression
handling and other features we need, unfortunately.
Rework (Sagar):
- Check blitter command buffer in EndCommandBuffer
v2: (Lionel)
- Look at image, buffer and memory barriers as well
- Flush cache if there is queue ownership transfer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18325>
While earlier changes to pipe control emission allowed debug dump of
each pipe control, they also changed debug output to almost always print
same reason/function for each pc. These changes fix the output so that
we print the original function name where pc is emitted.
As example:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_batch_emit_pipe_control_write
pc: emit PC=( ) reason: gfx11_batch_emit_pipe_control_write
changes back to:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_emit_apply_pipe_flushes
pc: emit PC=( ) reason: cmd_buffer_emit_depth_stencil
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25282>
Don't rely on the HW to set values correctly so just emit
STATE_COMPUTE_MODE with default values set to zero.
Also, this change includes workaround changes:-
- 14015808183 (Parent HSD 14015782607) - Need to emit pipe control
with HDC flush and untyped cache flush set to 1 when CCS has
non-pipelined state update with STATE_COMPUTE_MODE.
- 14014427904 (Parent HSD 22013045878) - We need additional
invalidate/flush when emitting non-pipelined state commands with
multiple CCS enabled.
v2: (Tapani)
- Use lineage HSD numbers for check
- Don't use poisoned WA directly
- Use intel_needs_workaround helper
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24508>
gfx8_cmd_buffer.c does not apply to gfx8 anymore for instance, it can
also be included in all builds.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
2023-09-06 20:07:01 +00:00
Renamed from src/intel/vulkan/genX_state.c (Browse further)