We usually don't need to set up any state for the video/copy engines,
but on platforms that support the aux map, we need to initialize the
aux map by programming the equivalent registers.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26409>
We unfortunately learned on a recent platform that the bindless
sampler heap is not functioning as expected.
Nowhere in the documentation is the size of the heap written down. So
most people assumed it is the maximum size that we can program (4GB).
The reality is that it's only 64MB.
Though it appears to work properly for the whole 4GB range for most
apps, this is only because the HW bounds checking applied is broken.
Instead of clamping anything beyond 64MB, it's only clamping the last
4KB of each 64MB region.
So this heap is useless for us to make a 4GB region of both sampler &
surface states...
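The difference between the expected and the observed bounds check can be sketched as follows. This is only an illustration of the description above: the function names are hypothetical, and the modulo form of the observed check is one reading of "the last 4KB of each 64MB region", not something taken from hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KB(x) ((uint64_t)(x) << 10)
#define MB(x) ((uint64_t)(x) << 20)
#define GB(x) ((uint64_t)(x) << 30)

/* What one would expect from a 64MB heap: every offset at or beyond
 * 64MB is out of bounds and gets clamped. */
static bool expected_clamped(uint64_t offset)
{
   assert(offset < GB(4)); /* programmable range */
   return offset >= MB(64);
}

/* Assumed model of the observed behavior: only the last 4KB of each
 * 64MB region is clamped, everything else passes through. */
static bool observed_clamped(uint64_t offset)
{
   assert(offset < GB(4)); /* programmable range */
   return (offset % MB(64)) >= MB(64) - KB(4);
}
```

Under this model an offset of 100MB passes the observed check even though it lies outside the real 64MB heap, which would explain why most apps appear to work.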
This change essentially turns off the bindless sampler heap on DG2+.
The only location where we can put SAMPLER_STATE elements is the
dynamic state heap. Unfortunately we cannot align the dynamic state
heap with the bindless surface state heap. So the solution is to
allocate sampler & surface states separately, each from its own heap
in the descriptor pool.
We now have to provide 2 sets of offsets for surfaces & samplers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25897>
This reworks the intel_compute_pixel_hash_table_nway() pixel pipe
hashing table computation helper to handle cases where some pixel
pipes have processing power different from the others. This is helpful
for Gfx12.7+ platforms where there are pixel pipes with 1 DSS as well
as pixel pipes with 2 DSSes, which currently can lead to a serious
performance bottleneck in the pixel pipes with lower processing power.
In order to avoid such a load imbalance the
intel_compute_pixel_hash_table_nway() function will now take two pixel
pipe bitsets instead of one: Pixel pipes enabled on both bitsets will
appear with twice the frequency on the table as pixel pipes which only
appear on one bitset. See the comments below for more details on the
algorithm used to construct a pixel hashing table with the desired
properties.
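The frequency property described above can be illustrated with a much simplified sketch. This is not the algorithm used by intel_compute_pixel_hash_table_nway(), which builds a two-dimensional table with additional permutations. The helper name and the one-dimensional "period" layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write one period of pipe indices into `table`
 * such that pipes set in both masks appear twice and pipes set in only
 * one mask appear once. Returns the number of entries written. */
static unsigned build_weighted_period(uint32_t mask1, uint32_t mask2,
                                      uint8_t *table, unsigned max_entries)
{
   /* Identical masks mean every pipe has the same weight, so the
    * second mask carries no information and can be dropped. */
   if (mask1 == mask2)
      mask2 = 0;

   unsigned n = 0;
   for (unsigned pass = 0; pass < 2; pass++) {
      for (unsigned pipe = 0; pipe < 32; pipe++) {
         const uint32_t bit = UINT32_C(1) << pipe;
         const int in_any = ((mask1 | mask2) & bit) != 0;
         const int in_both = (mask1 & bit) != 0 && (mask2 & bit) != 0;

         /* Pass 0 emits every enabled pipe once, pass 1 emits the
          * pipes present in both masks a second time. */
         if (pass == 0 ? in_any : in_both) {
            assert(n < max_entries);
            table[n++] = (uint8_t)pipe;
         }
      }
   }
   return n;
}
```

With mask1 = 0x7 and mask2 = 0x3 the period is 0, 1, 2, 0, 1, so pipes 0 and 1 receive twice the share of pipe 2. With equal masks every pipe appears exactly once.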
With this change rendering performance improves by about 25% on a
fused MTL platform. The list of specific configs this is expected to
show an improvement on is not included here, since the list is rather
long and some of the configs may still be embargoed or may never be
productized. To find out whether your Gfx12.7+ device could be
affected by this, check the output of the intel_dev_info tool from the
Mesa tree and see if there are multiple "pixel pipe" entries with
different DSS counts. That isn't expected to occur on any DG2
configuration, only on MTL+ platforms, so this change should have no
effect at all on DG2: for DG2 mask1 should equal mask2, so mask2 will
be set to zero at the beginning of
intel_compute_pixel_hash_table_nway() and the new swzx[] permutation
will be set to the identity.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26266>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
keep them in device memory forever. We maintain a mirror of both the
L3 and L2 tables so that we never have to read the tables that are in
device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko; you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
There are rendering issues with FCV on DG2 and Unreal Engine 5.1.
This patch adds a drirc option to disable FCV.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26169>
Some of the names are a bit confusing. The main change is to introduce
the "indirect_" prefix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
This may only affect some MTL-specific code paths, but 7cdacaf493 is
causing anvil to run out of space when initializing the render batch.
Fixes: 7cdacaf493 ("intel/xehp: Adjust TBIMR performance chicken bits.")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>
This programs a TBIMR batch size equal to 128 polygons per slice in
order to match the hardware spec recommendation (BSpec 68436). This
has been confirmed to improve performance slightly relative to the
hardware default batch size of 256 polygons.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This enables a couple of TBIMR performance tunables in
CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to
help slightly with some geometry-bound workloads. TBIMR open batch
allows the rasterizer to start working immediately on the first tile
of the framebuffer, even before the batch has been closed, which helps
reduce the latency cost of the tile walk.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow. Instead, the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.
That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition. This is about to change: the L3 configuration
structures will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers. Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.
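The new condition can be sketched roughly as below. The partition enum, the struct layout and the helper name are hypothetical stand-ins and not the actual Mesa definitions. The 128-way threshold is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partition list and config layout, for illustration. */
enum l3_partition { L3P_SLM, L3P_URB, L3P_ALL, L3P_DC, L3P_RO, L3P_COUNT };

struct l3_config {
   unsigned n[L3P_COUNT]; /* ways per bank assigned to each partition */
};

/* Smallest ALL partition size that L3ALLOC can no longer represent. */
#define L3ALLOC_ALL_OVERFLOW_WAYS 128u

/* True when the config consists of a single ALL partition that is too
 * large for L3ALLOC, i.e. when a unified L3 should be programmed. */
static bool use_full_way_allocation(const struct l3_config *cfg)
{
   assert(cfg != NULL); /* no longer relies on a NULL config */

   for (unsigned p = 0; p < L3P_COUNT; p++) {
      if (p != L3P_ALL && cfg->n[p] != 0)
         return false;
   }
   return cfg->n[L3P_ALL] >= L3ALLOC_ALL_OVERFLOW_WAYS;
}
```

The point of the sketch is that the decision depends only on the contents of the configuration, so it keeps working once the config tables are populated for newer platforms.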
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This way we can implement workarounds depending on the pipeline.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25671>
We sometimes fail initialization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 09d12e6727 ("anv: Add support for I915_ENGINE_CLASS_COMPUTE in init_device_state()")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25891>
Enabling FCV on MTL breaks a number of games and benchmarks. Let's
disable it for now till we can root cause the issue.
Closes: #9987
Fixes: 26c2c9 ('anv: enable FCV for Gen12.5')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25863>
Heuristic-based optimization throttling CCS work (async compute).
Without throttling, background compute work consumes all threads,
diminishing the performance gains from running dispatches in parallel
with 3D work.
Optimization is heuristics based, meaning a workload might slow
down when using async compute.
Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
Now that we have proper handling of FCV_CCS_E everywhere, we can turn
this on for Gen12.5.
This helps fix a performance regression where enabling fast
clears to non-zero values with CCS_E caused additional partial resolves,
regressing performance on certain games. Performance is helped on the
following games:
- F1'22: +45%
- RDR2: +6%
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Alchemist has an improved blitter that's sufficiently powerful to
implement a transfer queue. Tigerlake's blitter lacks compression
handling and other features we need, unfortunately.
Rework (Sagar):
- Check blitter command buffer in EndCommandBuffer
v2: (Lionel)
- Look at image, buffer and memory barriers as well
- Flush cache if there is queue ownership transfer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18325>
While earlier changes to pipe control emission allowed a debug dump of
each pipe control, they also changed the debug output to almost always
print the same reason/function for each pc. These changes fix the output
so that we print the original function name where the pc is emitted.
As an example:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_batch_emit_pipe_control_write
pc: emit PC=( ) reason: gfx11_batch_emit_pipe_control_write
changes back to:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_emit_apply_pipe_flushes
pc: emit PC=( ) reason: cmd_buffer_emit_depth_stencil
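One common way to get the caller's name into such a debug line is to wrap the helper in a macro that passes __func__ from the call site. The sketch below illustrates that general C technique only. The function and macro names are hypothetical and the output format is simplified compared to the lines above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Last reason recorded, kept only so the behavior can be inspected. */
static char last_reason[64];

/* Hypothetical helper that does the emission and debug printing. */
static void emit_pipe_control_impl(unsigned bits, const char *reason)
{
   assert(reason != NULL);
   snprintf(last_reason, sizeof(last_reason), "%s", reason);
   fprintf(stderr, "pc: emit PC=( 0x%x ) reason: %s\n", bits, reason);
}

/* The macro expands at the call site, so __func__ names the function
 * that requested the pipe control rather than the helper itself. */
#define emit_pipe_control(bits) emit_pipe_control_impl((bits), __func__)

static void cmd_buffer_emit_depth_stencil(void)
{
   emit_pipe_control(0);
}
```

Calling cmd_buffer_emit_depth_stencil() records "cmd_buffer_emit_depth_stencil" as the reason instead of the name of the shared helper.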
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25282>
Don't rely on the HW to set values correctly; just emit
STATE_COMPUTE_MODE with default values set to zero.
This change also includes the following workaround changes:
- 14015808183 (Parent HSD 14015782607) - Need to emit pipe control
with HDC flush and untyped cache flush set to 1 when CCS has
non-pipelined state update with STATE_COMPUTE_MODE.
- 14014427904 (Parent HSD 22013045878) - We need additional
invalidate/flush when emitting non-pipelined state commands with
multiple CCS enabled.
v2: (Tapani)
- Use lineage HSD numbers for check
- Don't use poisoned WA directly
- Use intel_needs_workaround helper
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24508>
For instance, gfx8_cmd_buffer.c does not apply to gfx8 anymore; it can
also be included in all builds.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
2023-09-06 20:07:01 +00:00
Renamed from src/intel/vulkan/genX_state.c