Commit graph

10519 commits

Author SHA1 Message Date
Alyssa Rosenzweig
cc3f20ca6c nir: Also gather decomposed primitive count
Simple extension.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
2023-11-07 00:05:54 +00:00
Connor Abbott
55f3f952aa vk/graphics_state, tu: Rewrite renderpass flags handling
Before this, the render pass code or the driver combined the pipeline
create flags and the implicit flags from the render pass, but the
pipeline create flags will need to be sanitized when they are dynamic
state, so we need to do it in vk_graphics_state where we know that
information.

We also weren't combining pipeline flags correctly when linking, which
on turnip was being hidden by the lack of sanitizing for driver-provided
flags. We can't combine them correctly if they're part of the render
pass state, so they need to be pulled out into the overall pipeline
state.

For drivers using emulated renderpasses or tracking feedback loop
information themselves, this won't make a difference, but we have to
adapt turnip to not pass pipeline flags. This also means that we can
drop all handling of feedback_loop_input_only in turnip and just set it
in the runtime.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25436>
2023-11-06 14:33:51 +00:00
Connor Abbott
2b62d90158 vk/graphics_state: Support VK_KHR_maintenance5
Switch to using VkPipelineCreateFlags2KHR, and use the new common helper
to get the right flags.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25436>
2023-11-06 14:33:51 +00:00
Connor Abbott
e6f5d7222c vk,lvp,tu,radv,anv: Add common vk_*_pipeline_create_flags() helper
And replace the various homegrown or copy-pasted helpers in drivers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25436>
2023-11-06 14:33:51 +00:00
Paulo Zanoni
c2db19f496 anv: setup the TR-TT vma heap
"16TB ought to be enough for anybody."
      - Probably some Intel graphics hardware engineer

TR-TT addresses are fixed regardless of the platform's gtt_size.
Unconditionally reserve this space for it: our total 48bit address
space is 256tb and TR-TT takes 16tb out of it (1/16th).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:53 +00:00
Paulo Zanoni
0a120edfb8 anv/sparse: extract anv_sparse_bind()
This function will be able to transparently handle sparse binding
regardless of the backend: vm_bind ioctls or TR-TT. For now we only
support the vm_bind ioctls, but soon we'll have anv_sparse_bind_trtt()
as an option.

It is important to notice that even backends that support the vm_bind
ioctl may choose to do Sparse binding via TR-TT, that's why we're
adding the indirection at this specific point.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:53 +00:00
Paulo Zanoni
544c5c006c intel/genxml: add the Gen12+ TR-TT registers
These are the registers we're going to use for now.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Paulo Zanoni
1af1426542 anv/sparse: also print bind->address at dump_anv_vm_bind
This helped tracking down xe.ko bug #746.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Paulo Zanoni
b94d7dbe66 anv/sparse: join multiple NULL binds when possible
When it's a NULL bind we always set the bo_offset (aka memory offset)
to zero, so we have to avoid the "bind.offset == prev.offset + size"
check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Paulo Zanoni
2fc0bbe814 anv/sparse: join multiple bind operations when possible
If the next bind is just an extension of the previous one, join both
in the same bind operation. Due to how mip levels are laid in memory,
this can only happen for mip level 0.

As of today xe.ko doesn't try to join contiguous operations for us.
Due to how rebinds happen each additional rebind operation may end up
resulting in many extra things done, so these simple checks end up
saving us a lot of cycles the Kernel would otherwise waste. This will
be true even after we issue all binds in a single ioctl.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Paulo Zanoni
2883c6ddaa anv: alloc client visible addresses at the bottom of vma_hi
Kill vma_cva and just toggle heap->alloc_high instead. This way,
client visible addresses will remain isolated in their own little
corner, except we have one less vma to deal with.

For TR-TT we'll need a special vma, and if we don't use the trick
above we'll need yet another trtt_cva_vma, increasing complexity even
more.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Paulo Zanoni
e1b50074fe anv: don't forget to destroy device->vma_mutex
This actually doesn't fix any bugs or leaks, because according to the
man page:

  "In the LinuxThreads implementation, no resources are associated
   with mutex objects, thus pthread_mutex_destroy actually does
   nothing except checking that the mutex is unlocked.

still, it's better to have it than not to have it, especially since
other implementations may do something.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
2023-11-04 02:06:52 +00:00
Jesse Natalie
228329f4da vulkan: Consolidate common ICD methods
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25998>
2023-11-03 20:01:14 +00:00
Jesse Natalie
32f0034ec9 vulkan: Remove no-longer-needed prototypes for ICD entrypoints
The comment around these is no longer true, vk_icd.h does in fact
have prototypes for these functions.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25998>
2023-11-03 20:01:14 +00:00
Rohan Garg
2444a3cd46 intel/compiler: migrate WA 14013672992 to use WA framework
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26006>
2023-11-02 16:39:25 +00:00
Mark Janes
a1e6879021 anv: make shader cache content deterministic
Pointer values in shader cache data generate binary differences for
functionally identical shader content.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25923>
2023-11-02 02:53:41 +00:00
Felix DeGrood
aa23120e4f anv: remove CS_FLUSH from query regression
Fixes performance regression introduced by prior refactoring of
pipe control code that unnecessarily added CS_FLUSH to query start
and end. Issue was diagnosed by Ben L (thank you!)

Confirmed this restores performance on:
* Borderlands3 +2%
* Payday +3%
* Factorio +3%
* HogwartsLegacy +4%
* Ghostrunner +7%

Fixes: 6dc95685 (convert genX_query pipe controls to use pc helper)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25983>
2023-11-02 02:28:02 +00:00
Kenneth Graunke
fddad4d5f9 intel/compiler: Assert that FS_OPCODE_[REP_]FB_WRITE is for pre-Gfx7
We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a
while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit).

Assert that it isn't used on Gfx7+.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
48f60f4c4b intel/compiler: Convert the repclear shader to use send-from-GRF
Sandybridge uses this code and needs MRFs, but all other platforms send
from GRFs.  Do that directly rather than relying on the MRF hack.

Ivybridge and later also use SHADER_OPCODE_SEND directly rather than
a virtual opcode that's handled in the generator, so we follow suit.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
ef7d1b5f44 intel/compiler: Drop unused saturate handling in repclear shader
We never set key->clamp_fragment_color when compiling the BLORP fast
clear shaders.  Besides, we were setting saturate on an FB write opcode,
which...isn't even a thing.  We would need it on the MOV, and weren't
setting it there.  So it wouldn't have even worked.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
e6d9267d4f intel/compiler: Delete repclear shader's special case for 1 color target
This is basically just once through the loop but copy and pasted.  One
difference is that the single render target case used a headerless
message, and the multiple render target case always used headers.

Now we use headerless messages for the first render target, even in the
multiple render target case.  While we already have it set up for the
other RTs, it's still 2 fewer registers to send.  Minor improvement.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
e6460fe66b intel/compiler: Delete unused repclear shader uniform handling
A long time ago, we used a uniform for the clear color.  Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.

Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained.  Drop the dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Kenneth Graunke
b35f1fc910 intel/compiler: Delete unused emit_dummy_fs()
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something.  In the olden days, we'd use it for
platform enabling.  I can't think of the last time we did that, though.

I also used to use it for debugging.  If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.

These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders.  So we don't
need to resort iterative driver hacks anymore.  Again, I can't think of
the last time I actually did that.

So, this code is basically just dead.  And it's using legacy MRF paths,
which we could update...or we could just delete it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Lionel Landwerlin
cdca0b2ce4 anv: fix corner case of mutable descriptor pool creation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 63e91148b7 ("anv: Enable VK_VALVE_mutable_descriptor_type")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10065
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25958>
2023-10-30 18:29:46 +00:00
Lionel Landwerlin
e64a97694a anv: use anv_state_pool_state_address for blorp vertex buffer address
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
8d813a90d6 anv: fail pool allocation when over the maximal size
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
8fc42d83be anv: make sure pools can handle more than 2Gb
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
cc67bd48d9 anv: add max_size argument for block & state pools
Not enforced yet.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
b30428416a anv: deal with state stream allocation failures
In case we run out of space, all the parts of the driver that rely on
this should deal with failure. The helpers will set the batch in error
state so that it cannot be submitted by the application.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
ed83d1415c anv: rename internal heaps
Some of the names are a bit confusing. The main change is to introduce
the "indirect_" prefix.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Lionel Landwerlin
f9753488ec blorp: handle binding table & surface state allocation failures
The embedding driver could be failing the allocation for whatever
reason, in which case we should skip the surface state writes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
2023-10-30 14:47:18 +00:00
Tapani Pälli
2833d1ade1 intel/dev: fix intel_device_info_is_adln check
We cannot compare pointer, patch adds is_adl_n to devinfo for detection.

Fixes: 3cf71ddfac ("intel/dev: provide intel_device_info_is_adln helper")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25934>
2023-10-30 11:20:41 +00:00
Jordan Justen
9bd47aabaf anv: Add more space for init_render_queue_state() batch (MTL regression)
It may be some MTL specific code paths, but 7cdacaf493 is triggering
anvil to run out of space when initializing the render batch.

Fixes: 7cdacaf493 ("intel/xehp: Adjust TBIMR performance chicken bits.")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>
2023-10-30 10:05:10 +00:00
Francisco Jerez
57decad976 intel/xehp: Enable TBIMR by default.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
ed9886321c intel/xehp+: Use TBIMR tile box check in order to avoid performance regressions.
This allows the hardware to behave as if TBIMR was disabled until a
polygon is processed which spans at least one tile.  This is a rather
heavy-handed heuristic meant to prevent regressions in heavily
geometry-bound workloads that render large numbers of tiny primitives
much smaller than a TBIMR tile.

A particularly bad example of this was observed in SoTR, where certain
draw calls with a long-running VS and a mostly trivial PS render more
triangles than pixels, filling up the URB and TBIMR batch pretty
quickly, which causes EU utilization to tank (since once the URB has
filled up the parallelism of the VS is limited by the number of
polygons that fit in a TBIMR batch at the completion of each tile
walk, which isn't a lot in relation to the total EU count of a DG2),
and causes the bottleneck to be the rate at which the tile sequencer
performs additional tile passes, each one processing a small number
(<1024 polygons) of the hundreds of thousands of triangles of the
draw call.

Enabling this heuristic seems effective at avoiding that scenario in
SoTR among other titles (e.g. Total War Warhammer 3), but it's a bit
of a compromise since one could imagine cases where TBIMR is helpful
even if the geometry doesn't pass the box check, so a better heuristic
or a driconf rule may be useful in the future.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
f0d24b155b intel/xehp+: Adjust TBIMR batch size based on slice count.
This programs a TBIMR batch size equal to 128 polygons per slice in
order to match the hardware spec recommendation (BSpec 68436).  This
has been confirmed to improve performance slightly relative to the
hardware default batch size of 256 polygons.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
7cdacaf493 intel/xehp: Adjust TBIMR performance chicken bits.
This enables a couple of TBIMR performance tunables in
CHICKEN_RASTER_2 that default to disabled.  TBIMR fast clip appears to
help slightly with some geometry-bound workloads.  TBIMR open batch
allows the rasterizer to start working immediately on the first tile
of the framebuffer, even before the batch has been closed, which helps
reduce the latency cost of the tile walk.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
08fd259b5b anv/xehp+: Enable TBIMR in generated draw calls.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
65bbe58b25 anv/xehp: Implement TBIMR tile pass setup and pipeline bandwidth estimation.
This sets up the basic parameters needed for tiled rendering based on
a back-of-the-envelope estimate of the amount of memory used by the
pixel pipeline during the tile pass.  The actual cache footprint of a
tile can vary wildly based on runtime factors which aren't easily
predictable based on static analysis, so this is only intended to
provide a rough approximation within the right order of magnitude.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:50:42 -07:00
Francisco Jerez
694d64188b intel/xehp+: Define driconf option for selectively disabling TBIMR.
This may help debugging performance problems in the possible case that
TBIMR negatively impacts the performance of some application.  It could
also allow applying application-specific band-aid fixes in the XML file
until a more general workaround is implemented.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:29 -07:00
Francisco Jerez
da28582eec intel/xehp+: Add dynamic state flags controlling whether TBIMR is enabled during 3D primitives.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:29 -07:00
Francisco Jerez
622c2498d4 intel/xehp+: Import algorithm for TBIMR tiling parameter calculation.
This implements a minimalistic algorithm that can be used to obtain an
approximate solution for the integer programming problem of finding
the optimal tile dimensions based on an estimate of the tile cache
consumption per pixel of the current graphics pipeline -- Including
the TC footprint of render targets, depth and stencil buffers and
their auxiliary surfaces.  Considering other (less local) memory
accesses performed by the pipeline (like texturing and shader storage)
would be useful (and could be considered by this algorithm with little
modification), but it would be pretty difficult to estimate the L3
cache consumption per pixel of such accesses based on static analysis
of the pipeline state alone without some sort of dynamic feedback.

The present algorithm returns a config with tile area large enough to
utilize a target fraction of the L3, which can be adjusted to obtain
greater/lower utilization of the L3 at the cost of higher/lower risk
of L3 cache thrashing respectively.  The aspect ratio of the tile
layout returned attempts to minimize the number of poorly utilized
tiles around the boundaries of the framebuffer (due to partial
coverage), since having the tile sequencer process additional tiles
comes at a cost due to the latency of the additional passes, even if
they're mostly empty.  Finally, among the solutions with satisfactory
cache footprint and tile count, the tile aspect ratio closest to 1 is
returned where possible, since tiles with very high aspect ratios can
have a negative impact on cache locality.

The algorithm is primarily intended for TBIMR, but it could be used
for PTBR as well with little modifications, since the TBIMR-specific
assumptions are few and noted in comments below.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:29 -07:00
Francisco Jerez
cec5541b02 intel/xehp+: Add TBIMR-related genxml definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:29 -07:00
Francisco Jerez
3e3fd921ac intel/mtl: Import L3 cache configurations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Francisco Jerez
468904e833 intel/dg2: Import L3 cache configurations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Jordan Justen
524996106c intel/l3: Use devinfo->urb.size when cfg urb-size is 0.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Anuj Phogat
ed5ff8f297 intel/l3: Adjust URB weight calculation for gfx12.5+.
Gfx12.5+ devices use special-purpose memory for the URB instead of
requiring a portion of the L3 to be carved out.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Francisco Jerez
6b9583734b intel/l3: Set up L3FullWayAllocationEnable config if ALL partition has over 126 ways.
L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow, instead the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.

That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition.  This is about change, the L3 configuration
structure will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers.  Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Francisco Jerez
f36027f389 intel/l3: Define helper for obtaining the size of an L3 partition in KB.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00
Francisco Jerez
19e62e8fba intel/l3/gfx11+: Add tile cache partition to intel_l3_config struct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27 14:48:28 -07:00