We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a
while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit).
Assert that it isn't used on Gfx7+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
Sandybridge uses this code and needs MRFs, but all other platforms send
from GRFs. Do that directly rather than relying on the MRF hack.
Ivybridge and later also use SHADER_OPCODE_SEND directly rather than
a virtual opcode that's handled in the generator, so we follow suit.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
We never set key->clamp_fragment_color when compiling the BLORP fast
clear shaders. Besides, we were setting saturate on an FB write opcode,
which...isn't even a thing. We would need it on the MOV, and weren't
setting it there. So it wouldn't have even worked.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This is basically just once through the loop but copy and pasted. One
difference is that the single render target case used a headerless
message, and the multiple render target case always used headers.
Now we use headerless messages for the first render target, even in the
multiple render target case. While we already have it set up for the
other RTs, it's still 2 fewer registers to send. Minor improvement.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
A long time ago, we used a uniform for the clear color. Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.
Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained. Drop the dead code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something. In the olden days, we'd use it for
platform enabling. I can't think of the last time we did that, though.
I also used to use it for debugging. If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.
These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders. So we don't
need to resort iterative driver hacks anymore. Again, I can't think of
the last time I actually did that.
So, this code is basically just dead. And it's using legacy MRF paths,
which we could update...or we could just delete it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
In case we run out of space, all the parts of the driver that rely on
this should deal with failure. The helpers will set the batch in error
state so that it cannot be submitted by the application.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
Some of the names are a bit confusing. The main change is to introduce
the "indirect_" prefix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
The embedding driver could be failing the allocation for whatever
reason, in which case we should skip the surface state writes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
It may be some MTL specific code paths, but 7cdacaf493 is triggering
anvil to run out of space when initializing the render batch.
Fixes: 7cdacaf493 ("intel/xehp: Adjust TBIMR performance chicken bits.")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>
This allows the hardware to behave as if TBIMR was disabled until a
polygon is processed which spans at least one tile. This is a rather
heavy-handed heuristic meant to prevent regressions in heavily
geometry-bound workloads that render large numbers of tiny primitives
much smaller than a TBIMR tile.
A particularly bad example of this was observed in SoTR, where certain
draw calls with a long-running VS and a mostly trivial PS render more
triangles than pixels, filling up the URB and TBIMR batch pretty
quickly, which causes EU utilization to tank (since once the URB has
filled up the parallelism of the VS is limited by the number of
polygons that fit in a TBIMR batch at the completion of each tile
walk, which isn't a lot in relation to the total EU count of a DG2),
and causes the bottleneck to be the rate at which the tile sequencer
performs additional tile passes, each one processing a small number
(<1024 polygons) of the hundreds of thousands of triangles of the
draw call.
Enabling this heuristic seems effective at avoiding that scenario in
SoTR among other titles (e.g. Total War Warhammer 3), but it's a bit
of a compromise since one could imagine cases where TBIMR is helpful
even if the geometry doesn't pass the box check, so a better heuristic
or a driconf rule may be useful in the future.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This programs a TBIMR batch size equal to 128 polygons per slice in
order to match the hardware spec recommendation (BSpec 68436). This
has been confirmed to improve performance slightly relative to the
hardware default batch size of 256 polygons.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This enables a couple of TBIMR performance tunables in
CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to
help slightly with some geometry-bound workloads. TBIMR open batch
allows the rasterizer to start working immediately on the first tile
of the framebuffer, even before the batch has been closed, which helps
reduce the latency cost of the tile walk.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This sets up the basic parameters needed for tiled rendering based on
a back-of-the-envelope estimate of the amount of memory used by the
pixel pipeline during the tile pass. The actual cache footprint of a
tile can vary wildly based on runtime factors which aren't easily
predictable based on static analysis, so this is only intended to
provide a rough approximation within the right order of magnitude.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This may help debugging performance problems in the possible case that
TBIMR negatively impacts the performance of some application. It could
also allow applying application-specific band-aid fixes in the XML file
until a more general workaround is implemented.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This implements a minimalistic algorithm that can be used to obtain an
approximate solution for the integer programming problem of finding
the optimal tile dimensions based on an estimate of the tile cache
consumption per pixel of the current graphics pipeline -- Including
the TC footprint of render targets, depth and stencil buffers and
their auxiliary surfaces. Considering other (less local) memory
accesses performed by the pipeline (like texturing and shader storage)
would be useful (and could be considered by this algorithm with little
modification), but it would be pretty difficult to estimate the L3
cache consumption per pixel of such accesses based on static analysis
of the pipeline state alone without some sort of dynamic feedback.
The present algorithm returns a config with tile area large enough to
utilize a target fraction of the L3, which can be adjusted to obtain
greater/lower utilization of the L3 at the cost of higher/lower risk
of L3 cache thrashing respectively. The aspect ratio of the tile
layout returned attempts to minimize the number of poorly utilized
tiles around the boundaries of the framebuffer (due to partial
coverage), since having the tile sequencer process additional tiles
comes at a cost due to the latency of the additional passes, even if
they're mostly empty. Finally, among the solutions with satisfactory
cache footprint and tile count, the tile aspect ratio closest to 1 is
returned where possible, since tiles with very high aspect ratios can
have a negative impact on cache locality.
The algorithm is primarily intended for TBIMR, but it could be used
for PTBR as well with little modifications, since the TBIMR-specific
assumptions are few and noted in comments below.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
Gfx12.5+ devices use special-purpose memory for the URB instead of
requiring a portion of the L3 to be carved out.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow, instead the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.
That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition. This is about change, the L3 configuration
structure will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers. Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
Doing something like this is allowed :
vkCreateGraphicsPipeline(.., scissorState, &pipeline);
vkCmdBindPipeline(pipeline);
vkCmdSetScissor(...)
vkCmdBindPipeline(pipeline)
If we don't reapply the pipeline dynamic state, the command buffer
runtime state will keep the dynamic state set in between the 2 binds.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25915>
Use the hawkmoth c:auto* directives to incorporate isl documentation.
Convert @param style parameter descriptions to rst info field lists.
Add static stubs for generated headers. Fix a lot of references, in
particular the symbols are now in the Sphinx C domain, not C++
domain. Tweak syntax here and there.
Based on the earlier work by Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
Prepare for using Hawkmoth.
Hawkmoth does not support trailing comments using /**< ... */
syntax. Replace with regular documentation comments.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
This way we can implemented workarounds depending on the pipeline.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25671>
We can end up in situation where we are dispatched with a multisample
framebuffer but not at per-sample. In this case we would request the
at_sample value with the wrong message configuration.
Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag superseeds
BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH.
Fixes piglit tests :
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample*
With Zink on Anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>
We sometimes fail initialization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 09d12e6727 ("anv: Add support for I915_ENGINE_CLASS_COMPUTE in init_device_state()")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25891>
Commit 7db1b94e07 added a fix for ADL-N but this issue has been
reproduced also on RPL-S and is likely common with all gfx12 variants
with a small EU count.
cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25861>
gen9 does not handle denorms in void extent blocks correctly. We need
to flush them to zero.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25800>
We will reuse astc emu for gen9 astc workaround. This commit contains
minor cleanups and has no functional change.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25800>