No Foz-DB changes, but this fixes some issues when the spiller inserts
scratch loads after p_logical_end for p_return.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27119>
On chordal graphs, a greedy coloring can be done in a way that never uses
more colors than are required for the largest clique. However, since we
have vector values and force phi resources into the same spill slots, the
interference graphs are not chordal, and thus, this assumption doesn't hold.
Use twice as many spill slots as upper bound.
Totals from 10 (0.01% of 79242) affected shaders: (GFX11)
MaxWaves: 52 -> 54 (+3.85%)
Instrs: 271386 -> 271779 (+0.14%)
CodeSize: 1362544 -> 1365432 (+0.21%)
VGPRs: 2536 -> 2532 (-0.16%)
SpillVGPRs: 778 -> 818 (+5.14%)
Scratch: 73472 -> 76800 (+4.53%)
Latency: 3331718 -> 3328798 (-0.09%); split: -0.14%, +0.05%
InvThroughput: 1665860 -> 1643350 (-1.35%); split: -1.40%, +0.05%
VClause: 3292 -> 3329 (+1.12%); split: -0.06%, +1.18%
Copies: 46082 -> 46257 (+0.38%)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27011>
We can combine the previously introduced mechanisms to implement
line-smoothing, combined with some bespoke late RSD-patching to force
the correct multisampling and rasterization states.
The reason we need to do this patching late, is that we need to know the
primitive-type before we can determine this state, and we don't know
this until draw-time. Luckily, we have a convenient spot to do this.
This bespoke patching only happens for gen7 and earlier. Luckily, the
bits here remain the same for all of these gens. On later gens, we can
just emit this normally, without any bespoke patching.
We also need to use a fresh batch when changing the effective smoothing
state, because that's framebuffer-state. This applies to all gens.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10281
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
How we're updating active_prim is currently a bit convoluted, because we
update it from two different places: First, we update it right before
updating the shader-variant in the case of point-sprite changes, and
then later on we update it unconditionally.
But we're about to need to change this logic, because we also need to
deal with line-smooth lowering. So let's clean this up a tad first, by
moving both updates to the same function, and renaming it a bit. This
let's us cleanly update it in one place, and then react to the change.
While we're at it, let's replace the bitwise xor with a logical
not-equals, which makes this all logical operations.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
Unfortunately, the lowering-pass has gotten a condition added that we
don't really want. So we need to lower away the
load_poly_line_smooth_enabled-intrinsic to always return true to get rid
of it again.
It's not great, but line-smoothing isn't the most important feature to
have max performance on, so... meh?
An alternative here could be to make the condition in
nir_lower_poly_line_smooth optional, but that seems kinda hairy
code-wise. Dunno.
We also need to run nir_lower_alu in order to lower away bit_count on
midgard.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
We're soon going to have to check for other primitive-types, like lines.
So let's pass the reduced primitive type here instead.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
This implements a D3D11-style forcing of the sample-counts, which will
be used by the line-smoothing code we're about to add.
Even though we don't actually use the single-sample mode, I left it in
for completeness, as it's documented in the TRM.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
There are quite a lot of PowerVR GPUs out there and maintaining the device info
for all of them in the same file will become unwieldy.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27144>
freeing the nir shader should free the xfb info too. found with valgrind
leakcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27093>
Only mark the unused channels according to the writemask for now, but
this can go as well when we translate directly form NIR. Right now we
still need to do this separatelly as TGSI has no concept of unused
swizzle.
The old DCE pass was known to be buggy, so this also fixes at least
piglit glsl-vs-copy-propagation-1.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9279
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26848>
load_ubo_vec4 has always 4 components, however when translating to TGSI
just set the writemask according to the channels that are actually used
later. This is now done by deadcode analysis, but that one is going away
soon.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26848>
When a device lost occured, we were not ensuring that
vkGetQueryPoolResults with VK_QUERY_RESULT_WAIT_BIT was returning.
We could have the caller stuck in our while(...) loop polling the query
atomics forever which violates the finite-time guarantees of this function.
This solves that by adding a timeout to this call to be double the
default TDR timeout for the potential queues relating to this query.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
We don't use this anymore, it is all dead code.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
Following discussion on kernel mailing list[1], we are not gaining
anything from this right now, and it does not handle soft recovery.
We will hear about the context loss and rationale when we vkQueueSubmit
next.
We can come back to this if there is ever a Vulkan extension for
figuring out innocent vs guilty like GL_EXT_robustness.
This does mean however that we return VK_SUCCESS for cancelled semaphore
and fence waits, but this is legal per the Vulkan spec:
"Commands that wait indefinitely for device execution (namely
vkDeviceWaitIdle, vkQueueWaitIdle, vkWaitForFences with a maximum
timeout, and vkGetQueryPoolResults with the VK_QUERY_RESULT_WAIT_BIT
bit set in flags) must return in finite time even in the case of a lost
device, and return either VK_SUCCESS or VK_ERROR_DEVICE_LOST."
"If device loss occurs (see Lost Device) before the timeout has expired,
vkWaitSemaphores must return in finite time with either VK_SUCCESS or
VK_ERROR_DEVICE_LOST."
[1]: https://lists.freedesktop.org/archives/amd-gfx/2024-January/103337.html
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
Otherwise we rely on check_status alone, and that's going away.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
This can be sent in the event of a soft/hard recovery.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
Xe KMD reports SMEM size as half of RAM while i915 returns the whole
RAM size, so to keep it consistent here adjusting the values
returned by i915 KMD.
The free i915 SMEM also needs to be ajusted but as this is needed by
both KMDs because KMD uAPIs only reports free memory for applications
running elevated privileges, so this was moved to
intel_device_info_ajust_memory() to be shared by both KMD backends.
sram.mappable.size asserts had to be removed from i915 code paths
because of this adjustment.
anv_compute_sys_heap_size() was dropped in ANV and reduce in HASVK
because adjustments are now done in intel/dev level.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26567>
This function should only be used when i915 versions that don't have
DRM_I915_QUERY_MEMORY_REGIONS.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26567>
This improves vkpeak perf by 40-50% on the Ada GPU in my laptop. There
are probably more optimal things we can be doing (Intel's pass is pretty
clever) but this is already an improvement.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27157>
When building the CFG the instructions are taken of the list in
fs_visitor and added to the lists inside each block. The single
"exec_node" in the instruction is used for those memberships.
In the case the pass rebuilt the CFG, it had no instructions, so
calculate_cfg() had nothing to work with. For now fix the bug by
pulling all the instructions back to the original list.
We can do better here, but punting until upcoming work on
CFG itself.
Issue found in an unpublished CTS test. Small reproduction in our
unit tests now enabled.
Fixes: 65237f8bbc ("intel/fs: Don't add MOV instructions to DO blocks in combine constants")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>
Add a simple test to kick off the infrastructure. And also a test
(for now disabled) that fails because the code is returning an
empty shader. Next patch will fix and enable it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>
Following discussion on kernel mailing list[1], we are not gaining
anything from using this to figure out if we reset, and it does not
handle soft recovery.
We will hear about the context loss and rationale when we submit.
Instead, only use this for figuring out if the reset we already knew
about was completed.
[1]: https://lists.freedesktop.org/archives/amd-gfx/2024-January/103337.html
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27097>