On GFX12, int8 is twice as fast as fp16/bf16.
On GFX11, they have the same throughput, but int8 at least still uses
less registers.
Also reorder 16bit accumulators before 32bit, because they use less
registers on GFX12.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37002>
This has been annoying me for quite some while, the level of indention
makes reviewing code changes in Gitlab harder.
I think now is a good time to change this before more cmat lowering is added.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37002>
uses_fbfetch_output might be updated when a fragment shader is bound.
This would only affect ESO and I'm not sure it's possible though.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37005>
Initialization of driver has been moved to register_data_source() from
OnSetup() by: a739889789 ("pps: Report available counters when
gpu.counters* data source is registered")
With above change, pps will destroy driver when collecting data stops
(pps may keep running) then the driver will become nullptr when user try
to collect data again. This will cause segmentation fault in OnSetup().
So this remove driver = nullptr in OnStop() and init driver in OnSetup()
to make sure driver exists when pps-producer run more than once.
Fixes: a739889789 ("pps: Report available counters when gpu.counters*
data source is registered")
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36548>
The previous algorithm just looked at the dominator's loop header.
However, if you have multiple consecutive loops like:
function_impl {
loop {
// Stuff
}
loop {
// Other stuff
}
}
then it will look like the second loop is contained in the first loop
because the first loop's header dominates the second loop. This isn't
actually what we want. Instead, we want a node N to be considered part
of a loop with header H if H dominates N and H is reachable from N.
Fixes: 741f7067f1 ("nak: Add loop detection to the CFG")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36524>
We already do +8 on HMMA. This gives us a +4 on IMMA which seems to fix
the test failures I'm seeing. This fixes the IMMA CTS tests on my RTX
5090 with NAK_DEBUG=spill.
Fixes: 477533ee00 ("nvk: add sm120 latencies via csv files.")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36524>
KHR_maintenance9 allows for the creation of VkDevices with no queues for the
purpose of offline compilation and other similar usages. Notably, devices with
0 queues aren't allowed to allocate memory, create command buffers, or create
fences/semaphores; this allows us to completely skip creating a nvkmd device
and going through the kernel for these objects.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35804>
We need to be able to create samplers without necessarily having to have to
upload them. This is to allow us later to create a VkDevice without having to
touch the kernel for VkDevice objects without VkQueues used for things like
shader compilation.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35804>
This is to allow us later to create a VkDevice without having to touch the
kernel for VkDevice objects without VkQueues used for things like shader
compilation.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35804>
Previously, these were going through nir_lower_bit_size, which handles
upscaling from lower bit sizes to 32-bit only and is incorrect for higher sizes.
This change uses nir_lower_int64 instead, which handles them properly and
correctly.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35804>
The regressed changes have been reverted for Linux 6.16 in commit 9d85a32f0e6bf6878676a9ec7ce3c8032c500c7c
This should now allow us to use Linux 6.16 for the rest of the testing
Signed-off-by: Ritesh Raj Sarraf <ritesh.sarraf@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36909>
Long term, we don't want to support nouveau gl on new cards. Remove
the fallback so users without zink will get software rendering
instead of nouveau gl.
For now, NOUVEAU_USE_ZINK will still select nouveau gl on cards where
that is possible, but that isn't really supported and will likely be
removed for a lot of cards in the future.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36971>
Saw this when implementing something else, that I could just add the handling here to radv_choose_tiling in order to expose OPTIMAL.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36943>
Also add a #define NAK_QMD_ALIGN_B for alignments. The alignment is
always 256B and there's no evidence of that changing.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Backport-to: 25.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36995>
A single lock on allocate/free is nothing compared to the ioctls we're
already taking. This ensures that we always have all our memory
objects.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36995>
We add the syncobj and exec right before the wait so we know it's been
submitted. WAIT_FOR_SUBMIT does nothing except fail to catch potential
errors.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36995>