The recommended settings are only guidance, not a programming
requirement, per the Bspec.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35904>
It was previously hardcoded since the switch to nvk_mem_arena.
Fixes: 9e52e296f7 ("nvk/heap: Use an nvk_mem_arena")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36059>
It was only added to indirect compute walkers, but the HSD doesn't say
anything about this optimization being specific to indirect compute
walkers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
Improves average FPS across a set of 63 android and GL-with-zink traces by
1.9% (+/- 0.1%). If we assume that SpaceEngine (the most-improved traces by a
significant margin) is just an outlier, it still shows a 0.4% improvement.
Closes: #12747
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35759>
It's a feature of the compiled shader that affects how it executes, but
it's not present in the binary itself. Needed for debug tooling looking
into the effects of double_threadsize.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35759>
hardware seems to sign extend with a signed comparison, which I guess is
reasonable! so our logic was busted if we had a zero-extend source with a signed
comparison. this broke someone's OpenCL app, and could probably be hit from
GLES/Vulkan too...
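
a minimal sketch of the mismatch, assuming 16-bit sources widened to 32 bits. the helper names are made up for illustration and are not AGX compiler code; the hardware behaviour modelled here is only what this message suspects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: widen a 16-bit source to 32 bits both ways. */
static int32_t widen_zext(uint16_t x) { return (int32_t)x; }
static int32_t widen_sext(uint16_t x) { return (int32_t)(x ^ 0x8000) - 0x8000; }

/* Signed less-than as the hardware appears to do it: narrow sources are
 * sign-extended no matter what the IR asked for. */
static bool hw_signed_lt(uint16_t a, uint16_t b)
{
   return widen_sext(a) < widen_sext(b);
}

/* Signed less-than as the IR means it when the source is zero-extended. */
static bool ir_signed_lt_zext(uint16_t a, uint16_t b)
{
   return widen_zext(a) < widen_zext(b);
}

/* 0x8000 is 32768 when zero-extended but -32768 when sign-extended, so the
 * two comparisons disagree. */
static void example(void)
{
   assert(hw_signed_lt(0x8000, 1));
   assert(!ir_signed_lt_zext(0x8000, 1));
}
```

any source with the top bit set shows the same disagreement, so the zero-extend has to be materialized before a signed comparison.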
on fossil-db, only parallel-rdp affected:
Totals from 312 (0.58% of 53701) affected shaders:
Instrs: 404772 -> 405697 (+0.23%); split: -0.01%, +0.24%
CodeSize: 2863314 -> 2868998 (+0.20%); split: -0.01%, +0.21%
Spills: 40239 -> 40286 (+0.12%); split: -0.02%, +0.14%
Fills: 33763 -> 33810 (+0.14%); split: -0.03%, +0.17%
ALU: 290757 -> 291071 (+0.11%); split: -0.02%, +0.13%
FSCIB: 261844 -> 262652 (+0.31%); split: -0.02%, +0.33%
IC: 230312 -> 230336 (+0.01%); split: -0.01%, +0.02%
GPRs: 24656 -> 24648 (-0.03%); split: -0.05%, +0.02%
Reported-by: RowanG
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
this is more explicit than vec2's and hence has fewer footguns. in particular
it's easier to handle with preambles in a sane way.
modelled on what ir3 does.
there's probably room for more clean up but for now this unblocks what I want to
do.
stats don't seem concerning.
Totals from 692 (1.29% of 53701) affected shaders:
MaxWaves: 441920 -> 442112 (+0.04%)
Instrs: 1588748 -> 1589304 (+0.03%); split: -0.05%, +0.08%
CodeSize: 11487976 -> 11491620 (+0.03%); split: -0.04%, +0.07%
ALU: 1234867 -> 1235407 (+0.04%); split: -0.06%, +0.10%
FSCIB: 1234707 -> 1235249 (+0.04%); split: -0.06%, +0.10%
IC: 380514 -> 380518 (+0.00%)
GPRs: 117292 -> 117332 (+0.03%); split: -0.08%, +0.11%
Preamble instrs: 314064 -> 313948 (-0.04%); split: -0.05%, +0.01%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
it is sometimes useful to turn lowered bindless intrinsics into bound or vice
versa, and it is annoying to do so without this helper, so generalize the
helper.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
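
As a rough sketch of the fallback behaviour described above (not the actual nir_opt_preamble code: the struct, the function, and the fixed two-class layout are all hypothetical, and real defs may have a different size in each class):

```c
#include <assert.h>

#define NUM_CLASSES 2
#define GENERAL_CLASS 0 /* assumption: class 0 is the general class */

struct class_budget {
   unsigned size[NUM_CLASSES]; /* capacity of each class, in slots */
   unsigned used[NUM_CLASSES]; /* slots handed out so far */
};

/* Try the class the def asked for, then fall back to the general class.
 * Returns the class used, or -1 if the def cannot be hoisted at all. */
static int assign_class(struct class_budget *b, unsigned preferred,
                        unsigned slots)
{
   assert(preferred < NUM_CLASSES);

   if (b->used[preferred] + slots <= b->size[preferred]) {
      b->used[preferred] += slots;
      return (int)preferred;
   }

   if (preferred != GENERAL_CLASS &&
       b->used[GENERAL_CLASS] + slots <= b->size[GENERAL_CLASS]) {
      b->used[GENERAL_CLASS] += slots;
      return GENERAL_CLASS;
   }

   return -1;
}
```

With one slot in the special class, the first def that prefers it lands there and the next one spills over to the general class, which mirrors the bound-texture-then-uniform fallback planned for AGX.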
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
That farm is not coming back any time soon, so let's just disable the
jobs to avoid having to keep them working while refactoring; they'll
have to be mostly re-written if/when they're brought back anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36024>
This hasn't been reproducible because RADV and GLSL always lower
non-constant slot and vertex indexing of GS inputs, but we'll stop
lowering it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This reduces GLSL compile times with the gallium noop driver by 0.6%.
This might decrease register usage and do less code reordering because
nir_lower_io_vars_to_temporaries is no longer called for inputs, which
moved most input loads to the top.
radeonsi+ACO shader-db results are noise.
More uniforms are identified as inlinable.
TOTALS FROM ALL SHADERS (58138):
VGPRs: 2152680 -> 2158032 (0.25 %)
Code Size: 71008908 -> 71064812 (0.08 %) bytes
Max Waves: 916943 -> 916924 (-0.00 %)
Inline Uniforms: 6395 -> 6414 (0.30 %)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This must be done before the GLSL compiler stops using
nir_lower_io_vars_to_temporaries for inputs to work around an LLVM bug.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This can be used to move input loads to the top after we stop using
nir_lower_io_vars_to_temporaries, which does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
  which improves compile times and removes the need for nir_lower_vars_to_ssa
  afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
  loads to the top whose indirect loads are lowered (which improves
  register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This commit ports instruction latency information found in codegen emitter.
Previously every instruction was delayed by 16 cycles even if it was not
necessary.
PixMark Piano is highly affected by instruction latencies and gets a 2.5x boost;
other benchmarks also get better performance.
The other two missing pieces to get feature parity with codegen are
functional unit resource tracking and instruction dual-issue.
Performance measurements on a GTX 770 (with 0f pstate):
PixMark Piano: 519 -> 14526 pts (has rendering issues in both!)
Furmark: 3247 -> 5786 pts
The Talos Principle (high settings): 30-33 -> 55-60 FPS
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
From Maxwell onward, GPUs can encode at most 15 cycles of delay, while
Kepler can encode up to 32 cycles.
This patch makes the maximum encodable delay architecture-dependent.
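
A minimal sketch of the idea, assuming the target is identified by its SM number (3x for Kepler, 50 and up for Maxwell and later). The function names are hypothetical and this is not the compiler's actual code:

```c
#include <assert.h>

/* Largest delay, in cycles, that one instruction can encode on `sm`. */
static unsigned max_instr_delay(unsigned sm)
{
   assert(sm >= 30); /* assumption: only Kepler and newer are passed in */
   return sm >= 50 ? 15 : 32;
}

/* Clamp a requested delay to what the target can actually encode. */
static unsigned clamp_delay(unsigned sm, unsigned cycles)
{
   unsigned max = max_instr_delay(sm);
   return cycles > max ? max : cycles;
}
```

A 20-cycle delay would fit in a single Kepler instruction but would be clamped to 15 on Maxwell and later.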
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
Previous code took inspiration from the SM50 encoder where scheduling
instructions are interleaved every 3 instructions and jumps between
scheduling blocks are not permitted.
In Kepler, scheduling instructions are interleaved once every 7
instructions. If we disallow jumps inside scheduling blocks, we need
to fill the remaining instructions in the block with NOPs.
This led to a 1-instruction basic block generating 6 unnecessary NOPs.
In the new code basic blocks are tightly packed, only inserting padding
NOPs at the end of the function, reducing the emitted code in complex
CFGs.
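
To make the saving concrete, here is a small sketch that counts padding NOPs under both schemes. It assumes only what is stated above (one scheduling instruction per 7 instructions); the function names are hypothetical and the real encoder is structured differently:

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_GROUP 7 /* instructions covered by one scheduling instruction */

/* NOPs needed to round `n` instructions up to a full scheduling group. */
static unsigned pad_to_group(unsigned n)
{
   return (SCHED_GROUP - n % SCHED_GROUP) % SCHED_GROUP;
}

/* Old scheme: every basic block is padded to a group boundary. */
static unsigned nops_per_block_padding(const unsigned *block_sizes, size_t n)
{
   assert(block_sizes != NULL || n == 0);
   unsigned nops = 0;
   for (size_t i = 0; i < n; i++)
      nops += pad_to_group(block_sizes[i]);
   return nops;
}

/* New scheme: blocks are tightly packed, only the function end is padded. */
static unsigned nops_function_padding(const unsigned *block_sizes, size_t n)
{
   assert(block_sizes != NULL || n == 0);
   unsigned total = 0;
   for (size_t i = 0; i < n; i++)
      total += block_sizes[i];
   return pad_to_group(total);
}
```

For three blocks of 1, 1 and 5 instructions, the old scheme pads each block separately (6 + 6 + 2 NOPs), while the packed layout totals exactly 7 instructions and needs none.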
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
First, we handle the case where GetMemoryFdKHR fails. This is unlikely
and, if it's a Mesa driver, it probably won't stomp the FD, but we should
be extra careful. Then, we can close the dma-buf file immediately after
we call drmIoctl() on it, ensuring we don't leak the dma-buf file
descriptor if drmIoctl() fails. If ImportSemaphoreFdKHR() fails, then
we need to clean up the sync file.
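
The cleanup order can be sketched as below. This is not the zink code: the three steps are passed in as hypothetical callbacks standing in for GetMemoryFdKHR, the drmIoctl() call and ImportSemaphoreFdKHR(), and the fake_* helpers exist only to exercise the flow with /dev/null descriptors:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-ins for the three fallible steps. */
struct sync_ops {
   bool (*get_memory_fd)(int *out_fd);
   bool (*export_sync_file)(int dmabuf_fd, int *out_sync_fd);
   bool (*import_semaphore)(int sync_fd); /* takes ownership on success */
};

static bool import_implicit_sync(const struct sync_ops *ops)
{
   int dmabuf_fd = -1, sync_fd = -1;

   /* On failure the output may have been stomped, so don't touch it. */
   if (!ops->get_memory_fd(&dmabuf_fd))
      return false;

   /* The dma-buf is only needed for this call; close it either way. */
   bool ok = ops->export_sync_file(dmabuf_fd, &sync_fd);
   close(dmabuf_fd);
   if (!ok)
      return false;

   /* A failed import leaves the sync file with us, so clean it up. */
   if (!ops->import_semaphore(sync_fd)) {
      close(sync_fd);
      return false;
   }
   return true;
}

/* Fakes used only to exercise the flow above. */
static int last_dmabuf_fd = -1, last_sync_fd = -1;

static bool fake_get_fd(int *out)
{
   *out = last_dmabuf_fd = open("/dev/null", O_RDONLY);
   return *out >= 0;
}

static bool fake_export_ok(int dmabuf_fd, int *out)
{
   (void)dmabuf_fd;
   *out = last_sync_fd = open("/dev/null", O_RDONLY);
   return *out >= 0;
}

static bool fake_export_fail(int dmabuf_fd, int *out)
{
   (void)dmabuf_fd;
   (void)out;
   return false;
}

static bool fake_import_fail(int sync_fd)
{
   (void)sync_fd;
   return false;
}

static bool fd_is_open(int fd)
{
   return fcntl(fd, F_GETFD) != -1;
}
```

Running import_implicit_sync() with fake_export_fail or fake_import_fail should leave neither descriptor open afterwards, which fd_is_open() can confirm.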
Fixes: d4f8ad27f2 ("zink: handle implicit sync for dmabufs")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36048>
Similar to how support for X11's DRI2 protocol was deprecated in 24.2,
begin deprecating EGL_WL_bind_wayland_display (including
eglBindWaylandDisplayWL et al) by moving it behind a legacy-wayland
build option.
This extension was originally created in a pre-dmabuf world, where we
didn't have a universally-accepted way of exchanging buffers between
client and compositor, or even really the ability to describe formats
and modifiers universally.
Since then, the world has settled on dmabuf with DRM FourCC and
modifiers. We've had the zwp_linux_dmabuf_v1 protocol for 10 years now:
both clients and compositors implement this protocol to handle buffer
sharing. Compositors either use EGL_EXT_image_dma_buf_import or the
Vulkan dmabuf extensions to import these into GPU world.
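
For code that still needs to tell which path is available, a plain token search over the extension string is enough. The helper below is a hypothetical sketch, not part of Mesa; the string would normally come from eglQueryString(display, EGL_EXTENSIONS):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole space-separated token in
 * `exts`. A bare strstr() is not enough because one extension name can be
 * a prefix of another. */
static bool has_extension(const char *exts, const char *name)
{
   assert(name != NULL && name[0] != '\0');
   if (exts == NULL)
      return false;

   size_t len = strlen(name);
   for (const char *p = exts; (p = strstr(p, name)) != NULL; p += len) {
      bool starts = (p == exts) || (p[-1] == ' ');
      bool ends = (p[len] == ' ') || (p[len] == '\0');
      if (starts && ends)
         return true;
   }
   return false;
}
```

A compositor could check for "EGL_EXT_image_dma_buf_import" this way and treat "EGL_WL_bind_wayland_display" as optional, since the latter is absent unless the legacy-wayland build option is enabled.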
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
Even though XWayland 22.1 (the version in Debian Bookworm) supports
modifiers, it refuses to use the GBM back-end if wl_drm is not
available. We need XWayland 24.1 in order to get GBM support without
wl_drm.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
This lets us avoid a few command line options. Also, we're going to need
it for setting the XWayland path, which isn't available as a command line
option.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
There is a bug in Weston 10 that causes instability when we don't have
wl_drm, and it isn't likely to get fixed in a point release. Most of CI
is fine but the final patch in this MR causes AMD raven to kill weston
part-way through runs, destroying the run. Just update weston to
14.0.1.
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
This is required to update Weston. This also requires that we start
building the scanner from source since libwayland 1.24 also requires
libwayland-scanner 1.24, which means there's no point in installing the
libwayland dev packages.
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>