Setting the old window coordinate to NaN is more likely to hide the
problem in debug builds because the NaN vertices are dropped later in
the pipeline, either through explicit NaN checks or implicit line
length checks.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36653>
Registers that don't need to be pinned to a channel right from the start
can be pinned when the instrcutions writing to them are scheduled.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36921>
Results of TEX and FETCH are pinned to group automatically when
creating instructions from string. With the new scheduling code
the channel pinning might be added and this needs to be handled
when reading the expectation shaders.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36921>
Missing 32-bit entry point in GLSL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36041>
When using FDM, we need approximately square tiles to maintain
proper density distribution across the framebuffer.
Way to wide or tall tiles would distort the density mapping, causing
areas intended for low density to receive higher density and vice
versa.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37032>
The original lower_phis_to_regs_block() is a little too clever. It
crawls up the predecessor tree until it finds a cross edge and places
the register writes as deep as it can. This breaks nak_nir_lower_cf().
Say you have a shader like...
con %0 = load_uniform()
con loop {
if div {
} else {
}
break;
}
con %1 = phi %0
The original lower_phis_to_regs_block() will turn it into
con %0 = load_uniform()
con %r = decl_reg();
con loop {
if div {
reg_store(%r, %0)
} else {
reg_store(%r, %0)
}
break;
}
con %1 = reg_load(%r)
We then convert it into unstructured control-flow and run regs_to_ssa()
to get our phis back, which lowers each of the registers we inserted to
a phi tree. When we try to recover divergence information on phis by
looking at their sources, this works fine if each source maps directly
to a reg_store() whic maps directly to a phi in the original IR.
However, because the reg_store() instructions are placed deeper, it may
introduce false divergence.
Switch to the simple version of nir_lower_phis_to_regs_block() which
places reg writes directly in phi predecessor blocks. We could probably
be more conservative and just avoid placing writes to uniform regs in
divergent control-flow but it's more robust to make the load/store_reg
intrinsics match the original phis directly.
This fixes some shaders in Horizon: Zero Dawn Remastered
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
Right now it tries to place reg_write instructions as far up the
predecessor chain as possible. This is useful for a bunch of the passes
that call it since it ensures they don't get placed in dead blocks or in
single successors and things like that. But it screws up NAK's control
flow lowering so we need the option to turn it off and make the pass
place the reg_write instructions in the most obvious place possible.
Fixes: b013d54e4f ("nak/lower_cf: Flag phis as convergent when possible")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36914>
In 14b4160792 ("vulkan/wsi: Only test for dma-buf sync file support
once"), I moved the dma-buf sync file import/export check earlier. This
is fine for hardware implementations where we have real dma-buf
import/export but it broke lavapipe because the check itself ignored
whether or not we actually have dma-buf import/export. Add a couple
more checks to wsi_drm_check_dma_buf_sync_file_import_export() so it's
safe even for SW drivers. Also, in wsi_create_sync_for_dma_buf_wait(),
check if we actually have a dma-buf.
Fixes: 14b4160792 ("vulkan/wsi: Only test for dma-buf sync file support once")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37033>
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%
Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
This makes lavapipe act like other DRM drivers whenever we have udmabuf
and just make everything a dma-buf even if it doesn't strictly have to
be. Without this we can end up in weird cases if the client asks to
allocate a memory object with multiple export types. Before, if this
happened, we would allocate a memfd and then return that when the client
calls GetMemoryFd() even if they asked for a dma-buf. In theory, we
could add additional plumbing to allow for using the memfd itself for
OPAQUE_FD and only wrap in a udmabuf if DMA_BUF is requested but this is
simpler and more in line with what hardware DRM drivers do.
Fixes: c1657de63c ("lavapipe: support VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13798
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37067>
We need to make sure that earlier gfx commands don't see the committed
resource. Also, the commit doesn't necessarily happen atomically,
because the kernel has to unmap the NULL page and then map the BO page,
so earlier gfx submits could race with the kernel an see an unmapped
page and fault. This is the same race when uncommitting but in reverse.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37065>
If implementation does not actually replay the VA, it must return 0
to not violate:
"If the memory object was allocated with a non-zero value of
opaqueCaptureAddress, the return value must be the same address."
Fixes RenderDoc capture replay, which asserts on the this spec rule
being followed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: ed6d5c33 ("nvk: Implement VK_EXT/KHR_buffer_device_address")
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Closes#13784
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37047>
When building with libdrm as an internal fallback dependency, i.e.
a submodule, meson does not find `amdgpu.h` from the installed external
dep, failing to build with the following error:
-----------------------------------------------------------------------
In file included from ../src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c:10:
In file included from ../src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h:18:
../src/amd/common/ac_linux_drm.h:14:10: fatal error: 'amdgpu.h' file not found
14 | #include "amdgpu.h"
| ^~~~~~~~~~
1 error generated.
-----------------------------------------------------------------------
Make libvulkan_radeon depend explicitly on dep_libdrm_amdgpu to also use
the include_directories declared for that dependency in case it's an
internal dependency.
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36932>
Don't include amdgpu.h directly in AMDGPU/RADV code, the only libdrm
pieces that are needed are handled in src/amd/common/ac_linux_drm.h
which already includes amdgpu.h
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36932>
This is a dynamic state. This also replaces the stride by a 16-bit
value because it's required to not exceed
VkPhysicalDeviceLimits::maxVertexInputBindingStride which is defined
to 2048.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37025>
It is a bit annoying to have the resulting increased duplication, but
the table magic doesn't play nicely with variant regs.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37009>
Registers declared as a "naked" array, ie:
<array offset="0x1234" name="SOME_REG" stride="1" .../>
were not getting register packers generated for them. Insert an
implicit <reg32> child for them, so the above would be equivalent
to:
<array offset="0x1234" name="SOME_REG" stride="1" .../>
<reg32 offset="0" name="REG"/>
</array>
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37009>
When we have a variant range, we need to test against the inclusive
range. Ie. if we have variants="A6XX-A7XX" we can't just test for
chip == A6XX.
So switch the variant key to preserve the range (ie. "A6XX-A7XX" instead
of just "A6XX"), add a helper to sanitize the range for symbol/#define
names, and handle the three possible cases when generating the if/else
ladder:
- single variant, ie. variants="A6XX", keeps the current logic
- open ended range, ie. variants="A6XX-", generates a >=
- closed ended range, ie. variants="A6XX-A7XX", generates a
(CHIP >= A6XX) && (CHIP <= A7XX)
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37009>