So far we only write a maximum of 4 dwords further into the batch and
it seems just going over the CS prefetch was enough.
Turns out writing more dwords can delay the writes and we start
prefetching stuff that hasn't landed in memory yet.
This fixes the issue by stalling the CS to ensure the writes have
landed before we go over the prefetch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 796fccce63 ("intel/mi-builder: add framework for self modifying batches")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8525>
(cherry picked from commit d8154c4006)
It is using the source level instead of the destiny level (base_level)
to compute the dest offset.
This fixes `framebuffer-blit-levels draw rgba -auto -fbo` piglit test.
Fixes: 976ea90bdc ("v3d: Add support for using the TFU to do some blits.")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8491>
(cherry picked from commit 08b16cfe0b)
If we don't tag compute sgpr as dirty they will point to the
ol buffer location.
This fixes arb_compute_shader-dlist with mcbp enabled.
Fixes: 85a6bcca61 ("radeonsi: pass at most 3 images and/or shader buffers via user SGPRs for compute")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8433>
(cherry picked from commit 17f8e56c96)
We sometimes use anv_layout_to_aux_state() to compute the aux state of
an image during the resolve operations at the end of a render
(sub)pass.
If we're dealing with a multisampled image that is created without a
transfer usage, our internal code might trigger a resolve using the
transfer layout (see genX_cmd_buffer.c:cmd_buffer_end_subpass), for
which the image doesn't the usage bit. The current code tries to AND
the 2 usages which won't have any bit in common, thus skipping all
checks below.
v2: Add the transfer usages depending on attachment usage (Lionel)
v3: Limit to samples > 1 (Jason) && DEPTH_STENCIL_ATTACHMENT_BIT (Lionel)
v4: Add transfer usage at image creation (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 54b525caf0 ("anv: Rework anv_layout_to_aux_state")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4037
Reviewed-by: Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8307>
(cherry picked from commit d4b4d69d4d)
Add a check to vaDeriveImage to see if a non-interlaced buffer was
created successfully. Otherwise, return an error, since we won't be able
to derive an image from the interlaced buffer.
Prevents a null pointer dereference from occuring on some nVidia cards,
reported by Alexander Kapshuk.
v2: Check for PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE support (Ilia)
Fixes: fcb558321e ("frontends/va: Derive image from interlaced buffers")
Signed-off-by: Thong Thai <thong.thai@amd.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8320>
(cherry picked from commit 4b208cc503)
vkGetInstanceProcAddr(instance, "vkGetInstanceProcAddr") should return our
vkGetInstanceProcAddr not the next in the chain.
CC: mesa-stable
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8285>
(cherry picked from commit fff77e4b43)
The Android Vulkan loader needs this symbol, so the addition of the
linker script broke Vulkan for Android.
(For non-Android builds: I checked that having a non-existent symbol in
the linker script works ok and doesn't put the symbol in the library)
Fixes: 41bb6459d3 ("radv: restrict exported symbols with static llvm")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8437>
(cherry picked from commit 4956f6d0bf)
The driver interface doesn't take ownership of the TGSI tokens, so free
our temporary.
Fixes: 57effa342b ("st/mesa: Drop the TGSI paths for PBOs and use nir-to-tgsi if needed.")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8422>
(cherry picked from commit 4ddcd9cf16)
This can be used to work around a common class of bugs appearing as
flickering.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8104>
(cherry picked from commit f17de6a803)
vkGetInstanceProcAddr(instance, "vkGetInstanceProcAddr") should return our
vkGetInstanceProcAddr not the next in the chain.
CC: mesa-stable
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8286>
(cherry picked from commit 67de6356f8)
If we change the virtual address we also have to change the offset in the buffer
to be mapped.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
(cherry picked from commit d3286bdd76)
Found a case where we mapped a range too many.
Per the comment the constraint is:
/* [first, last] is exactly the range of ranges that either overlap the
* new parent, or are adjacent to it. This corresponds to the bind ranges
* that may change.
*/
So that means that after the ++last we the ranges[last] should still
be adjacent. So we need to test the post-increment value to see whether
it is adjacent.
Failure case:
ranges:
0: 0 - ffff
1: 10000 - 1ffff
2: 20000 - 2ffff
3: 30000 - 3ffff
new range: 10000 - 1ffff
wrong first, last: 0,3
However range 3 clearly isn't adjacent at all.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
(cherry picked from commit 2b12e6931e)
NetBSD's variant has a different prototype from the Linux version
the code expects. It might make sense to add support for NetBSD's
version, however, since NetBSD defaults to not allowing non-root
users to set processor affinity, there would be little gain here.
This is a build fix for NetBSD.
Signed-off-by: Nia Alarie <nia@NetBSD.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
CC: 20.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7947>
(cherry picked from commit 275079e3ad)
Compute the number of components of the destination vector from the
bitsize when eg. a 16-bit vec2 vertex fetches is splitted. This is
because the dst will be a v1, so the p_create_vector should be created
from two v2b fro both sizes to match.
This prevents a regression from the next change which will split
typed vertex buffer loads on GFX6 and GFX10+.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8363>
(cherry picked from commit 68c2537062)
Not sure why I thought this was correct, but we should consider them for
optimization purposes.
Fixes: ce9205c03b ('nir: add a load/store vectorization pass')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4202>
(cherry picked from commit f4eb833a12)
No shader-db or fossil-db changes on any Intel platform.
v2: Add a coding line to fix SCons build problems caused by the ±
character.
Fixes: 25bfba3335 ("nir/algebraic: Recognize open-coded copysign(1.0, a)")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
(cherry picked from commit 9771af5dde)
Conflicts:
src/compiler/nir/nir_opt_algebraic.py
I originally noticed that 3b30814791 ("nir/algebraic: Optimize 1-bit
Booleans") caused this pattern no longer be matched by incorrectly
replacing b@32 with b@1. Making that correct had no effect on
shader-db. When this pattern originally was added, it only affected 4
shaders, so it's not worth the effort to debug further.
This reverts commit f50400cc80.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
(cherry picked from commit 314a40c902)
We already loop n times here, no point in doing n instances as well.
Fixes: e8a40715a8 ("gallium/util: add blitter-support for stencil-fallback")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8301>
(cherry picked from commit 96ceca33c1)
OpenGL GLSL, OpenGL ARB assembly shaders, and DX9 are pretty loose about
the behavior in the presence of NaNs. Many GPUs that implement these
specifications do not even have a representation of NaN. However,
OpenCL and Vulkan SPIR-V are not so lax. Both actually have some
required behavior in the presence of NaN, and, of the two, OpenCL is the
most strict.
For years we have implemented SPIR-V by using the same comparison
opcodes as we use for OpenGL GLSL and OpenGL assembly shaders. This has
repeatedly caused problems where an optimization that is valid in the
NaN-relaxed world is not valid in Vulkan or OpenCL. To fix this, set
the "exact" flag on comparisons instructions generated from SPIR-V.
This will block optimizations that may have different NaN behavior.
v2: Set the exact flag in the nir_builder, not in the vtn_builder.
v3: Add an assertion in vtn_handle_constant that the exact flag wasn't
set (because it's ignored). Rebase on 80163bbec3 ("nir/vtn: Support
OpOrdered and OpUnordered opcodes"). Mark the NIR generated for those
opcodes as exact as well.
v4: s/unused_exact/exact/ in a couple places, and assert that exact has
the expected value (true in one place, false in the other). Suggested
by Caio.
Closes: #3345
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Fixes: 8513b12590 ("nir/opt_if: split ALU from Phi more aggressively")
This commit doesn't really fix anything in 8513b12590. However,
without 8513b12590, a regression is triggered in RADV on No Man's
Sky. I want to ensure that this change is only applied on top of
8513b12590, and Fixes: seems the safest way to do that.
No shader-db changes on any Intel platform. This only affects SPIR-V,
and we have no OpenGL SPIR-V shaders in shader-db.
124 shaders in Shadow of the Tomb Raider (Steam "native") were hurt by 1
spill and 1 fill each.
All Intel platforms had similar results. (Tiger Lake shown)
Instructions in all programs: 155668276 -> 155685764 (+0.0%)
SENDs in all programs: 6474570 -> 6474570 (+0.0%)
Loops in all programs: 35271 -> 35271 (+0.0%)
Cycles in all programs: 3198055373 -> 3198628031 (+0.0%)
Spills in all programs: 231522 -> 231646 (+0.1%)
Fills in all programs: 347571 -> 347695 (+0.0%)
Vega
Totals:
SGPRs: 20955712 -> 20956756 (+0.00%); split: -0.02%, +0.03%
VGPRs: 13476920 -> 13473132 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613371940 -> 613339348 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 3111886 -> 3112481 (+0.02%); split: +0.02%, -0.00%
Instrs: 120723785 -> 120746991 (+0.02%); split: -0.04%, +0.06%
Cycles: 626658992 -> 626862708 (+0.03%); split: -0.05%, +0.08%
VMEM: 216330854 -> 216343196 (+0.01%); split: +0.04%, -0.04%
SMEM: 32079391 -> 32081972 (+0.01%); split: +0.05%, -0.04%
VClause: 2688784 -> 2688789 (+0.00%); split: -0.03%, +0.03%
SClause: 6554669 -> 6556251 (+0.02%); split: -0.01%, +0.03%
Copies: 5356667 -> 5353283 (-0.06%); split: -0.36%, +0.29%
Branches: 954466 -> 954716 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 9078300 -> 9081626 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 10972090 -> 10966576 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.08% of 399432) affected shaders:
SGPRs: 2713984 -> 2715028 (+0.04%); split: -0.16%, +0.19%
VGPRs: 1997804 -> 1994016 (-0.19%); split: -0.46%, +0.27%
CodeSize: 172094092 -> 172061500 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 337327 -> 337922 (+0.18%); split: +0.20%, -0.02%
Instrs: 33053657 -> 33076863 (+0.07%); split: -0.15%, +0.22%
Cycles: 254961228 -> 255164944 (+0.08%); split: -0.12%, +0.20%
VMEM: 15165226 -> 15177568 (+0.08%); split: +0.59%, -0.51%
SMEM: 3304938 -> 3307519 (+0.08%); split: +0.49%, -0.41%
VClause: 766225 -> 766230 (+0.00%); split: -0.12%, +0.12%
SClause: 1332645 -> 1334227 (+0.12%); split: -0.04%, +0.16%
Copies: 2040651 -> 2037267 (-0.17%); split: -0.94%, +0.77%
Branches: 743668 -> 743918 (+0.03%); split: -0.01%, +0.05%
PreSGPRs: 1697667 -> 1700993 (+0.20%); split: -0.07%, +0.27%
PreVGPRs: 1718424 -> 1712910 (-0.32%); split: -0.39%, +0.07%
Polaris
Totals:
SGPRs: 21349172 -> 21354376 (+0.02%); split: -0.02%, +0.04%
VGPRs: 13690680 -> 13686920 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613745824 -> 613704988 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 2775012 -> 2775189 (+0.01%); split: +0.01%, -0.00%
Instrs: 120735079 -> 120756209 (+0.02%); split: -0.04%, +0.06%
Cycles: 627906100 -> 628076156 (+0.03%); split: -0.05%, +0.08%
VMEM: 216623065 -> 216641838 (+0.01%); split: +0.04%, -0.04%
SMEM: 32295618 -> 32299338 (+0.01%); split: +0.05%, -0.04%
VClause: 2711025 -> 2711141 (+0.00%); split: -0.03%, +0.04%
SClause: 6545185 -> 6546769 (+0.02%); split: -0.01%, +0.03%
Copies: 5387723 -> 5383249 (-0.08%); split: -0.37%, +0.29%
Branches: 953775 -> 953954 (+0.02%); split: -0.01%, +0.03%
PreSGPRs: 9148814 -> 9153211 (+0.05%); split: -0.01%, +0.06%
PreVGPRs: 11029429 -> 11023915 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.00% of 402052) affected shaders:
SGPRs: 2682056 -> 2687260 (+0.19%); split: -0.16%, +0.35%
VGPRs: 1994436 -> 1990676 (-0.19%); split: -0.46%, +0.27%
CodeSize: 170857060 -> 170816224 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 295429 -> 295606 (+0.06%); split: +0.07%, -0.01%
Instrs: 32808802 -> 32829932 (+0.06%); split: -0.16%, +0.22%
Cycles: 254633252 -> 254803308 (+0.07%); split: -0.13%, +0.20%
VMEM: 14897934 -> 14916707 (+0.13%); split: +0.65%, -0.52%
SMEM: 3289726 -> 3293446 (+0.11%); split: +0.53%, -0.42%
VClause: 775318 -> 775434 (+0.01%); split: -0.11%, +0.13%
SClause: 1304867 -> 1306451 (+0.12%); split: -0.04%, +0.16%
Copies: 2026334 -> 2021860 (-0.22%); split: -0.99%, +0.77%
Branches: 742554 -> 742733 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1690887 -> 1695284 (+0.26%); split: -0.07%, +0.33%
PreVGPRs: 1717709 -> 1712195 (-0.32%); split: -0.40%, +0.07%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
(cherry picked from commit 010e663cc3)
The resulting point-coord origin not only depends on whether
the draw buffer is flipped but also on GL_POINT_SPRITE_COORD_ORIGIN
state. Which makes its transform differ from a transform of wpos.
On freedreno fixes:
gl-3.2-pointsprite-origin
gl-3.2-pointsprite-origin -fbo
Fixes: d934d320 "nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform."
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8200>
(cherry picked from commit 33fd9e5d8a)
In case the stencil is modified, it is also enabled. That was the
behavior of the original code, which was also the correct behavior,
so reinstate the behavior.
Fixes dEQP-GLES2.functional.fragment_ops.depth_stencil.* on STM32MP1 GC400T.
Fixes: b29fe26d43 ("etnaviv: rework ZSA into a derived state")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8174>
(cherry picked from commit 33a6c01e12)