If we have NIR such as:
32x4 %48 = @load_vulkan_descriptor (%47) (desc_type=SSBO)
32x4 %76 = deref_cast (tint_symbol_11 *)%48 (ssbo tint_symbol_11) (ptr_stride=0, align_mul=4, align_offset=0)
32x4 %77 = deref_struct &%76->tint_symbol_10 (ssbo int) // &((tint_symbol_11 *)%48)->tint_symbol_10
A single nir_rematerialize_deref_in_use_blocks() will rematerialize the
deref_struct and then it's deref_cast. However,
nir_foreach_instr_reverse_safe is not safe if the next iteration's
instruction is removed. This can result in the instruction loop exiting
and the load_vulkan_descriptor never having an LCSSA phi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 439e8c42cc ("nir/lcssa: Fix rematerializing derefs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11770
(cherry picked from commit 65a54b4ec4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
iabs(a) is not positive if "a" is the minimum signed value, so this is
incorrect in that case for some values of "b".
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 2b76de9b5d ("nir/algebraic: Add a couple optimizations for iabs and ishr")
(cherry picked from commit ecd6ae12fb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
If applications doesn't send any attributes to describe the format,
we would always use driver preferred format (NV12). This is wrong
for any RT format other than the driver preferred (YUV420).
Driver doesn't have a choice here, we must use the matching format.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
(cherry picked from commit c8a893becd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
The code wants the number of components used by the variable in the
current attribute slot, not the total number of components.
For e.g. a 4x3 matrix, glsl_get_components() returns 12, leading to the
following error reported by AddressSanitizer:
```
Test case 'dEQP-VK.tessellation.shader_input_output.cross_invocation_per_patch_mat4x3'..
../src/compiler/nir/nir_lower_io_to_vector.c:265:16: runtime error: index 4 out of bounds for type 'nir_variable *[4]'
```
Fixes: 5ef2b8f1f2 ("nir: Add a pass for lowering IO back to vector when possible")
(cherry picked from commit ba5c65f10b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
The game aliases two images. It binds a memory object to two different
images, the first one being an image with 4 mips and the second with
only one mip but the bind offset is incorrect. It's like it queried
the first image size with different usage flags, so that DCC was
disabled.
Force disabling DCC for mips fixes the incorrect rendering and doesn't
hurt performance.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10200
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2f13723c0a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
In two functions implementing resource discard rebind_resource is called
on resource before its track record is reset. This prevents update of
dirty_resource or dirty_shader_resource because of conditions in
needs_dirty_resource. With rsc->track reset and dirty_resource bits
missing further calls to transfer_map will not try to reallocate
resource storage when needed.
A way to reproduce the issue in both functions is by executing at least
3 draws modifying bound texture or VBO each time. This patch fixes those
cases and some related piglit tests on a5xx and should fix it on other
GPUs. Also it fixes rendering in Firefox and vsraytrace (except vertical
line at right edge).
Fixes: 0a62a874fc ("freedreno: Re-work dirty-resource tracking")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10374
Reviewed-by: Rob Clark <robclark@freedesktop.org>
(cherry picked from commit 6d14cad330)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Many debug flags influence shader codegen but are currently not included
in the hash key. This causes surprising effects as cache lookups may
return shaders compiled with different debug flags than currently in
effect. This patch fixes this by including all debug flags in the
shader hash key.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: c323848b0b ("ir3, tu: Plumb through support for per-shader robustness")
(cherry picked from commit d8c90806e4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
This fixes a corner case of the LNL sub-dword integer restrictions
that wasn't being detected by has_subdword_integer_region_restriction(),
specifically:
> if(Src.Type==Byte && Dst.Type==Byte && Dst.Stride==1 && W!=2) {
> // ...
> if(Src.Stride == 2) && (Src.UniformStride) && (Dst.SubReg%32 == Src.SubReg/2 ) { Allowed }
> // ...
> }
All the other restrictions that require agreement between the SubReg
number of source and destination only affect sources with a stride
greater than a dword, which is why
has_subdword_integer_region_restriction() was returning false except
when "byte_stride(srcs[i]) >= 4" evaluated to true, but as implied by
the pseudocode above, in the particular case of a packed byte
destination, the restriction applies for source strides as narrow as
2B.
The form of the equation that relates the subreg numbers is consistent
with the existing calculations in brw_fs_lower_regioning (see
required_src_byte_offset()), we just need to enable lowering for this
corner case, and change lower_dst_region() to call lower_instruction()
recursively, since some of the cases where we break this restriction
are copy instructions introduced by brw_fs_lower_regioning() itself
trying to lower other instructions with byte destinations.
This fixes some Vulkan CTS test-cases that were hitting these
restrictions with byte data types.
Fixes: 217d412360 ("intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
We were accidentally doing a signed integer comparison here for ult32,
or a sign-extending shift for ushr.
One notable bit of fallout was that load_global_uniform_block_intel
address calculations broke on platforms that don't have native 64-bit
integer support, as the iadd64 lowering for "do I need to carry?" was
using ult32...and performing the wrong comparison. We spotted this in
Borderlands 3 on Alchemist once we turned on other optimizations.
Thanks to Lionel Landwerlin for helping spot the problem!
Fixes: c7b312ad45 ("brw: factor out source extraction for rematerialization")
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 5848035443)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Due to LLVM ABI reasons the SPIRV-LLVM-Translator always uses pointers to
private memory for struct function parameters. This includes kernel entry
points.
However technically it's also legal to pass those parameters by value
according to the OpenCL SPIR-V Env spec.
One compiler making use of this is e.g. artic based on Thorin.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12149
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
(cherry picked from commit d0560f59ce)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
nir_lower_pack can generate split operations, execute algebraic again
to handle them.
This fix an assert on
"dEQP-VK.spirv_assembly.instruction.compute.opphi.vartype_float16" and
probably others tests.
Fixes: 3904cfabd6 ("bi: Use nir_opt_load_store_vectorize")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit e5d64ca69c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
The BROADCOM_SAND128 modifier is usually used with an extra parameter
to pass in the stride via a side channel. Quoting from drm_fourcc.h:
> The pitch between the start of each column is set to optimally
> switch between SDRAM banks. This is passed as the number of lines
> of column width in the modifier (we can't use the stride value due
> to various core checks that look at it , so you should set the
> stride to width*cpp).
So apparently this is just a workaround for limitations in some kernel
APIs. DRM modifiers, however, are arguably a bad fit for extra
parameters that aren't known in advance. In the Wayland/KMS ecosystem
many components depend on being able to treat modifiers as opaque, e.g.
for negotiations etc. In practice the current approach requires various
software components to manually use the
`DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT()` macro - using the
`DRM_FORMAT_MOD_BROADCOM_SAND128` modifier directly with formats like
`NV12` results in a rejection in the KMS driver and corrupted output
in Mesa (because we'd bail out early in `v3d_sand8_blit()`).
Fortunately the stride check limitations mentioned above don't seem to
apply to Mesa though. Thus we can just add support for the base modifier
and stride (coming from V4L2), allowing various toolkits, Wayland
compositors and V4L2 decoder implementations to support e.g.
`NV12` + `DRM_FORMAT_MOD_BROADCOM_SAND128` (`NC12` in V4L2) in a generic
way.
Notes:
1. Wayland compositors trying to offload composition to KMS will still
fail when doing a test commit.
2. There is another limitation - in the V4L2 MPLANE API - that
requires userspace to know the correct offset of the second plane. That's
a known API limitation though and only affects V4L2 decoder implementations.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
(cherry picked from commit 758941ab0c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Since the shader parameters are passed as inline data, push constants
are no longer used and so, not actually set on dispatch. But the
nr_params = 4 was still making the shader emit the code to load them,
causing page faults on simulation, and would also on HW if we didn't
always have a scratch page set.
The uses_inline_data parameter will be set from brw_compile_cs(), called
shortly after this point, so we don't need it here.
The subgroup_size is misleading, as we don't actually require that size
and the code that checks for it isn't even running for this shader.
Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12152
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit d32a26b3e6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
When (off_const > max) there is a wrap around uint when calling
try_extract_const_addition.
Exit early since folding doesn't make sense in this case.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
(cherry picked from commit b501cbf153)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
All of the spilling code should work with physical register units
because for example SEND messages will expect a physical register as
destination.
So always allocate a full physical register for the spilled/unspilled
values and adjust the offsets of the registers to physical sizes too.
Cc: mesa-stable
Fixes: aa494cba ("brw: align spilling offsets to physical register sizes")
Closes: mesa/mesa#11967
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit a21cd8c5b6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
This needs to use aligned size, otherwise it will output two
slices when the size is not 16 aligned.
Fixes: 54d499818c ("radv/video: add initial support for encoding with h264.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6a121f1507)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Introduced in 2013 with prospect of being used in future.
... 11 years later.
Fixes: 4b45b61fef ("util: add avx2 and xop detection to cpu detection code") # 24.3
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
(cherry picked from commit 962b996d4c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
We don't use it for anything.
Cc: mesa-stable # 24.3
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
(cherry picked from commit ca947e1295)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Currently pointless, Pentium II or Celeron and later has SSE.
Cc: mesa-stable # 24.3
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
(cherry picked from commit a78c2bf2a4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
As per the code comment added in this commit the nir produced from
glsl to nir doesn't always keep function declarations before the
code that calls them e.g. calls from within other function
implementations. The change in this commit works around this problem by
first cloning all function declarations in a first pass, then cloning
the implementations in a second pass once we have filled the remap
table.
Fixes: cbfc225e2b ("glsl: switch to a full nir based linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12115
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 59b2549279)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
KernelCI jobs have priority 44 and are very long-running jobs (and
there might be an issue with the KernelCI that makes it create hundreds
of jobs, @sergi is looking into that).
While bumping to 45+ would be enough to allow Mesa release staging
pipelines to run despite the KernelCI, during the CI meeting with @sergi
and @mupuf it was determined that the Mesa releases are an important
enough operation to warrant being a higher priority than user forks
pipelines, so priority 70 was picked (still under the 75 of Marge
pipelines).
Cc: mesa-stable
(cherry picked from commit 50f9bec3ce)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>