Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4193>
(cherry picked from commit 5193688e1a)
Starting with Gen12, we can fast-clear a lot more surface formats and we
are suddenly in the position of having to fast-clear surfaces with
formats with an implicit swizzle such as VK_FORMAT_R4G4B4A4_UNORM_PACK16
which is represented as ISL_FORMAT_A4B4G4R4 with a BGRA swizzle. In
order for blorp to do the fast-clear color conversion for us, it needs
a properly swizzled color.
This fixes the following Vulkan CTS groups on TGL:
- dEQP-VK.pipeline.blend.format.b4g4r4a4_unorm_pack16.*
- dEQP-VK.api.image_clearing.core.clear_color_image.*.b4g4r4a4*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
(cherry picked from commit 46187bb54f)
This function has code like:
if (0x7FD <= zExp) {
if ((0x7FD < zExp) ||
((zExp == 0x7FD) &&
(0x001FFFFFu == zFrac0 && 0xFFFFFFFFu == zFrac1) &&
increment)) {
...
return ...;
}
if (zExp < 0) {
I saw that, and I thought, "Uh... what? Dead code?" I thought it was a
bit fishy, so I grabbed the Berkeley SoftFloat Library 3e code, and
there is similar code in softfloat_roundPackToF64
(source/s_roundPackToF64.c), but it has an extra (uint16_t) cast in the
first comparison. This is basicially a shortcut for
if (zExp < 0 || zExp >= 0x7FD) {
So, having the nesting kind of makes sense. On a CPU, nesting the flow
control can be an optimization. On a GPU, it's just fail. Split the
block so that we don't need the uint16_t cast magic.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 683638 -> 658127 (-3.73%)
instructions in affected programs: 666839 -> 641328 (-3.83%)
helped: 92
HURT: 0
helped stats (abs) min: 26 max: 2456 x̄: 277.29 x̃: 144
helped stats (rel) min: 3.21% max: 4.22% x̄: 3.79% x̃: 3.90%
95% mean confidence interval for instructions value: -345.84 -208.75
95% mean confidence interval for instructions %-change: -3.86% -3.73%
Instructions are helped.
total cycles in shared programs: 5458858 -> 5344600 (-2.09%)
cycles in affected programs: 5360114 -> 5245856 (-2.13%)
helped: 92
HURT: 0
helped stats (abs) min: 126 max: 10300 x̄: 1241.93 x̃: 655
helped stats (rel) min: 1.71% max: 2.37% x̄: 2.12% x̃: 2.17%
95% mean confidence interval for cycles value: -1539.93 -943.94
95% mean confidence interval for cycles %-change: -2.16% -2.08%
Cycles are helped.
Fixes: f111d72596 ("glsl: Add "built-in" functions to do add(fp64, fp64)")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
(cherry picked from commit bf2eb3e0ee)
fsat is defined as min(max(a, 0.0), 1.0), and IEEE defines both min and
max to return the non-NaN value when one value is NaN. Based on this,
fsat should definitely return 0.0 for NaN.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841666 -> 841647 (<.01%)
instructions in affected programs: 122033 -> 122014 (-0.02%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.71 x̃: 3
helped stats (rel) min: 0.01% max: 0.02% x̄: 0.02% x̃: 0.01%
95% mean confidence interval for instructions value: -3.74 -1.69
95% mean confidence interval for instructions %-change: -0.02% -0.01%
Instructions are helped.
total cycles in shared programs: 6927246 -> 6926904 (<.01%)
cycles in affected programs: 1038987 -> 1038645 (-0.03%)
helped: 7
HURT: 0
helped stats (abs) min: 18 max: 72 x̄: 48.86 x̃: 54
helped stats (rel) min: 0.03% max: 0.05% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -67.38 -30.33
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: a42163cbbc ("compiler: Add lowering support for 64-bit saturate operations to software")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
(cherry picked from commit 7673dcbd21)
When trying to build mesa/master under AOSP, I've run into the
following error:
external/mesa3d/src/gallium/auxiliary/hud/hud_context.c:1821:31: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
struct sigaction action = {{0}};
^~~
1 error generated.
This patch addresses this by switching to using memset instead of
using an initializer.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4141>
(cherry picked from commit be22995ecf)
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4239>
If compute shaders require a specific subgroup size (ie. Wave32),
we have to return the correct one.
Fixes dEQP-VK.subgroups.size_control.compute.required_subgroup_size_*.
Fixes: fb07fd4e6c ("radv: implement VK_EXT_subgroup_size_control")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4230>
v2: Do end-of-pipe sync after clear depth stencil too (Jason).
v3: Also do end-of-pipe sync before clear depth stencil too (Jason).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
(cherry picked from commit 3ca3050de5)
The shader module name is used to compute the pipeline key. The
driver used to load the wrong pipelines because the shader names
were similar.
This should fix random failures of
dEQP-VK.pipeline.depth_range_unrestricted.*
Fixes: f11ea22666 ("radv: fix a performance regression with graphics depth/stencil clears")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
(cherry picked from commit 94e37859a9)
piglit/bin/arb_bindless_texture-limit -auto -fbo:
Needed to deal with non-NULL dynamic_index without deref in tex instructions.
piglit/bin/shader_runner tests/spec/arb_bindless_texture/execution/images/multiple-resident-images-reading.shader_test -auto:
Need to deal with non-deref images in enter_waterfall_imae.
Fixes: b83c9aca4a "amd/llvm: Fix divergent descriptor indexing. (v3)"
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
(cherry picked from commit 8e4e2cedcf)
In Fedora 32 build was failing with meson-0.53.2-1.git88e40c7.fc32
and gcc-10.0.1-0.9.fc32.x86_64.
Worked with meson-0.53.1-1 and same gcc.
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri2.c.o): in function `dri2_interop_export_object':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri2.c:1813: undefined reference to `st_finalize_texture'
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri_screen.c.o): in function `dri_init_screen_helper':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri_screen.c:580: undefined reference to `st_gl_api_create'
Moving this around seems to fix it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
(cherry picked from commit 040ce9a1b3)
Instead of uint64_t. Fixes potentially writing beyond the end of the
handles pointer array on 32-bit architectures (and copying all 0s
instead of the computed pointer values to the array on big endian
ones).
Corresponding compiler warning:
../src/gallium/drivers/llvmpipe/lp_state_cs.c: In function ‘llvmpipe_set_global_binding’:
../src/gallium/drivers/llvmpipe/lp_state_cs.c:1312:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1312 | va = (uint64_t)((char *)lp_res->data + offset);
| ^
Fixes: 264663d55d "gallivm/llvmpipe: add support for global
operations."
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4166>
(cherry picked from commit 106bf59ca9)
All the stubs in src/compiler/glsl/glcpp/pp_standalone_scaffolding.c
are duplicate symbols. They should only be used as replacement for
Mesa functions when building glcpp and glsl standalone compilers, but
in fact they are getting linked with Mesa.
This change fixes this by moving the standalone stubs to a
libglcpp_standalone target, that's only linked with the glcpp/glsl
tools.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4186>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4186>
(cherry picked from commit f6dad10d04)
There are multiple LLVM passes that very much move the
intrinsic using the descriptor outside of the loop, defeating
the entire point of creating the loop.
Defeat the optimizer by splitting the break into a separate
if-statement and putting an optimization barrier on the bool
in between.
v2: Move from a callback based system to begin/end loop.
This does not make it significantly less intrusive but
is a bit nicer with all the extra struct and callback
stubs.
v3: Deal with non-divergent values in divergent path.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2160
Fixes: 028ce52739 "radv: Add non-uniform indexing lowering."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b83c9aca4a)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4171>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4171>
With the live shader cache, equivalent shaders can be backed by the same
CSO. This breaks the logic that identifies whether the shader being deleted
is bound.
For example, having shaders A and B, you can bind shader A and delete
shader B. Deleting shader B will unbind shader A if they are equivalent.
Pierre-Eric figured out the root cause for this issue.
Fixes: 0db74f479b - radeonsi: use the live shader cache
Closes: #2596
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4078>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4078>
(cherry picked from commit 2dc300421d)
Some hardware doesn't support subgroup shuffle, and on such hardware
it makes no sense to lower quad broadcasts to shuffle. Instead, let's
lower them to four const quad broadcasts, paired with bcsel instructions.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4147>
(cherry picked from commit ec16535b49)
The generation of a6xx-pack.xml.h was missing in the android build scripts
leading to a build failure.
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
(cherry picked from commit fad9924315)
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4151>
The freedreno gen_header.py script now only works under python3.
It contains a "print()" call which prints a blank line under python3
but prints "()" under python2.7.
However the Android build currently uses python2.
This leads to incorrect code generation and a later build error.
.../STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno_common.xml.h:163:2: error: expected identifier or '('
()
Fix this by adding MESA_PYTHON3 and using it for the freedreno scripts.
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
(cherry picked from commit cad400a59e)
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4151>