__fabs64 doesn't do anything special, and the value is still NaN
regardless of the value of the MSB. In a strict sense, it's possible
that both functions should set the "signal" bit.
lts on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 844558 -> 843005 (-0.18%)
instructions in affected programs: 725975 -> 724422 (-0.21%)
helped: 53
HURT: 4
helped stats (abs) min: 1 max: 313 x̄: 29.87 x̃: 21
helped stats (rel) min: 0.01% max: 0.94% x̄: 0.30% x̃: 0.22%
HURT stats (abs) min: 4 max: 11 x̄: 7.50 x̃: 7
HURT stats (rel) min: 0.03% max: 0.09% x̄: 0.05% x̃: 0.04%
95% mean confidence interval for instructions value: -39.02 -15.47
95% mean confidence interval for instructions %-change: -0.34% -0.21%
Instructions are helped.
total cycles in shared programs: 6962024 -> 6944998 (-0.24%)
cycles in affected programs: 6185470 -> 6168444 (-0.28%)
helped: 59
HURT: 0
helped stats (abs) min: 64 max: 2863 x̄: 288.58 x̃: 208
helped stats (rel) min: 0.11% max: 0.87% x̄: 0.33% x̃: 0.27%
95% mean confidence interval for cycles value: -387.15 -190.00
95% mean confidence interval for cycles %-change: -0.38% -0.28%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Replace all of the bool(qSign) with qSign != 0u. Remove unnecessary
parenthesis from around most of the existing qSign != 0u.
This dramatically simplifies the next commit.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 848109 -> 848106 (<.01%)
instructions in affected programs: 53 -> 50 (-5.66%)
helped: 1
HURT: 0
total cycles in shared programs: 6969145 -> 6969125 (<.01%)
cycles in affected programs: 396 -> 376 (-5.05%)
helped: 1
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
findMSB returns -1 for an input of zero, so 31 - findMSB(a) is
sufficient on its own.
There's only one user of findMSB in shader-db, and it does not match
this pattern.
TODO: Add a pattern in the backend code generator that emits 31 -
nir_op_ufind_msb(a) as if it were nir_op_uclz. That should save a couple
instructions.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 859509 -> 848109 (-1.33%)
instructions in affected programs: 841058 -> 829658 (-1.36%)
helped: 97
HURT: 0
helped stats (abs) min: 3 max: 1161 x̄: 117.53 x̃: 72
helped stats (rel) min: 0.98% max: 6.74% x̄: 1.70% x̃: 1.35%
95% mean confidence interval for instructions value: -147.21 -87.84
95% mean confidence interval for instructions %-change: -1.94% -1.46%
Instructions are helped.
total cycles in shared programs: 7072275 -> 6969145 (-1.46%)
cycles in affected programs: 6955767 -> 6852637 (-1.48%)
helped: 97
HURT: 0
helped stats (abs) min: 32 max: 10900 x̄: 1063.20 x̃: 560
helped stats (rel) min: 1.18% max: 7.58% x̄: 1.84% x̃: 1.45%
95% mean confidence interval for cycles value: -1339.43 -786.96
95% mean confidence interval for cycles %-change: -2.11% -1.57%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
This doesn't help a lot of shaders, but it helps those few a LOT.
This could also be implemented using bcsel. That version is very
slightly worse because the generated SEL instruction wants to have two
immediate sources, so one of them usually needs an extra MOV instruction
to load.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 929619 -> 928859 (-0.08%)
instructions in affected programs: 1651 -> 891 (-46.03%)
helped: 8
HURT: 0
helped stats (abs) min: 38 max: 152 x̄: 95.00 x̃: 95
helped stats (rel) min: 42.70% max: 86.36% x̄: 49.88% x̃: 44.66%
95% mean confidence interval for instructions value: -132.97 -57.03
95% mean confidence interval for instructions %-change: -62.28% -37.49%
Instructions are helped.
total cycles in shared programs: 7280180 -> 7272912 (-0.10%)
cycles in affected programs: 12960 -> 5692 (-56.08%)
helped: 8
HURT: 0
helped stats (abs) min: 352 max: 1456 x̄: 908.50 x̃: 910
helped stats (rel) min: 52.45% max: 91.19% x̄: 59.24% x̃: 55.15%
95% mean confidence interval for cycles value: -1274.03 -542.97
95% mean confidence interval for cycles %-change: -70.06% -48.41%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The pattern is added to opt_algebraic because, for example, comparisons
with constant 0.0 will produce (a1 < 0).
Even with a pass that optimized Boolean expressions, I think this would
be very difficult to automatically recognize and optimize.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 933054 -> 929619 (-0.37%)
instructions in affected programs: 784041 -> 780606 (-0.44%)
helped: 59
HURT: 0
helped stats (abs) min: 2 max: 213 x̄: 58.22 x̃: 44
helped stats (rel) min: 0.02% max: 2.51% x̄: 0.72% x̃: 0.46%
95% mean confidence interval for instructions value: -70.80 -45.64
95% mean confidence interval for instructions %-change: -0.92% -0.53%
Instructions are helped.
total cycles in shared programs: 7304712 -> 7280180 (-0.34%)
cycles in affected programs: 7176260 -> 7151728 (-0.34%)
helped: 92
HURT: 0
helped stats (abs) min: 8 max: 1414 x̄: 266.65 x̃: 166
helped stats (rel) min: 0.04% max: 2.34% x̄: 0.43% x̃: 0.22%
95% mean confidence interval for cycles value: -333.05 -200.26
95% mean confidence interval for cycles %-change: -0.54% -0.31%
Cycles are helped.
Regular shader-db changes:
No changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The shader module name is used to compute the pipeline key. The
driver used to load the wrong pipelines because the shader names
were similar.
This should fix random failures of
dEQP-VK.pipeline.depth_range_unrestricted.*
Fixes: f11ea22666 ("radv: fix a performance regression with graphics depth/stencil clears")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4216>
Fragment shader can write depth and stencil if we set necessary flags
in RSW. In addition to that we need to use special format for Z24S8.
Original format is apparently Z24X8 since we can't sample stencil in GLES2.
This new format also seems to use several components for storing depth
since we saw r != g != b when sampling with this format.
[vasily: - initialize clear->depth to 0xffffff if we reload depth, just
like blob does. Reloading doesn't work otherwise
- use single bitmap for reload type]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4197>
Unfortunately we don't know how to reload Z16 buffers yet and blob
is using Z24 for dEQP tests that need depth reload.
Reviewed-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4197>
The util_float_to_half() implementation was much smaller, but when trying
to switch _mesa_float_to_half to it, many testcases
(dEQP-VK.spirv_assembly.instruction.graphics.opquantize.*,
piglit.spec.arb_shading_language_packing.*packhalf2x16) start failing on
Intel. Replace the broken impl so that people don't have to debug it
later.
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3699>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3699>
piglit/bin/arb_bindless_texture-limit -auto -fbo:
Needed to deal with non-NULL dynamic_index without deref in tex instructions.
piglit/bin/shader_runner tests/spec/arb_bindless_texture/execution/images/multiple-resident-images-reading.shader_test -auto:
Need to deal with non-deref images in enter_waterfall_imae.
Fixes: b83c9aca4a "amd/llvm: Fix divergent descriptor indexing. (v3)"
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4191>
In Fedora 32 build was failing with meson-0.53.2-1.git88e40c7.fc32
and gcc-10.0.1-0.9.fc32.x86_64.
Worked with meson-0.53.1-1 and same gcc.
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri2.c.o): in function `dri2_interop_export_object':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri2.c:1813: undefined reference to `st_finalize_texture'
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri_screen.c.o): in function `dri_init_screen_helper':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri_screen.c:580: undefined reference to `st_gl_api_create'
Moving this around seems to fix it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
This was missing for the READ case. This improves glBegin/End performance.
(vbo maps with WRITE | READ | UNSYCHRONIZED)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4153>
Following on from the db410c conversion to baremetal testing, reuse the
same scripts in the same rack to run 7 db820c boards (#4/8 is failing in
the bootloader for unknown reasons).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4177>
This should reduce our container rebuild times, particularly on the
40-minute ARM build (which is split across only 2 runners and thus likely
to have a hot cache) when working on updating containers.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4099>
Since zink_render_pass_state is used as a hash-key, the entire struct gets
compared. This means we don't want any uninitialized padding in there, or
else we risk getting false negatives. This has led to issues on macOS builds.
So let's zero out the struct before we start filling it out.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4212>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4212>
If compute shaders require a specific subgroup size (ie. Wave32),
we have to return the correct one.
Fixes dEQP-VK.subgroups.size_control.compute.required_subgroup_size_*.
Fixes: fb07fd4e6c ("radv: implement VK_EXT_subgroup_size_control")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4215>
Instead of uint64_t. Fixes potentially writing beyond the end of the
handles pointer array on 32-bit architectures (and copying all 0s
instead of the computed pointer values to the array on big endian
ones).
Corresponding compiler warning:
../src/gallium/drivers/llvmpipe/lp_state_cs.c: In function ‘llvmpipe_set_global_binding’:
../src/gallium/drivers/llvmpipe/lp_state_cs.c:1312:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1312 | va = (uint64_t)((char *)lp_res->data + offset);
| ^
Fixes: 264663d55d "gallivm/llvmpipe: add support for global
operations."
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4166>
The motivation is to allow llvmpipe to be enabled instead in the
meson-i386 job.
v2: (Eric Engestrom)
* Rename meson-main job to meson-gallium
* Remove stale comment above meson-i386 job
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4166>
Instead of using git, use python and the Gitlab API
to fetch traces. This helps us slim down our ramdisks
in preparation for integrating trace replay on LAVA
devices.
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4000>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4000>
In preparation for using apitrace to replay traces in LAVA jobs, build a
newer waffle so apitrace can use the surfaceless EGL platform.
As things were before this commit, Xvfb would have been needed in the
LAVA images.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4000>