Fixes "left shift of negative value -128" with parallel_rdp/00f93a9497dfbb3b
and UBSan.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Without this cast, the left shift is promoted to 'int'.
Fixes "left shift of 50432 by 16 places cannot be represented in type 'int'"
with horizon_zero_dawn/001064f580f8e3be and UBSan.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Kepler cards do not support shared atomic operations directly, but they
have special ldslk and stsul that can implement mutex locks on
addresses. Shared atomics can be lowered into operations in mutexes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35028>
We still allow mesh shader promote constant output to flat, but
mesh shader like geometry shader may store multi vertices'
varying in a single thread. So mesh shader may store different
constant values to different vertices in a single thread, we
should not promote this case to flat.
I'm not using shader_info.mesh.ms_cross_invocation_output_access
because OpenGL does not require IO to have explicit location, so
when nir_shader_gather_info is called in OpenGL GLSL compiler to
compute ms_cross_invocation_output_access, some implicit output
has -1 location which causes ms_cross_invocation_output_access
unset for it.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13134
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35081>
This functionality is useful with the software fp64
implementation. It allows running the remaining
tests.
Note: the same tests do not generate this indirect
access on cayman which has the hardware fp64
implementation enabled.
This change was tested on cypress, palm and barts.
Here are the tests fixed:
spec/arb_gpu_shader_fp64/execution/gs-isnan-dvec: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-array-copy: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dmat4: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dmat4-row-major: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-array-const-index: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-array-variable-index: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-bool-double: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-uniform-array-direct-indirect: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-doubles-float-mixed: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dvec4-uniform-array-direct-indirect: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-nested-struct: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34926>
These were initially observed in Hogwarts Legacy while working on
something else entirely. Two compute shaders in that app are helped
for spills and fills. On Skylake, one of the shaders benefits from
this change, and the other is hurt pretty significantly.
About 40 vertex shaders in Shadow of the Tomb Raider were helped for
instructions.
v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.
v3: No, really, fix the various errors to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.
No shader-db changes on any Intel platform.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake)
Totals:
Instrs: 210566294 -> 210561118 (-0.00%)
Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00%
Spill count: 519300 -> 519280 (-0.00%)
Fill count: 625181 -> 625161 (-0.00%)
Scratch Memory Size: 36289536 -> 36281344 (-0.02%)
Max live registers: 66068413 -> 66068161 (-0.00%)
Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%)
Totals from 1662 (0.24% of 707082) affected shaders:
Instrs: 635064 -> 629888 (-0.82%)
Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14%
Spill count: 246 -> 226 (-8.13%)
Fill count: 280 -> 260 (-7.14%)
Scratch Memory Size: 16384 -> 8192 (-50.00%)
Max live registers: 178491 -> 178239 (-0.14%)
Non SSA regs after NIR: 169552 -> 169554 (+0.00%)
Tiger Lake
Totals:
Instrs: 238544730 -> 238539407 (-0.00%)
Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00%
Max live registers: 42494925 -> 42494799 (-0.00%)
Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%)
Totals from 1662 (0.21% of 802704) affected shaders:
Instrs: 626604 -> 621281 (-0.85%)
Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02%
Max live registers: 95405 -> 95279 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)
Ice Lake
Totals:
Instrs: 238855310 -> 238826534 (-0.01%)
Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00%
Spill count: 575510 -> 575117 (-0.07%)
Fill count: 713007 -> 708632 (-0.61%)
Max live registers: 42499556 -> 42499432 (-0.00%)
Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%)
Totals from 1662 (0.21% of 805149) affected shaders:
Instrs: 926887 -> 898111 (-3.10%)
Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01%
Spill count: 5168 -> 4775 (-7.60%)
Fill count: 32883 -> 28508 (-13.30%)
Max live registers: 95614 -> 95490 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)
Skylake
Totals:
Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00%
Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00%
Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09%
Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10%
Max live registers: 33895714 -> 33895590 (-0.00%)
Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%)
Totals from 1655 (0.25% of 653734) affected shaders:
Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64%
Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01%
Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70%
Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82%
Max live registers: 95115 -> 94991 (-0.13%)
Non SSA regs after NIR: 180797 -> 180798 (+0.00%)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
Some shaders contain back-to-back atomic accesses in SPIR-V with
AcquireRelease semantics. In NIR, we translate these to a release
memory barrier, the atomic, then an acquire memory barrier.
This results in a lot of unnecessary memory barriers in the middle
of the sequence of atomics:
0. Release barrier
1. Atomic
2. Acquire barrier
3. Release barrier
4. Atomic
5. Acquire barrier
6. Release barrier
7. Atomic
8. Acquire barrier
In the absence of loads/stores, and when the atomic destinations are
unused, these barriers in-between atomics shouldn't be required.
This optimization pass would drop them (lines 2-3 and 5-6 above) while
leaving the first and last barriers (0 and 8), so the sequence remains
synchronized against other access elsewhere in the program.
One common example where this occurs is a sequence of min and max
atomics to clamp a certain memory location's value within a range.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>
for specifiers like %v4f, we need to store the whole vector. u_printf can
already handle this from OpenCL, we just need to match that here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34909>
Kepler and earlier GPUs do not support the ISBERD instruction but have a
different VILD (Vertex Indirect Load) instruction that provides less
functionality. This commit adds support for the op in nak and nir,
needed for the upcoming encoder commit.
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34329>
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.
This is its own opt_algerbraic pass, because we don't wanna fight with
nir_lower_bit_size.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>