This was originally turned into a separate struct for reuse between vec4
and fs backends, that's not needed anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33334>
Defaults to true. When set to false Iris and various tools can be
built without ELK support. In both cases this means supporting
only Gfx9+. This option must be true to build Crocus or Hasvk.
This allows skipping re-building ELK when developing for newer platforms
with tools/tests enabled.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
In theory you can build a driver using OpenCL kernels with a
-Dmesa-clc=system. That shouldn't require any LLVM/Clang/etc...
But the checks to find the pre-compiled mesa_clc & vtn_bindgen
binaries are in meson files or conditions only triggered if you build
with LLVM (:
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33014>
As Asahi, Intel and soon Panfrost requires an offline compiler for their
respective internal shaders, this commit adds generic new options to
workaround meson current limitations around cross-compillation.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32719>
Borderlands 3 (both DX11 and DX12 renderers) have a common pattern
across many shaders:
con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
...
con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)
A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates. In most cases, only the .x channel
of the result is read. So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value. Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.
The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently. This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture. It constructs a
new divergent coordinate where each channel is one of the constants
(i.e <10, 11, 12, ..., 26> in the above example). It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width. (These get copy propagated away.)
We can pick the SIMD size of the load independently of the native shader
width as well. On Xe2, those 28 convergent loads become a single SIMD32
ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can
use a smaller size when there aren't many to combine.
In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers. On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
Intel GPUs have a saturate destination modifier, and
brw_fs_opt_saturate_propagation tries to replace explicit saturate
operations with this destination modifier. That pass is limited in
several ways. If the source of the explicit saturate is in a different
block or if the source of the explicit saturate is live after the
explicit saturate, brw_fs_opt_saturate_propagation will be unable to
make progress.
This optimization exists to help brw_fs_opt_saturate_propagation make
more progress. It tries to move NIR fsat instructions to the same block
that contains the definition of its source. It does this only in cases
where it will not create additional live values. It also attempts to do
this only in cases where the explicit saturate will ultimiately be
converted to a destination modifier.
v2: Fix metadata_preserve when theres no progress and use
nir_metadata_control_flow when there is progress. All suggested by
Alyssa.
v3: Fix a typo in the file header comment. Noticed by Ken. Don't
require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead
of open-coding something slightly more specific. Both suggested by Ken.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733645 -> 19733028 (<.01%)
instructions in affected programs: 193300 -> 192683 (-0.32%)
helped: 246
HURT: 1
helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2
helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for instructions value: -2.87 -2.13
95% mean confidence interval for instructions %-change: -0.34% -0.32%
Instructions are helped.
total cycles in shared programs: 916180971 -> 916264656 (<.01%)
cycles in affected programs: 30197180 -> 30280865 (0.28%)
helped: 194
HURT: 142
helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19
helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23%
HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399
HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63%
95% mean confidence interval for cycles value: -196.84 694.97
95% mean confidence interval for cycles %-change: -0.17% 1.27%
Inconclusive result (value mean confidence interval includes 0).
fossil-db:
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02%
Max live registers: 32013312 -> 32013549 (+0.00%)
Max dispatch width: 5512304 -> 5512136 (-0.00%)
Totals from 774 (0.12% of 630172) affected shaders:
Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01%
Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30%
Max live registers: 82195 -> 82432 (+0.29%)
Max dispatch width: 6664 -> 6496 (-2.52%)
Ice Lake
Totals:
Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01%
Max live registers: 32471367 -> 32471603 (+0.00%)
Max dispatch width: 5623752 -> 5623712 (-0.00%)
Totals from 733 (0.12% of 635598) affected shaders:
Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01%
Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64%
Max live registers: 72067 -> 72303 (+0.33%)
Max dispatch width: 6216 -> 6176 (-0.64%)
Skylake
Totals:
Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00%
Totals from 659 (0.11% of 625670) affected shaders:
Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00%
Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41%
Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.
For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it. If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF. Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.
For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE. If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks. So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.
However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions). In affected
shaders, it was -0.08% for instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
When the destination of both instructions is NULL and the conditional
modifier matches, operands_match (by way of instructions_match) will
only test the first two operands. This can result in bad CSE
happening.
This is a very, very narrow edge case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>
This code uses cfg_t which we are going to rework a bit as part of
flattening the IR types. It is easier if it can see C++ types for now.
At the end we can change this back if needed.
To avoid casting and be consistent with existing structs,
use int for some offset parameters in the functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
The compilers (brw and elk) static libraries depend only on
idep_nir_headers instead of idep_nir. This was done to
increase the parallelism in the build. One side effect is that
consumers of the compilers must depend on idep_nir themselves to
ensure nir symbols are resolved.
Various intel tools don't use NIR directly, so don't depend on it,
and only use a few functions of the compiler, that *mostly* don't
depend on linking NIR functions except for the case of nir_print_instr.
The current code adds a weak empty function to take its place in case
it is not linked. This is sort of a hack because if we change the
compiler in ways that use NIR differently, or we use different functions
of the compiler in the tools, we will end up having to add other
dummy definitions.
A better solution here (suggested by Dylan) is to add the idep_nir
to the list of dependencies of the compilers idep's. The static
libraries of the compilers still don't depend directly on NIR,
but any user of idep_compiler_* will get that dependency.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
Instead of link_with, use meson dependency for the compilers. Will
be useful later to propagate some extra dependencies.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>