We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.
For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it. If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF. Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.
For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE. If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks. So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.
However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions). In affected
shaders, it was -0.08% for instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
When the destination of both instructions is NULL and the conditional
modifier matches, operands_match (by way of instructions_match) will
only test the first two operands. This can result in bad CSE
happening.
This is a very, very narrow edge case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>
This code uses cfg_t which we are going to rework a bit as part of
flattening the IR types. It is easier if it can see C++ types for now.
At the end we can change this back if needed.
To avoid casting and be consistent with existing structs,
use int for some offset parameters in the functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
The compilers (brw and elk) static libraries depend only on
idep_nir_headers instead of idep_nir. This was done to
increase the parallelism in the build. One side effect is that
consumers of the compilers must depend on idep_nir themselves to
ensure nir symbols are resolved.
Various intel tools don't use NIR directly, so don't depend on it,
and only use a few functions of the compiler, that *mostly* don't
depend on linking NIR functions except for the case of nir_print_instr.
The current code adds a weak empty function to take its place in case
it is not linked. This is sort of a hack because if we change the
compiler in ways that use NIR differently, or we use different functions
of the compiler in the tools, we will end up having to add other
dummy definitions.
A better solution here (suggested by Dylan) is to add the idep_nir
to the list of dependencies of the compilers idep's. The static
libraries of the compilers still don't depend directly on NIR,
but any user of idep_compiler_* will get that dependency.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
Instead of link_with, use meson dependency for the compilers. Will
be useful later to propagate some extra dependencies.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
All the workarounds are relatively small, so keep them in a single file.
Promote (or add) them to a separate file if they get large -- like it is
done for opt and lower.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
For now is not linked to any driver. The tools were renamed to use elk
prefix to avoid conflicting with the brw ones. The run-test.py script
was also updated due to that change.
Before the new compiler can be linked together with the old (going to be
done for Iris and other tools), the symbol conflicts need to be fixed
first. This will happen in a later commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27563>
Currently, Intel's shader cache incorporates PCI ID into shader cache
keys.
Many devices with different PCI IDs have identical shader compilation
functionality. Using PCI ID as a component of the shader cache hash
means that a multi-platform shader cache will have redundant,
identical entries for similar platforms.
All Intel compiler functionality is selected based on device
configuration in `struct intel_device_info`. intel_device_info.py
flags all fields accessed by intel/compiler.
This commit generates a hash function incorporating intel/compiler
device info fields. Using this hash function in place of PCI ID will
produce a multiplatform cache with no duplicated content.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26844>
And move them inside the compiler since they (especially asm) rely on
a bunch of internal types.
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27579>
This pass combines the LOD or LOD bias and array index into a single
32-bit value since Xe2+ sampler messages requires us to do that.
v2: (Alyssa)
- Use nir_iand_imm instead of nir_iand and nir_imm_int
- Use nir_trim_vector instead of nir_swizzle
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
Add a simple test to kick off the infrastructure. And also a test
(for now disabled) that fails because the code is returning an
empty shader. Next patch will fix and enable it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
There are a couple FINISHME comments describing future optimizations.
v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.
v3: Use is_null() instead of checking file != ARF. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This is just the skeleton of the implementation. Future commits will
fill it all in.
v2: Move to src/intel/compiler
v3 (idr): Use vecN instead of array[N] for slice type.
v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.
v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2 (idr): Fix expectations BottomBreakWithContinue. opt_predicated_break
will remove the IF and make the CONTINUE predicated.
v3 (idr): Temporarily disable the one test that fails.
v4 (idr): Free strings allocated by open_memstream. Fixes gitlab CI
failures in debian-testing-asan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25216>
With -Dintel-clc=system, the build system will search for an `intel_clc`
binary and use it instead of building `intel_clc` itself.
This allows Intel Vulkan ray tracing support to be built when cross
compiling without terrible hacks (that would otherwise be necessary due
to `intel_clc`'s dependence on SPIRV-LLVM-Translator, libclc, clang, and
LLVM).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24983>
We use a non-uniform lowering loop in the backend which we can do
better in NIR because we can also use divergence analysis there.
This change also limits VGRF usage to a single VGRF to hold the sample
ID in the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>