Reworks:
* Francisco: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Rebase on 952a523abb ("intel: switch over to unified
atomics")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
These fields are overlapping with the ones set by brw_message_desc(),
so the latter should be used instead. This fixes corruption of the
LSC message descriptors when inconsistent values are specified through
both helpers, which can happen if the 'inst->mlen' field is modified
during optimization (e.g. by opt_split_sends()).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Rework:
* Jordan: Drop FINISHME (s-b Caio)
* Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio)
* Ian: Make use of reg_unit() in round_components_to_whole_registers()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.
Also change the code to reuse the allocated src array when resizing to
a smaller length.
Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).
This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.
This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
Was previously passing 1, 1, 0 as the regioning. This generated
incorrect disassembly because the encoding for a width of 1 is 0. Use
the enums to ensure the correct values are used.
Fixes: 1c92dad5cb ("intel/disasm: Disassembly support for DPAS")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
If the destination was the accumulator but is no longer, having the flag
set is not correct. On Xe2 this also causes a validation error.
v2: Reword the comment to be more clear. Suggested by Jordan.
Fixes: efa4e4bc5f ("intel/fs: Introduce regioning lowering pass.")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
BSpec 56797:
Math operation rules when half-floats are used on both source and
destination operands and both source and destinations are packed.
The execution size must be 16.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>
We were asserting that entry->dst.offset % REG_SIZE == 0, which is
easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2:
load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW
We create separate ACP entries corresponding to the values coming from
vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where
those sources get written to. So the second entry will have offset 16,
which is not REG_SIZE aligned.
It looks like this assert was originally added back in 2014 (see commit
1728e74957) and adjusted through the ages,
including at a point when we combined reg and subreg offsets into a
single byte offset, and over time also extended copy propagation.
Here the destination offset is already accounted for via rel_offset,
at the byte offset level, so things ought to work and there is no need
to assert that this is the case. Ian had already noted that the assert
tripped in commit e3f502e007, but checking
for inst->opcode == MOV here doesn't really make sense - it's just the
case that he found that broke.
Remove the erroneous assertion.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
opt_copy_propagation() can sometimes propagate FIXED_GRF sources into
SHADER_OPCODE_SENDs as the message payload. For example, GS input
reads, which simply take a URB handle and have the offset in the
descriptor. For non-VGRFs, there isn't a payload to split, so just
skip past such send messages.
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
This is achieved by the following steps:
#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG
This is done by replace in vscode
excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu
These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.
Use debug or NDEBUG instead of DEBUG in comments when proper
This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG
These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe
Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually
DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)
Replace the DEBUG present in comment with proper new MESA_DEBUG manually
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
The compiler only references `intel_device_info->subslice_masks` for
ray tracing workloads. Platforms which lack raytracing support can
share a cache even if they differ on this field.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28311>
The 64-bit type lowering for SEL in opt_algebraic had a pre-existing bug
where it only triggered when 64-bit float _and_ integer types were
unsupported. Meteorlake supports 64-bit float but not integer, so we
need to lower Q/UQ in that case still.
When I moved this to a later pass, opt_peephole_sel started generating
Q/UQ SEL instructions which were failing to be lowered.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10867
Fixes: ea423aba1b ("intel/brw: Split out 64-bit lowering from algebraic optimizations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>
This is similar to what RADV implements using the NIR_LOOP_PASS
helpers. I have not used those helpers for a couple of reasons:
1. They use the pointer to the optimization function, which doesn't
work if the same function is called multiple times in one invocation
of the loop (fixable)
2. After fixing them, due to Intel's use of sub-expressions, the amount
of code added to wrap the shared macro becomes more than simply
reimplementing them for the Intel compiler
On most workloads the results are a wash, but on compile heavy
workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw
fossil-db runtimes fall by 1-2% on my ICL, with no changes to the
compiled shaders. Caio saw closer to 2.5% on TGL.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>
The mlen tracking is in REG_SIZE units, but in Xe2 each GRF has
doubled the size. The optimization can only elide full GRFs, so
round down the amount of trailing zeros to ensure the optimization
will remove only full GRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28279>
Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>