In ancient days, we directly used the hardware register type encodings
throughout the compiler. As more GPU generations came out, encodings
shifted, and we moved to an abstract enum that we could encode/decode
to a particular GPU's hardware encoding. But there was no particular
meaning behind any particular value.
One downside to this approach is that we end up with switch statements
galore. Want to know a type's size? Switch. Convert a unsigned type
to a signed one? Switch. Get a type with the same base type, but
different bit size? Switch. This is both inefficient and inconvenient.
In contrast, nir_alu_type takes a nicer approach - the type encoding has
certain bits representing the base type, and others encoding the size of
the type. Switching base types or sizes is a simple matter of masking
out the relevant field and substituting a different one.
Tigerlake's encoding adopts a similar approach: two bits represent the
size as a 2-bit unsigned number n, where the bit size is (8 * 2^n).
Two more bits represent the base type. Past encodings were a bit ad hoc
as new data types were added over time, but Gfx12 is organized (mostly).
This patch converts our brw_reg_type enum over to a new system that's
patterned after the Tigerlake style (for easy conversion) while
deviating in a few ways that make our vector immediate type size
handling simpler. Should we add additional base types, we're likely
to continue deviating. Still, converting is much simpler.
Type size calculations (which are performed all the time) are now a
simple mask and shift, instead of a switch.
We also adopt the name BRW_TYPE_* instead of BRW_REGISTER_TYPE_* because
it's much shorter and easier to type. Similarly, we create new helper
functions named brw_type_* for working with these types, with a cleaner
naming convention. Legacy names still exist but will we dropped over
the next few patches as pieces get cleaned up.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator. On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.
We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.
Since this existed only for a short time, and had very limited utility,
we drop it from the compiler. One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
align1 three-source instructions do not exist on gfx9, and this
compiler does not support gfx10. So the oldest case is gfx11.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
This makes sure that copy propagation doesn't undo the lowering of
restricted sub-dword integer regions done by brw_fs_lower_regioning().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
This patch introduces code to enforce the pages-long regioning
restrictions introduced by Xe2 that apply to sub-dword integer
datatypes (See BSpec page 56640). They impose a number of
restrictions on what the regioning parameters of a source can be
depending on the source and destination datatypes as well as the
alignment of the destination. The tricky cases are when the
destination stride is smaller than 32 bits and the source stride is at
least 32 bits, since such cases require the destination and source
offsets to be in agreement based on an equation determined by the
source and destination strides. The second source of instructions
with multiple sources is even more restricted, and due to the
existence of hardware bug HSDES#16012383669 it basically requires the
source data to be packed in the GRF if the destination stride isn't
dword-aligned.
In order to address those restrictions this patch leverages the
existing infrastructure from brw_fs_lower_regioning.cpp. The same
general approach can be used to handle this restriction we were using
to handle restrictions of the floating-point pipeline in previous
generations: Unsupported source regions are lowered by emitting an
additional copy before the instruction that shuffles the data in a way
that allows using a valid region in the original instruction. The
main difficulty that wasn't encountered in previous platforms is that
it is non-trivial to come up with a copy instruction that doesn't
break the regioning restrictions itself, since on previous platforms
we could just bitcast floating-point data and use integer copies in
order to implement arbitrary regioning, which is unfortunately no
longer a choice lacking a magic third pipeline able to do the
regioning modes the integer pipeline is no longer able to do.
The required_src_byte_stride() and required_src_byte_offset() helpers
introduced here try to calculate parameters for both regions that
avoid that situation, but it isn't always possible, and actually in
some cases that involve the second source of ALU instructions a chain
of multiple copy instructions will be required, so the
lower_instruction() routine needs to be applied recursively to the
instructions emitted to lower the original instruction.
XXX - Allow more flexible regioning for the second source of an
instruction if bug HSDES#16012383669 is fixed in a future
hardware platform.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
The OPT macro will call validate() after each pass, so both cases
removed by this patch are just redundant calls. Will only affect
Debug builds since in Release builds validation is a no-op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
Use `a` and `b` (already identified as that in the output message)
instead of `f` and `s` for the two values being compared, since in
a later patch `s` will be used to hold the fs_visitor shader.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
For integer types, the signedness is determined by flags on the muladd
instruction. The types of the sources play no role. Previously we were
using the signedness of the type and ignoring the mask.
Adjust the types passed to the dpas_intel intrinsic to match.
Fixes various
dEQP-VK.compute.*.cooperative_matrix.khr_*.matrixmuladd_cross.* tests on
different Intel platforms. Some platforms had failing tests, and some
platforms failed EU validation before the tests could fail.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN. The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We already have a logical opcode and lower to what is basically a send
instruction. We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
According to the workaround description:
"For all Data Port messages, DP_FLUSH_TYPE should not be
programmed to Discard."
This issue happens only with certain circumstances but as we are not
using discard, add assert and deal with it later if discard is taken in
to use.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24422>
When I bumped the max size of VGRFs, I should have bumped the values
in the scheduler too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d33aff783d ("intel/fs: add support for sparse accesses")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28188>
HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.
As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.
The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.
Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.
Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.
v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
This previously worked by luck because we were incorrectly calling
nir_opt_remove_phis before calling nir_convert_from_ssa. See also #10727.
No shader-db or fossil-db changes on any Intel platform.
v2: Handle the load_interpolated_input and load_barycentric_at_sample as
separate passes. Based on discussion with Ken starting at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136#note_2330424.
Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
align16 support is only used on Gen9 for 3-source instructions, quad
swizzling, and dPdy calculations. We don't need it for broadcast.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
brw_broadcast and generate_mov_indirect both had similar comments, both
with typos ("insead"). One still referred to IVB bugs, while the other
dropped that during the compiler split. The one that dropped the
comment mentioned "both of these" issues, while citing only one issue;
there was in fact a third issue (no-Q/UQ) that wasn't mentioned in
either comment. One also had some bad grammar in the comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
For BROADCAST and MOV_INDIRECT, required_exec_type was returning
brw_int_type(type_sz(t), false), which is an unsigned type. However,
get_exec_type(inst) returns the original type for either Q or UQ.
This meant that has_invalid_exec_type would detect a mismatch and
trigger lowering.
That lowering would insert new 64-bit MOVs, which would need to be
lowered on platforms which don't support Q/UQ. Except, we already
ran that lowering pass earlier. So, the unlowered Q/UQ MOVs would
reach the software scoreboarding pass, and trigger failures in the
inferred_exec_pipe() function, as no pipe is available to handle
64-bit integer operations.
It turns out that we don't need the region lowering pass to do
anything for these opcodes. The generator code for both BROADCAST
and MOV_INDIRECT already handle decomposing Q/UQ operations into
32-bit MOVs when they're not supported. And, it also implicitly
converts to integer types, even for floating point sources. The
inferred_exec_pipe function already special cases them to note
that they'll always be handled on the integer pipe, so that matches.
Just drop the region lowering code for these opcodes.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
We are overriding the type to Q/UQ, so we need to split to two MOVs
if 64-bit integer math is not supported. For reference, Meteorlake
does support 64-bit floats but would still not work correctly here.
See also brw_broadcast(), which does similar indirects but correctly
checks has_64bit_int instead of has_64bit_float.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
Rework:
* Francisco Jerez: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Move SHADER_OPCODE_DWORD_SCATTERED_*_LOGICAL from previous
patch, as it seems to make more sense here.
* Jordan: Change `devinfo->has_lsc` ?: to if/else as suggested by idr
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Reworks:
* Francisco: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Rebase on 952a523abb ("intel: switch over to unified
atomics")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
These fields are overlapping with the ones set by brw_message_desc(),
so the latter should be used instead. This fixes corruption of the
LSC message descriptors when inconsistent values are specified through
both helpers, which can happen if the 'inst->mlen' field is modified
during optimization (e.g. by opt_split_sends()).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Rework:
* Jordan: Drop FINISHME (s-b Caio)
* Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio)
* Ian: Make use of reg_unit() in round_components_to_whole_registers()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>