This is necessary to appropriately uniformize the first component
access of a convergent vector. Without this, this is produced:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
This is the correct code:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+2.0<0>:F, 0.5f
Without 38b58e286f, the code generated was more incorrect, but happened
to work for this test case:
load_payload(16) %18:D, 0d, 0d NoMask group0
add(32) %21:F, %18+0.0<0>:F, 0.5f
add(32) %22:F, %18+0.4<0>:F, 0.5f
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 38b58e286f ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset")
Closes: #12969
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>
Expand constant sources to cover the region read by DPAS, and also
use NULL register as accumulator when possible.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>
Funny story... this is how regular b2f was implemented before Curro
implmented the `MOV dst:F -src:D` method 9 years ago (see
3ee2daf23d).
Eliminating the type conversion in the arithmetic operation enables the
next commit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
The vast majority of the callers want channel = 0. During the
development process, using this default parameter value saved a lot of
pain in rebasing. However, it seems to be more trouble than it's worth.
Issue #12464 occurred because LNL was merged while this code was in
review. As a result, one caller of get_nir_src that wanted channel = -1
was not inspected closely, and it got the default channel = 0 instead.
To prevent this happening in the future (with possible branches still
yet to be merged, for example), remove the default parameter. This will
force the inspection of any callers that don't have an explicit channel
parameter. Hopefully that will prevent more problems.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
The source of nir_intrinsic_load_barycentric_at_offset is a vector, so
-1 should be passed to get_nir_src. This is also done for texture
sampling intrinsics.
I skimmed the other user of get_nir_src, and I believe they are
correct. This one was just missed as LNL support landed an many, many
rebases of the original MR occurred.
v2: Fix another get_nir_src call. Suggested by Lionel.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Closes: #12464
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
That way we can deal with upcoming non constant values for
VK_KHR_maintenance8.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
All remaining uses of that constructor would also use at_end(),
and vice-versa. So just implement that behavior in the constructor
itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID. Call it brw_sfid to make it easier to find.
Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.
GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages. The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).
Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache. To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.
We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms. Cite modern docs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
Make the "block after DO" more stable so that adding instructions after
a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW
instruction that is a placeholder representing "go to the next block"
and disappears at code generation.
For some context, there are a few facts about how CFG currently works
- Blocks are assumed to not be empty;
- DO is always by itself in a block, i.e. starts and ends a block;
- There are no empty blocks;
- Predicated WHILE and CONTINUE will link to the "block after DO";
- When nesting loops, it is possible that the "block after DO" is
another "DO".
Reasons and further explanations for those are in the brw_cfg.c comments.
What makes this new change useful is that a pass might want to add
instructions between two DO instructions. When that happens, a new
block must be created and any predicated WHILE and CONTINUE must be
repaired.
So, instead of requiring a repair (which has proven to be tricky in
the past), this change adds a block that can be "virtually" empty but
allow instructions to be added without further changes.
One alternative design would be allowing empty blocks, that would be
a deeper change since the blocks are currently assumed to be not empty
in various places. We'll save that for when other changes are made to
the CFG.
The problem described happens in brw_opt_combine_constants, and a
different patch will clean that up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
Shorter and a preparation to add some functionality to DO().
Had to make it const since that's the convention for builder, so
just made all the sibling helpers const too.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
The logical send lowering code sets these, and is the code which
-should- set these.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
We teach lower_logical_sends to lower these to SHADER_OPCODE_SEND
and drop all the corresponding generator and eu_emit code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
Memory fences do not refer to an element of a binding table. Rather,
the reason we had "BTI" in these opcodes was to distinguish what in
modern terms are called UGM (untyped memory data cache) vs. SLM
(cross-thread shared local memory) fences.
Icelake and older platforms used the "data cache" SFID for both
purposes, distinguishing them by having a special binding table
index, 254, meaning "this is actually SLM access". This is where
the notion that fences had BTIs came in. (In fact, prior to Icelake,
separate SLM fences were not a thing, so BTI wasn't used there either.)
To avoid confusion about BTI being involved, we choose a simpler lie: we
have Icelake SLM fences target GFX12_SFID_SLM (like modern platforms
would), even though it didn't really exist back then. Later lowering
code sets it back to the correct Data Cache SFID with magic SLM binding
table index. This eliminates BTI everywhere and an unnecessary source.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
brw_memory_fence() overrides the instructions generated by the
MEMORY_FENCE or INTERLOCK opcodes to be force_writemask_all with
exec_size == 1. But the IR was emitting it in SIMD8 (regardless
of dispatch width). Instead, just emit the IR as SIMD1/NoMask so
the IR matches what we actually generate. Have size_written indicate
that the entire destination is written, however, as it is ultimately
going to be a SEND that writes a whole register.
We were also using a UD register for the source of
FS_OPCODE_SCHEDULING_FENCE when the generator overrides it to UW,
so just specify UW in the IR as well so that they line up.
Also add validation for MEMORY_FENCE/INTERLOCK that we've done the
exec_size and masking right in the IR.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
We can just specify this as a source to the logical FB read/write
opcodes. Notably FB reads had no sources before; now they have one.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
We already have logical pixel interpolator messages that get lowered
to send messages. We can just add an extra boolean source to those
opcodes rather than sticking a opcode-specific boolean in the generic
fs_inst data structure.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>
brw_lower_logical_sends can just check for the TEX_LOGICAL_SRC_SHADOW_C
source; we don't need a generic instruction bit for this. We used to
have one because this was handled in the generator for older hardware
before the advent of logical opcode lowering.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33297>