All remaining uses of that constructor would also use at_end(),
and vice-versa. So just implement that behavior in the constructor
itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
And use brw_builder(brw_shader *) and brw_builder() constructors
where possible.
The way tests are written, it is necessary to initialize an "empty"
builder -- which is later replaced by a proper one. Default parameter
NULL make that initialization implicit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
Since brw_inst now has the block it belongs and the block can
reach the shader, the only necessary information to create a
builder is the brw_inst itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
There's already code to verify that any compacted instruction
that we produce is equivalent to the original uncompacted
instruction -- including detailed output if it fails.
This patch enables this verification in debug build and will
abort in case it fails.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33821>
Calculation was wrongly walking uncompacted instructions, even if we had
some compacted in the middle, generating invalid size. Since we are
here just drop the instruction count, since in practice the caller will
have to walk the instruction stream anyway.
Fixes: 6267585778 ("intel/brw: Also return the size of the assembled shader")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33532>
Our name for this enum was brw_message_target, but it's better known as
shared function ID or SFID. Call it brw_sfid to make it easier to find.
Now that brw only supports Gfx9+, we don't particularly care whether
SFIDs were introduced on Gfx4, Gfx6, or Gfx7.5. Also, the LSC SFIDs
were confusingly tagged "GFX12" but aren't available on Gfx12.0; they
were introduced with Alchemist/Meteorlake.
GFX6_SFID_DATAPORT_SAMPLER_CACHE in particular was confusing. It sounds
like the SFID to use for the sampler on Gfx6+, however it has nothing to
do with the sampler at all. BRW_SFID_SAMPLER remains the sampler SFID.
On Haswell, we ran out of messages on the main data cache data port, and
so they introduced two additional ones, for more messages. The modern
Tigerlake PRMs simply call these DP_DC0, DP_DC1, and DP_DC2. I think
the "sampler" name came from some idea about reorganizing messages that
never materialized (instead, the LSC came as a much larger cleanup).
Recently we've adopted the term "HDC" for the legacy data cluster, as
opposed to "LSC" for the modern Load/Store Cache. To make clear which
SFIDs target the legacy HDC dataports, we use BRW_SFID_HDC0/1/2.
We were also citing the G45, Sandybridge, and Ivybridge PRMs for a
compiler that supports none of those platforms. Cite modern docs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33650>
It turns out that we need to add a NOP not only in between two
consecutive WHILE instructions, but also after every control flow
instruction that immediately precedes a WHILE.
v2: Rebase after the renames.
Fixes: 5ca883505e ("brw: add a NOP in between WHILE instructions on LNL")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
I'm not aware of any workloads that will be impacted by this change,
but let's keep our list of control flow instructions complete. A
shader-db run on MTL tells me nothing changes.
v2: "The scheduler relies on HALT not being considered control flow to
be able to move code past HALT instructions. Doing this would prevent
such optimization from happening and would reduce performance
dramatically in some cases." - Francisco.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33021>
In `fossilize-replay --pipeline-hash 375a63e14afa96c4
fossils/fossil-db/steam-dxvk/f1_22_abu_dhabi.dx12vk-ultra.foz`,
`cf_count` would get decremented below zero. This would lead trying to
print `UINT_MAX` levels of indentation just a few lines below. I ran
out of disk space and patience before that finished. 🤣
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33748>
Currently the condition to use a single MOV is failing on immediate
values, so we emit 2 MOVs in SIMD8 instead of a single SIMD16.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33704>
apply our semantic patch manually to the remaining users. Coccinelle bailed on
these files for whatever reason, I guess.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
It looks like even if we pass the header not present in the sampler descriptor,
it's not helping with the correct behavior of texelFetch.
Experiment on real HW shows that if we just zero out the header and include it
in the message, it helps with the correct behavior. I'm not sure if there is a
valid HW workaround for this one.
We can skip masking the sampler message header bits 4:0 but masking them out
doesn't hurt in this case.
Increasing number of parameter impact sampler performance, For example,
a sample message using 5 parameters will not be able to sustain the same
throughput as a sample message with only 4 valid parameters. We should
look out for any perf impact with respect to texel fetch.
This patch fixes ~3k tests involving texelFetch instruction on Xe3+
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33562>
Since a previous change ensured that a DO-block is guaranteed to not be
followed by a DO-block, it is sufficient to pick the next block without
requiring to repair the CFG.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
Make the "block after DO" more stable so that adding instructions after
a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW
instruction that is a placeholder representing "go to the next block"
and disappears at code generation.
For some context, there are a few facts about how CFG currently works
- Blocks are assumed to not be empty;
- DO is always by itself in a block, i.e. starts and ends a block;
- There are no empty blocks;
- Predicated WHILE and CONTINUE will link to the "block after DO";
- When nesting loops, it is possible that the "block after DO" is
another "DO".
Reasons and further explanations for those are in the brw_cfg.c comments.
What makes this new change useful is that a pass might want to add
instructions between two DO instructions. When that happens, a new
block must be created and any predicated WHILE and CONTINUE must be
repaired.
So, instead of requiring a repair (which has proven to be tricky in
the past), this change adds a block that can be "virtually" empty but
allow instructions to be added without further changes.
One alternative design would be allowing empty blocks, that would be
a deeper change since the blocks are currently assumed to be not empty
in various places. We'll save that for when other changes are made to
the CFG.
The problem described happens in brw_opt_combine_constants, and a
different patch will clean that up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
Shorter and a preparation to add some functionality to DO().
Had to make it const since that's the convention for builder, so
just made all the sibling helpers const too.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
The expectations are :
- no MSAA images
- a single tiling mode is used when not linear
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32676>
This is a debug feature that we kind of manage in the driver atm. It's
better that we move this completely to the compiler and can load it
from the cache.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Michael Cheng <michael.cheng@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33643>
I dumped assembly generated by our driver with INTEL_DEBUG=shaders,
copied and pasted it into a lua file, tried to run it with
src/intel/executor, but the disassembler started telling me some
instructions were invalid.
This happened because we print the "compacted" flag in our assembly
text, so when brw_gram.y parses our assembly flag, it sees the
"compacted" flag and sets it to the instruction by calling
add_instruction_option(). But the executor tool never sets the
BRW_ASSEMBLE_COMPACT flag when it calls brw_assemble(), so when
brw_assemble() calls dump_assembly(), which calls brw_disassbemble(),
the disassembler gets confused and prints misinterpreted instructions
and calls them invalid.
It is not the job of brw_gram.y (our text assembly parser) to mark
instructions as compacted. Whatever is later assembling the
instruction is the entity that should decide if the instructions are
compacted or not. So in this patch we just ignore this flag.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33614>
For the instructions we parse with brw_gram.y, don't unconditionally
call brw_eu_inst_set_cond_modifier(). Do it like we do in
brw_generator::generate_code() and only call it if we have a
cond_modifier to set.
Why? Because for ONE_SRC instructions, CondCtrl (bits 95:92) only
exists if Src.IsImm is false. If Src.Imm is true, then bits 95:64 are
actually Src0.ImmValue[63:32]. If we unconditionally call
brw_eu_inst_set_cond_modifier(), we'll end up zeroing bits 95:92 for
ONE_SRC instructions with 64bit immediates. See BSpec page
Structure_EU_INSTRUCTION_BASIC_ONE_SRC (56880).
This issue can be reproduced with src/intel/executor if you try to
have the following instruction:
mov(16) g10<1>Q 0xfedcba9876543210:Q { align1 WE_all 1H };
our parser will end up zeroing the top bits, so the value of the
immediate will be 0x0edcba9876543210.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33559>
Since Xe2, the registers are bigger and even the instruction
structures got updated to have 6 bits.
The way I detected this issue was when I tried to use
src/intel/executor to add the following instruction:
add(8) g6.8<1>UD g4<8,8,1>UD 0x00000008UD { align1 WE_all 1Q I@1 };
Executor would read this and end up emitting an add with dst being
g6<1>UD instead of what we wanted. It turns out that inside
brw_gram.y, at dstoperand and dstoperandex we do:
$$.subnr = $$.subnr * brw_type_size_bytes($4);
which would overflow subnr back to 0.
The overflow doesn't seem to be a problem with code we emit directly
(unlike the code we parse, like above) due to the fact that we seem to
treat Xe2 registers as smaller all the way until we call phys_nr() and
phys_subnr() during code generation. The phys_subnr() function can
generate a value that would overflow reg.subnr, but this value is
never written back to reg.subnr, it's just returned as an unsigned
int.
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33539>
Normally the driver & compiler work together to use as few
3DSTATE_VERTEX_ELEMENTS/VERTEX_BUFFER_ELEMENT data as possible.
The compiler ignores unused bits and driver avoids emitting the
corresponding elements in 3DSTATE_VERTEX_ELEMENTS.
For device generated commands, we want an 3DSTATE_VERTEX_ELEMENTS
programming that is independent from the shader so that we can
implement indirect pipeline binding without complicating the
generation shader as well as emitting fewer generated commands.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
It's not used outside of the compiler.
We add a new nr_attribute_regs which now seems useless but will be
useful in a later change.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32418>
Looks like we are not using the primitive leaf desc loading code part at
all. Let's just drop it.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33497>