Doxygen documentation says
> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.
so we can omit the filename itself when using the annotation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch
restriction:
RENDER_SURFACE_STATE::Surface Pitch
For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is:
[63,262143] -> [64B, 256KB]
The pitch of the surface is the scratch size per thread and the surface
should be large enough to accommodate every physical thread.
So here adding a new field to intel_device_info, setting it in
intel_device_info_init_common() so even offline tools can have it set.
And finally adding a check to fail shader compilation if needed
scratch is larger than supported.
This issue can be reproduced in debug builds when running
dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer
platforms.
Ref: BSpec 43862 (r52666)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>
This implements an undocumented workaround for a hardware bug that
affects draw calls with a pixel shader that has 0 push constant cycles
when TBIMR is enabled, which has been seen to lead to a hang with
Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has
been reported as HSDES#22020184996 which is still pending a resolution
by the hardware team. However since this workaround found empirically
has been confirmed to fix the issue reliably and it's relatively
harmless it seems worth checking in already even though no final W/A
number is available nor has the W/A json file been updated.
To avoid the issue we simply pad the push constant payload to be at
least 1 register. This is enabled via a brw_wm_prog_key since the
driver needs to be in agreement with the compiler on whether the dummy
push constant cycle is present, and it can be avoided in cases where
the driver knows that TBIMR will be disabled (e.g. for BLORP).
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399
Fixes: 57decad976 ("intel/xehp: Enable TBIMR by default.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>
The ALU-based implementation of the barycentric interpolation
intrinsics introduced by a subsequent commit will require some
primitive setup information not delivered in the PS thread payload
unless explicitly requested:
- "Source Depth and/or W Attribute Vertex Deltas" if a
perspective-correct interpolation mode is used -- Note that this is
already requested for CPS interpolation, we just need to enable it
in more cases.
- "Perspective Bary Planes" if a perspective-correct interpolation
mode is used.
- "Non-Perspective Bary Planes" if a non-perspective-corrected
interpolation mode is used.
- "Sample offsets" if any at_sample interpolation is used so the
coordinate offsets of the sample can be calculated.
This ALU implementation of barycentric interpolation will only be
needed for *_at_offset and *_at_sample interpolation, since the fixed
function hardware still computes barycentrics for us at the current
sample coordinates, only the cases that previously relied on the Pixel
Interpolator shared function need to be re-implemented with ALU
instructions, since that shared function will no longer exist on Xe2
hardware.
Thanks to Rohan for a bugfix of the uses_sample_offsets calculation,
this patch includes his fix squashed in.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
Change so the size rounds up to the next multiple of the horizontal stride like
is done for VGRF. This was causing an inconsistency in regs_read() -- The original
component_size() calculation for FIXED_GRF excluded any padding at the end but it was
still being discounted by regs_read().
Suggested by Curro.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>
Like NIR, we print SSA defs as %1, %2, and so on. The number here is
the VGRF number. VGRFs that don't correspond to a SSA def remain
printed as vgrf1, vgrf2, and so on.
This makes it much easier to see what values are SSA and which aren't.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Our code to initialize gl_SubgroupInvocation uses multiple instructions
some of which are partial writes. This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.
To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation. (We also expand
it to UD for SIMD8 so there are not partial write issues.) We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This flag is mostly redundant with uses_discard and was only
introduced to implement demote with LLVM when it didn't have
that intrinsic.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
These were removed with Icelake. While they technically still exist on
Skylake, which this compiler supports, we have never used these opcodes
in the 14 years we could have done so. So just scrap them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>
Removed mention of Wa_* when referencing an intended harware behavior
since version 12. This will prevent the erroneous usage of the
`intel_needs_workaround` in the future.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29559>
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
Another architecture register that requires some care before reading.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 49ee3ae9e8 ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+")
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>
Some commas were being skipped, according to history as an attempt
to elide BAD_FILEs, but we still print them, so be consistent. Also
for instructions without any sources, the trailing comma was always
being printed. Fix that too.
Example of instruction output before the change
halt_target(8) (null):UD,
send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD(null):UD NoMask
and after it
halt_target(8) (null):UD
send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD, (null):UD NoMask
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
It was the default to show register pressure for each instruction,
but it gets in the way of cleaner diffs before/after an optimization pass.
Add INTEL_DEBUG=reg-pressure option to show it again.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
The sequential IP cause noise when diffing before/after a pass that
either add or remove instructions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us. So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.
In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination. But the
underlying values should always be positive anyway. To accomodate the
type inference restriction that the base types much match, we switch
these over to be purely UD calculations. It's cleaner to do so anyway.
Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN. The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We already have a logical opcode and lower to what is basically a send
instruction. We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Rework:
* Jordan: Drop FINISHME (s-b Caio)
* Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio)
* Ian: Make use of reg_unit() in round_components_to_whole_registers()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.
Also change the code to reuse the allocated src array when resizing to
a smaller length.
Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>