If there's only a single instruction in a basic block, then removing it
would create an empty block. We seem to have trouble representing those
as there are no instructions with an IP inside the block; several places
mess up connections. While most blocks end in control flow instructions
(which are rarely eliminated), ones preceding a DO instruction may end
in an ordinary instruction. This makes such blocks tricky to merge with
adjacent blocks - they may be between loops. Any optimization pass may
may find such an instruction and want to eliminate it, and most of them
are unprepared to perform such CFG link surgery. Nor do we want to make
every pass aware of this issue.
To work around this, we simply replace an instruction with a NOP when
removing it from a block containing only that instruction, leaving the
block in place.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us. So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.
In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination. But the
underlying values should always be positive anyway. To accomodate the
type inference restriction that the base types much match, we switch
these over to be purely UD calculations. It's cleaner to do so anyway.
Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
In many cases, we calculate an expression by generating a series of
instructions. We'd either overwrite the same register repeatedly,
or call vgrf(BRW_TYPE_X) repeatedly to allocate temporaries for each
intermediate step. In many cases, we overwrote the same register simply
because allocating and naming temporaries for each step was annoying.
This commit adds new builder helpers that will allocate a temporary
destination for you, using simple type interference: unary operations
use the source type, and binary operations require a matching base type
and return the largest of the two types.
The helpers return the destination register, allowing us to write in an
expression-tree style, chaining together builder operations to produce
whole values. Sort of like nir_builder. We still optionally will write
out the fs_inst pointer in case the caller wants to do things like set
predicates or saturation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
Some instructions can operate on mixed types. Typically this is
something like a binary operation with UD and UW sources resulting
in a UD destination. In order to make it easier to find the result
type of such operations, let's make a type helper that returns the
larger of the two types (but requires the base type to match).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
We just want to emit an instruction, but we don't need to do anything
further with it, so we don't need to store the resulting inst pointer
anywhere.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
At the moment this is useless as the pipeline already holds the same
value. But in the next changes we'll stop building this value on the
pipeline to allow for more dynamic states.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
Somehow I missed this one in 164c0951a0
If the format the image is being created with doesn't have the FSR
format feature, report it as unsupported.
Also fixes future CTS tests: dEQP-VK.api.info.unsupported_image_usage.*
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28913>
Align16 is only used on Gfx9, while Align1 is used on Gfx11+. We can
decode both kinds of encodings in the same function with a simple
devinfo check. One snag is that the align16 encodings didn't have a
separate exec_type field, but we can just pass 0.
This lets us have a single function named brw_type_decode_for_3src,
which is much less of a mouthful.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Align16 is only used on Gfx9, while Align1 is used on Gfx11+. We can
handle both encodings in the same function with a simple devinfo check,
and give that function a simple name like brw_type_encode_for_3src.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
In ancient days, we directly used the hardware register type encodings
throughout the compiler. As more GPU generations came out, encodings
shifted, and we moved to an abstract enum that we could encode/decode
to a particular GPU's hardware encoding. But there was no particular
meaning behind any particular value.
One downside to this approach is that we end up with switch statements
galore. Want to know a type's size? Switch. Convert a unsigned type
to a signed one? Switch. Get a type with the same base type, but
different bit size? Switch. This is both inefficient and inconvenient.
In contrast, nir_alu_type takes a nicer approach - the type encoding has
certain bits representing the base type, and others encoding the size of
the type. Switching base types or sizes is a simple matter of masking
out the relevant field and substituting a different one.
Tigerlake's encoding adopts a similar approach: two bits represent the
size as a 2-bit unsigned number n, where the bit size is (8 * 2^n).
Two more bits represent the base type. Past encodings were a bit ad hoc
as new data types were added over time, but Gfx12 is organized (mostly).
This patch converts our brw_reg_type enum over to a new system that's
patterned after the Tigerlake style (for easy conversion) while
deviating in a few ways that make our vector immediate type size
handling simpler. Should we add additional base types, we're likely
to continue deviating. Still, converting is much simpler.
Type size calculations (which are performed all the time) are now a
simple mask and shift, instead of a switch.
We also adopt the name BRW_TYPE_* instead of BRW_REGISTER_TYPE_* because
it's much shorter and easier to type. Similarly, we create new helper
functions named brw_type_* for working with these types, with a cleaner
naming convention. Legacy names still exist but will we dropped over
the next few patches as pieces get cleaned up.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator. On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.
We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.
Since this existed only for a short time, and had very limited utility,
we drop it from the compiler. One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
align1 three-source instructions do not exist on gfx9, and this
compiler does not support gfx10. So the oldest case is gfx11.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Fixing some simulation issues on Gfx9/11 with zink on anv running dual
source blending piglit tests like :
./bin/arb_blend_func_extended-dual-src-blending-discard-without-src1 -auto -fbo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28901>
We were accidentally leaving XY_BLOCK_COPY_BLT's Source and Destination
MOCS fields set to 0 (Error: Reserved for Non-Use) on Gfx12.0 systems.
This was causing assert fails in debug builds, since we try to ensure
that we don't do that. In theory, MOCS 0 is supposed to be equivalent
to MOCS 2 (all the caching), but...we probably ought to use MOCS 3
(uncached). Every Gfx12.5+ platform requires it, so although there
isn't a note about Gfx12.0 needing that, it's possible that it does.
We're currently only using the blitter for DRI PRIME blits on Gfx12.0,
anyway, and I think we're flushing all the caches regardless.
This bug was somewhat obscure to hit:
- You need a hybrid graphics system with Gfx12.0 and some other GPU
- You have to be using "reverse PRIME", i.e. rendering on the integrated
GPU and displaying on the discrete one. This is not the common case.
- You have to be using a debug build.
No observable performance delta in GfxBench5 Car Chase (an arbitrary
program) when rendering on Alderlake GT1 and displaying on an Arc A770.
Fixes: 194afe8416 ("anv/iris/blorp: use the right MOCS values for each engine")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28894>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
The compiler supports those intrinsics only for task/mesh shaders and it
never caused any issues, because the way `nir_lower_compute_system_values`
is doing its lowering.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
Starting from MTL there is registers in HW to read the IP version of
graphics, media and display IPs, those registers are called GMD.
IPs can be used in any combination to form a SOC/platform and each IP
has it own stepping/revision, making complex to track each IP stepping
using just PCI revision.
Since MTL will be supported by default by i915 KMD that don't have
a uAPI fetch IP versions, this feature will only be supported in LNL
and newer that are backed by Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26908>
This function has 2 additional parameters to set spacing before
printing register group dword or individual registers.
intel_print_group() is keept with the same spacing as before so no
changes on decoder output is expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28722>
Xe parser will also need to use the option_color parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28722>
Initial work by Rafael Antognolli <rafael.antognolli@intel.com>
Reworks
- Rebase to main
- Emit the right hiz op for higher mip levels when transitioning the
depth buffer
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28629>
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>