In ancient days, we directly used the hardware register type encodings
throughout the compiler. As more GPU generations came out, encodings
shifted, and we moved to an abstract enum that we could encode/decode
to a particular GPU's hardware encoding. But there was no particular
meaning behind any particular value.
One downside to this approach is that we end up with switch statements
galore. Want to know a type's size? Switch. Convert a unsigned type
to a signed one? Switch. Get a type with the same base type, but
different bit size? Switch. This is both inefficient and inconvenient.
In contrast, nir_alu_type takes a nicer approach - the type encoding has
certain bits representing the base type, and others encoding the size of
the type. Switching base types or sizes is a simple matter of masking
out the relevant field and substituting a different one.
Tigerlake's encoding adopts a similar approach: two bits represent the
size as a 2-bit unsigned number n, where the bit size is (8 * 2^n).
Two more bits represent the base type. Past encodings were a bit ad hoc
as new data types were added over time, but Gfx12 is organized (mostly).
This patch converts our brw_reg_type enum over to a new system that's
patterned after the Tigerlake style (for easy conversion) while
deviating in a few ways that make our vector immediate type size
handling simpler. Should we add additional base types, we're likely
to continue deviating. Still, converting is much simpler.
Type size calculations (which are performed all the time) are now a
simple mask and shift, instead of a switch.
We also adopt the name BRW_TYPE_* instead of BRW_REGISTER_TYPE_* because
it's much shorter and easier to type. Similarly, we create new helper
functions named brw_type_* for working with these types, with a cleaner
naming convention. Legacy names still exist but will we dropped over
the next few patches as pieces get cleaned up.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator. On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.
We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.
Since this existed only for a short time, and had very limited utility,
we drop it from the compiler. One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
align1 three-source instructions do not exist on gfx9, and this
compiler does not support gfx10. So the oldest case is gfx11.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Fixing some simulation issues on Gfx9/11 with zink on anv running dual
source blending piglit tests like :
./bin/arb_blend_func_extended-dual-src-blending-discard-without-src1 -auto -fbo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28901>
We were accidentally leaving XY_BLOCK_COPY_BLT's Source and Destination
MOCS fields set to 0 (Error: Reserved for Non-Use) on Gfx12.0 systems.
This was causing assert fails in debug builds, since we try to ensure
that we don't do that. In theory, MOCS 0 is supposed to be equivalent
to MOCS 2 (all the caching), but...we probably ought to use MOCS 3
(uncached). Every Gfx12.5+ platform requires it, so although there
isn't a note about Gfx12.0 needing that, it's possible that it does.
We're currently only using the blitter for DRI PRIME blits on Gfx12.0,
anyway, and I think we're flushing all the caches regardless.
This bug was somewhat obscure to hit:
- You need a hybrid graphics system with Gfx12.0 and some other GPU
- You have to be using "reverse PRIME", i.e. rendering on the integrated
GPU and displaying on the discrete one. This is not the common case.
- You have to be using a debug build.
No observable performance delta in GfxBench5 Car Chase (an arbitrary
program) when rendering on Alderlake GT1 and displaying on an Arc A770.
Fixes: 194afe8416 ("anv/iris/blorp: use the right MOCS values for each engine")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28894>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
The compiler supports those intrinsics only for task/mesh shaders and it
never caused any issues, because the way `nir_lower_compute_system_values`
is doing its lowering.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
Starting from MTL there is registers in HW to read the IP version of
graphics, media and display IPs, those registers are called GMD.
IPs can be used in any combination to form a SOC/platform and each IP
has it own stepping/revision, making complex to track each IP stepping
using just PCI revision.
Since MTL will be supported by default by i915 KMD that don't have
a uAPI fetch IP versions, this feature will only be supported in LNL
and newer that are backed by Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26908>
This function has 2 additional parameters to set spacing before
printing register group dword or individual registers.
intel_print_group() is keept with the same spacing as before so no
changes on decoder output is expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28722>
Xe parser will also need to use the option_color parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28722>
Initial work by Rafael Antognolli <rafael.antognolli@intel.com>
Reworks
- Rebase to main
- Emit the right hiz op for higher mip levels when transitioning the
depth buffer
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28629>
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
Bspec 57023: RENDER_SURFACE_STATE::Shader Channel Select Red
"For channels not present in the surface format, the corresponding
Surface Channel Select is either SCS_ZERO or SCS_ONE."
This restriction applies to alpha channel as well if an associated
resource is not used as a render target.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28791>
Bspec 57023: RENDER_SURFACE_STATE:: Shader Channel Select Red
"Render Target messages do not support swapping of colors with
alpha. The Red, Green, or Blue Shader Channel Selects do not
support SCS_ALPHA. The Shader Channel Select Alpha does not support
SCS_RED, SCS_GREEN, or SCS_BLUE."
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28791>
This makes sure that copy propagation doesn't undo the lowering of
restricted sub-dword integer regions done by brw_fs_lower_regioning().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
This patch introduces code to enforce the pages-long regioning
restrictions introduced by Xe2 that apply to sub-dword integer
datatypes (See BSpec page 56640). They impose a number of
restrictions on what the regioning parameters of a source can be
depending on the source and destination datatypes as well as the
alignment of the destination. The tricky cases are when the
destination stride is smaller than 32 bits and the source stride is at
least 32 bits, since such cases require the destination and source
offsets to be in agreement based on an equation determined by the
source and destination strides. The second source of instructions
with multiple sources is even more restricted, and due to the
existence of hardware bug HSDES#16012383669 it basically requires the
source data to be packed in the GRF if the destination stride isn't
dword-aligned.
In order to address those restrictions this patch leverages the
existing infrastructure from brw_fs_lower_regioning.cpp. The same
general approach can be used to handle this restriction we were using
to handle restrictions of the floating-point pipeline in previous
generations: Unsupported source regions are lowered by emitting an
additional copy before the instruction that shuffles the data in a way
that allows using a valid region in the original instruction. The
main difficulty that wasn't encountered in previous platforms is that
it is non-trivial to come up with a copy instruction that doesn't
break the regioning restrictions itself, since on previous platforms
we could just bitcast floating-point data and use integer copies in
order to implement arbitrary regioning, which is unfortunately no
longer a choice lacking a magic third pipeline able to do the
regioning modes the integer pipeline is no longer able to do.
The required_src_byte_stride() and required_src_byte_offset() helpers
introduced here try to calculate parameters for both regions that
avoid that situation, but it isn't always possible, and actually in
some cases that involve the second source of ALU instructions a chain
of multiple copy instructions will be required, so the
lower_instruction() routine needs to be applied recursively to the
instructions emitted to lower the original instruction.
XXX - Allow more flexible regioning for the second source of an
instruction if bug HSDES#16012383669 is fixed in a future
hardware platform.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
The OPT macro will call validate() after each pass, so both cases
removed by this patch are just redundant calls. Will only affect
Debug builds since in Release builds validation is a no-op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
Use `a` and `b` (already identified as that in the output message)
instead of `f` and `s` for the two values being compared, since in
a later patch `s` will be used to hold the fs_visitor shader.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
For integer types, the signedness is determined by flags on the muladd
instruction. The types of the sources play no role. Previously we were
using the signedness of the type and ignoring the mask.
Adjust the types passed to the dpas_intel intrinsic to match.
Fixes various
dEQP-VK.compute.*.cooperative_matrix.khr_*.matrixmuladd_cross.* tests on
different Intel platforms. Some platforms had failing tests, and some
platforms failed EU validation before the tests could fail.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
Some places doing driver internal allocations was not setting
ANV_BO_ALLOC_INTERNAL, so adding the flag in those places here.
This will increase the accuracy of the RMV report.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28677>
If the format doesn't support the COLOR_ATTACHMENT or DEPTH_STENCIL
features, it can't be used as an input attachment.
Fixes future CTS tests: dEQP-VK.api.info.unsupported_image_usage.*
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28818>
I had to do this in a debugging session when I wrote some extra code
to debug sparse issues. I also have a pending patch that will require
this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
If an application tries to call
vkGetDeviceImageSparseMemoryRequirements() for an image that's not
supported by Sparse, then anv_image_init_from_create_info() will fail
and we will either hit the assertion in case of a debug build or just
pretend everything works in case of a release build. Properly return
no properties to signal the image is not supported.
The spec is not clear in specifying that this is what should be done
in this case, but this behavior should match the other
query-properties-from-sparse-images-we-didn't-create-yet functions
such as vkGetPhysicalDeviceSparseImageFormatProperties().
No known application outside my computer is tripping on this failure.
I discovered it when writing my own micro test cases for MSAA sparse.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
While it lived inside anv_batch_chain.c it made sense to call this
function xe_exec_fill_sync() because, well, the function was used to
fill the sync objects for the xe execbuf ioctl. Now that the function
is exported to the .h file and accessible to the rest of the driver,
let's give it a name that reflects what it does instead of what it was
used for when it was static: call it vk_sync_to_drm_xe_sync() because
it converts a vk_sync to a drm_xe_sync.
Also let's bikeshed the implementation so that it returns the struct
it builds: this should make callers cleaner and easier to understand.
No functional changes, only bikeshedding.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
I had accidentally hardcoded an alternative implementation in
xe_vm_bind_op().
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
Everything on our xe.ko backend is built on top of this assertion.
Assert it during driver initialization and just don't bother with it
anymore later.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
The reason we made the vm_bind ioctl return DEVICE_LOST on error is
that we thought there wasn't anything we could do if the ioctl failed.
Thomas Hellstrom pointed us that in case the system is under memory
pressure ENOMEM will be returned and there are things we can try to do
to make it work: either free memory or do fewer bind operations per
ioctl. For now let's just return the appropriate error for the case
(OUT_OF_HOST_MEMORY) so that maybe applications can try to something
about it. In the next patch we'll implement an additional strategy to
try to make things work.
Due to an unrleated failure, our CI system has also pointed out that
we were submitting invalid arguments, so we were getting an EINVAL.
Although we fixed that, these situations really shouldn't happen and
they should be easy to detect, so just put an assertion there, which
will make it easier for us developers to spot any issues.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
The xe_vm_bind_op() function is already way too big for my tiny little
brain. Make it 42 lines shorter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28792>
This is needed so we can run i915 or xe function depending on
the error dump.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28720>
More code tha now is used by aubinator_error_decode but in next
patches will also be used by error2hangdump.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28720>
This was re-implemented in several places, so lets share it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28720>
This functions are now used by aubinator_error_decode but will
also be used by error2hangdump tool.
More functions will be moved in the next patches.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28720>
For HSD 22015614752, we enabled Tile64 to ensure image subresources are
64K aligned. However, we neglected to disable miptails so some image
miplevels actually did not achieve the desired image alignment. Do that
now.
Fixes: c6686fda28 ("intel/isl: Use Tile64 to align images for CCS WA")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28703>