The conversion from bit value to register file type is already done
by the brw_eu_inst_3src_a1_dst_reg_file in the FFC macro now, so doing it
again produced incorrect results.
Fixes: e7179232 ("intel/brw: Move encoding of Gfx11 3-src inside the inst helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13141
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35960>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
The GLSL compiler always lowers inputs to temps for VS and GS, so exclude
them from driver support because the GLSL compiler will no longer do that
unconditionally. Thus, indirect VS and GS inputs are completely untested
and broken in a lot of drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
Descriptor set layout lifetime can be shorter than what the
implementation requires. One example is :
* create descriptor set layout
* create graphics pipeline library
* destroy descriptor set layout
* link optimize library in a final pipeline
The last step might need the descriptor set layout information again.
We've so far worked around this by taking a reference on the
descriptor set layout in the pipelines. But we forgot that descriptor
set layouts have pointers to samplers (for immutable & embedded
samplers).
We could take a reference to samplers but that sucks for various
reasons :
- it consumes dynamic state heap space
- it could cause issues with capture-replay placement
So instead we copy the information from the samplers that might be
needed in cases like link optimization. This includes :
- ycbcr conversion state (used for NIR lowering)
- embedded sampler data (to recreate the sampler)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35955>
Create a hashing key on all samplers so we can just copy that anywhere
we need it. That key already contains the needed parameters for
embedded samplers, so the sha1 stuff can go away.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35955>
The executor build was failing randomly due to a missing dependency on
`idev_intel_dev`. This patch adds the required dependency to the
`meson.build` file to ensure consistent and reliable builds across
different configurations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35928>
Aliased wsi image has to share the same private binding with the
original wsi image for memory consistency. If the private binding
exists, it needs to be released before being overridden.
Fixes: d85a9d658f ("anv/image: Call into WSI to create swapchain images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35893>
Wa_16018063123 is not a workaround that depends on stepping, so we
can use the INTEL_WA_16018063123_GFX_VER macro to reduce code generate
for non affected platforms.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
Wa_16018063123 don't apply to video engine also video engine don't
support XY_FAST_COLOR_BLT.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Fixes: ec43c20182 ("anv: implement dummy blit for Wa_16018063123")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
For 3D or GPGPU modes the same render engine should be used, CCS
register should only be used when using compute engine.
Fixes: 46f5359238 ("anv: Invalidate aux map for copy/video engine")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35700>
Otherwise the comparison will always be false for protected content.
Also remove extra setting of the protected bit that was happening later.
Fixes: 8d9cc6aa23 ("anv: properly flag image/imageviews for ISL protection")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35870>
We simplify the implementation by assuming the worse case, copying
entire per-vertex regions if necessary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
The formula uses scalar indices (4bytes), not slots (16bytes).
We also incorrectly passed a scalar (vertex case) & slot (mesh case)
offset in the push constants. Use slots instead so that the value is
smaller and we can pack more stuff into fs_msaa_flags.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
load intrinsics don't have range
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
Consider an innocuous instruction like:
and(1) v250:UD, g0.3<0,1,0>:UD, 4294967264u NoMask group0
If register allocation decides to spill v250, it will see this
instruction and say, "Oh no! The other components of v250 aren't set, so
I'd better add a fill before that instruction!"
But it gets even worse than that... if register coalesce decided to
merge two of these, the live range gets massively extended because the
writes don't fully initialize the value. This causes the need to spill
these registers in the first place.
Changing that instruction to SIMD16 on Xe2 or SIMD8 on other platforms
alleviates these issues.
shader-db:
Lunar Lake
total instructions in shared programs: 17118324 -> 17113191 (-0.03%)
instructions in affected programs: 93701 -> 88568 (-5.48%)
helped: 42 / HURT: 6
total cycles in shared programs: 895422566 -> 895079488 (-0.04%)
cycles in affected programs: 30111338 -> 29768260 (-1.14%)
helped: 35 / HURT: 40
total spills in shared programs: 3588 -> 3304 (-7.92%)
spills in affected programs: 285 -> 1 (-99.65%)
helped: 10 / HURT: 0
total fills in shared programs: 2218 -> 1663 (-25.02%)
fills in affected programs: 556 -> 1 (-99.82%)
helped: 10 / HURT: 0
Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20059218 -> 20053563 (-0.03%)
instructions in affected programs: 96938 -> 91283 (-5.83%)
helped: 43 / HURT: 6
total cycles in shared programs: 884174588 -> 883536475 (-0.07%)
cycles in affected programs: 22105268 -> 21467155 (-2.89%)
helped: 35 / HURT: 27
total spills in shared programs: 5032 -> 4679 (-7.02%)
spills in affected programs: 355 -> 2 (-99.44%)
helped: 12 / HURT: 0
total fills in shared programs: 4782 -> 4113 (-13.99%)
fills in affected programs: 671 -> 2 (-99.70%)
helped: 12 / HURT: 0
Skylake
total instructions in shared programs: 19097658 -> 19097665 (<.01%)
instructions in affected programs: 14202 -> 14209 (0.05%)
helped: 0 / HURT: 5
total cycles in shared programs: 862058109 -> 862058267 (<.01%)
cycles in affected programs: 3450244 -> 3450402 (<.01%)
helped: 7 / HURT: 11
fossil-db:
Lunar Lake
Totals:
Cycle count: 31439652246 -> 31439652272 (+0.00%)
Totals from 2 (0.00% of 707091) affected shaders:
Cycle count: 2602 -> 2628 (+1.00%)
No other Intel platforms had any fossil-db changes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35721>
Assembler supports them, so allow them on @-macro lines. For now
we don't bother with multiline comments, if becomes a thing we
can add them later.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35699>
After commit 88309a9818, SFID names were renamed
- "dp data 1" became "hdc1"
- "thread_spawner" became "ts/btd"
Update macros in executor to use the new SFID names so the
generated assembly can be parsed correctly.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35701>
We can't actually enable MSAA for images with sample count 1, and
without MSAA active, the sample location machinery does not get used.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35504>
This generator scripts uses the `write` function that, unlike `print`,
doesn't print a trailing newline. So let's add one to the template.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35697>
Currently anv exposes this on lots of devices, with the intent to be better
than apps can give, but I think this is wrong for a couple of reasons.
Apps want to know if hw exposes the fast path, Vulkan is meant to be explicit,
and telling llama.cpp if the fast path exists lets it make smarter decisions.
It seems unless someone heavily optimises the slow path, that CPU is usually
faster than GPU with llama-bench unless the hw path exists.
v2: added INTEL_LOWER_DPAS support
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35564>
It is annoying to change all function signatures when a driver needs
more information. There are also some callbacks that have a lot of
parameters and there have already been bugs related to that.
This patch tries to clean the interface by adding a struct that contains
all information that might be relevant for the driver and passing that
to most callbacks.
radv changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
anv changes are:
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
turnip changes are:
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vulkan runtime changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>