These generated sources uses older, less portable types such as ubyte,
ushort and uint. But we have stdint.h everywhere now, so let's use those
types instead.
To stay consistent, let's talk about UINT8 etc instead of UBYTE for the
entirety of the u_indices infrastructure.
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23853>
There are cases where there is a chain to an unused nir variable that get removed
by nir_opt_dce. This breaks our current linker as the variable can still be accessed
via nir_foreach_shader_in_variable(..) macro.
So lets call nir_remove_dead_variables(..) just before we setup our linking.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23673>
This patch implements the recommended flush/wait of AUX-TT invalidation
according to per command streamer (engine).
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23786>
In order to make sure RCS engine is idle, we need to add
DC flush + CS stall + Render target Cache flush + Depth Cache
on Gfx 12 and additional CCS cache flush on Gfx12.5.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23786>
This hardware hang workaround (PAL waMiscPopsMissedOverlap) is needed only
on some Vega chips, and only for 8 or more samples per pixel. It has a
significant performance cost (around 1.5x-2x in
nvpro-samples/vk_order_independent_transparency), so it should be precisely
configured when setting up Primitive Ordered Pixel Shading.
It was added in 47b780be21, when POPS was not
used in Mesa, with the change being described as "this may not be needed
yet, but let's set it now".
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
These run nir_validate in debug builds, which will avoid bugs slipping in. It's
not enough that llvmpipe doesn't mind illegal NIR, these passes are well within
their rights to fail spectacularly if the NIR wouldn't validate. So validate so
we catch issues early.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804>
GLSL booleans (and hence bool derefs) may be translated either as 1-bit or
32-bit NIR registers, depending whether the backend uses nir_lower_bool_to_int32
or not. Add a knob for this and choose the right type for different backends.
Fixes nir_validate failure on
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_bvec3 run under
lavapipe. That test indexes into a bvec3 array, and gallivm first lowers bools
and then lowers derefs to registers, resulting in random 1-bit booleans mixed in
with 32-bit bools.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804>
Wire up the CL_DEVICE_PROFILING_TIMER_RESOLUTION from the PIPE_CAP.
While here, also set CL_PLATFORM_HOST_TIMER_RESOLUTION to 1;
that's bogus since we're using the same value as for device, but
at this point we don't have a device to ask.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23639>
Use the get_timestamp as both the device_timestamp in
get_device_and_host_timer and host_timestamp in that
and get_host_timer.
Having eliminited most other clock sources, discussions
on previous versions have concluded it's best to use the
same timer as the 'host_timestamp' since the main requirements
are that it must be one that's a time seen by the device and
that it's very closely coupled.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23639>
uint isn't a standard type, just something we accidentally get from some
other headers.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23833>
Here, we want explicitly sized types, not just types that happen to be
of the right size.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23833>
In these cases we actually want uint32_t, because we're doing 32-bit
things to them.
The hwinfo-bit is only being used by i915, and should probably be
moved to i915 instead. But it shoukd *also* be converted, so let's do
that now.
While we're at it, fixup the bit-setting as well.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23833>
Applications can do their own caching, but are in any case required to
properly "compiler" the binaries via clBuildProgram or clCompileProgram +
clLinkPrograms.
In any case, there is no point building something if we already have the
result.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23847>
Without generating a proper timestamp for the disk cache, we pull old
binaries out of the disk cache, potentially being buggy or simply
outdated.
Once meson 1.2 lands we can easily pull in LLVM functions.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21612>
The problem is when we have DP2 or DP3 instruction that writes a w
channel like here:
DP3 temp[148].w, -temp[147].xyz_, temp[57].xyz_;
will get pair-converted to
src0.xyz = temp[147], src1.xyz = temp[57]
DP3, -src0.xyz, src1.xyz
DP3 temp[148].w, -src0._, src0._
where the alpha instruction is a basically just a replicate of the
result from the RGB sub intruction. However the destination register
index in the RBG slot is also 148. Now we pair-schedule and regalloc
src0.xyz = temp[13], src1.xyz = temp[3]
DP3, -src0.xyz, src1.xyz
DP3 temp[3].w, -src0._, src0._
We properly regalloc the alpha channel, but we obviously skip the rgb,
because the writemask is empty there. However when we emit the shader
later, we actually check the number of used regs based on the maximum
used register index and we don't consider the writemasks, so we would
think we use 149 temps. AFAIK the shader would be still completelly OK.
But we would think it hits the HW limits and used a dummy one instead.
Fix this by checking for empty writemasks when marking the registers as
used.
GAINED: shaders/glmark/1-22.shader_test FS
This is also needed to prevent another lost Trine shader from
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23838>
On Alchemist, the FF_MODE2 documentation says that we must set the
FF_MODE2 timer values for GS and HS to 224. The hardware performance
tuning guide also recommends setting the TDS timer to 4.
On Tigerlake, i915 applies workarounds to set the GS timer to 224
(failing to do so can cause HS/DS unit hangs), and the TDS timer to 4
(for performance). It doesn't currently apply a HS timer there, and
I'm not sure if it's strictly necessary, but given that Alchemist
needed it, and the other two settings matched, let's assume that it
ought to match as well.
Unfortunately, there has been a bug in the i915 workarounds
infrastructure for non-masked context registers where writing one
field of the register zeroes out all the others. So, I believe the
Tigerlake TDS timer value of 4 isn't being applied correctly there,
though the register is also not readable on that platform which
makes it hard to verify. So, this may also speed up tessellation.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9233
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23839>
This enables L3 partial write merging for a number of cases that seem
to be getting accidentally disabled by the kernel, which was causing a
serious performance bottleneck on DG2 and MTL platforms. The
"Compressible Partial Write Merge Enable", "Coherent Partial Write
Merge Enable" and "Cross-Tile Partial Write Merge Enable" bits in
L3SQCREG5 were expected to be enabled by default (and confusingly,
they even read off as enabled if you ran 'intel_reg read 0xb158' on an
idle system), but they are getting clobbered during 3D context
initialization by an i915 workaround.
Enabling L3 partial write merging of compressible surfaces in
particular seems to increase rendering fillrate by over 3x in some
cases (e.g. the
"VulkanFillRate/FillRateGPU/resolution:1[0-3]/format:*/blend:0"
fillrate-bound microbenchmarks). Significant improvements can also be
reproduced in most real-world workloads we've tested so far,
e.g. Counter Strike GO improves by ~11%, Shadow Of the Tomb Raider
improves by ~5.5%, and AztecRuins-VK improves by ~6.5% on DG2-512 --
Thanks a lot to Caleb Callaway for these figures. No regressions have
been observed so far.
Even though this patch might strike as surprisingly simple for such a
large payoff, it's the result of Felix DeGrood and I trying to
root-cause the rendering performance gap of DG2 on Linux vs Windows on
and off during the last year, and some of the OA statistics captured
by Felix early this month were greatly helpful for me to connect the
last few dots, so Felix deserves a big chunk of the credit for this
work.
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23783>
This makes PIPE_CAP_SURFACE_SAMPLE_COUNT do something, namely, explode
with lots of fireworks. We'll have to figure out what's wrong, but at
least now we aren't just not trying at all. Should not break anything as
long as PIPE_CAP_SURFACE_SAMPLE_COUNT is not flipped on.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
This reverts commit 9e67d3f237.
We do not, in fact, implement texture barriers. Texture barriers are
supposed to allow non-overlapping rendering feedback loops. We cannot
support that at non-tile boundaries when texture compression is enabled
without some kind of downgrade path or other special handling.
Fixes Emacs corruption on X/Glamor.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
Issue: H.264 high 10 profile is currently not supported, but is shown as
supported in vainfo.
Reason: Kernel reported capabilities for video encoder/decode doesn't
consider the actual profile (only using reduced profile).
Solution: Use kernel reported capabilities only for basic H.264/HEVC
profiles. Other profiles (e.g. 10 bits) should be checked based on HW.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9242
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23824>
Cleanup of various fields with redundant information.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
ACO will rely on this field instead of guessing
the stage internally.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
If we consider clearing the group flag of a vec4 register that is used as
source for some instruction we have to take into account that the parent
of the register element may also be part of a group in the parent instruction.
In this case we must not clear the group flag.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9118
Fixes: f3415cb26a (r600/sfn: copy propagate register load chains)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23813>