With that we can easily restrict the not + flt -> fge optimization so
that NaNs are handled as they were before.
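
As a minimal sketch of why the restriction is needed (plain C, not the
r600/sfn code; the helper names are made up for illustration): with a NaN
operand every ordered comparison is false, so not(flt(a, b)) is true while
fge(a, b) is false. The two forms only agree when neither operand is NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helpers named after the NIR opcodes; not Mesa code. */
static inline bool not_flt(float a, float b) { return !(a < b); }
static inline bool fge(float a, float b) { return a >= b; }

/* The rewrite not(flt(a, b)) -> fge(a, b) keeps the result only when
 * neither operand is NaN. */
static inline bool rewrite_is_exact(float a, float b)
{
   return not_flt(a, b) == fge(a, b);
}
```
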
Fixes: 51d8ca2dff ("r600/sfn: optimize comparison results")
v2: use SPDX license identifier (austriancoder)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37450>
Use FD20 macro that will account for the implicit LSB zero value and is
already used for sources. For the new macro we need to use the entire
bit-range of the field (55-51), so remove the adjustments we used to
do prior to encoding and decoding.
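
A rough sketch of the idea, in standalone C rather than the actual
encoding macros: a 5-bit field at bits 55-51 stores the value shifted
right by one, so the LSB is implied to be zero and callers pass the
unadjusted value. The helper names and the exact shift are assumptions
made for illustration, not the definition of FD20.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: 5-bit field at bits 55-51. */
#define TOY_FIELD_HIGH 55
#define TOY_FIELD_LOW  51
#define TOY_FIELD_MASK \
   (((1ull << (TOY_FIELD_HIGH - TOY_FIELD_LOW + 1)) - 1) << TOY_FIELD_LOW)

/* Store an even value; the dropped LSB is implicitly zero. */
static inline uint64_t toy_encode_field(uint64_t inst, unsigned value)
{
   assert((value & 1) == 0);          /* LSB must be zero */
   assert((value >> 1) < (1u << 5));  /* must fit in 5 bits */
   inst &= ~TOY_FIELD_MASK;
   inst |= (uint64_t)(value >> 1) << TOY_FIELD_LOW;
   return inst;
}

/* Read the field back and restore the implicit zero LSB. */
static inline unsigned toy_decode_field(uint64_t inst)
{
   return (unsigned)((inst & TOY_FIELD_MASK) >> TOY_FIELD_LOW) << 1;
}
```
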
Fixes an assertion in vkpeak (https://github.com/nihui/vkpeak) when running
bf16 tests on BMG. The code now also correctly applies the subreg_nr
to the destination. For example, when a mad(32) was split into two pieces,
the generated code did not fill out the upper part of the register:
```
mad(16) g13<1>BF g10<8,8,1>BF g12<8,8,1>BF g56<1,1,1>F { align1 1H A@5 };
-mad(16) g13<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
+mad(16) g13.16<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
```
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37236>
When the CPU clock is the same as the authoritative trace clock (which
normally defaults to CLOCK_BOOTTIME), perfetto drops the non-monotonic snapshots
to ensure validity of the global source clock in the resolution graph.
When they are different, the clocks are marked invalid and the rest of
the clock syncs will fail during trace processing.
There's no central daemon emitting consistent snapshots for
synchronization between CPU and GPU clocks on behalf of renderstages and
counters producers. The sequence-scoped clock (64 <= ID < 128) is unique
per producer + writer pair within the tracing session. So we can use a
sequence-scoped clock for the GPU clock whenever applicable, and fall back
to a global clock for dynamically allocated minors >= 192.
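
For reference, the ID range mentioned above can be written as a small
predicate. This is only an illustrative sketch with made-up names, not
the code added by this change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range quoted in the message above: 64 <= ID < 128. */
#define TOY_SEQ_CLOCK_ID_FIRST 64u
#define TOY_SEQ_CLOCK_ID_END   128u

/* True if clock_id falls in the sequence-scoped range, i.e. it is only
 * meaningful within one producer + writer sequence. */
static inline bool toy_clock_id_is_sequence_scoped(uint32_t clock_id)
{
   return clock_id >= TOY_SEQ_CLOCK_ID_FIRST &&
          clock_id < TOY_SEQ_CLOCK_ID_END;
}
```
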
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
With this change:
- renderstage events require clock sync, so we only emit clock
  snapshots on the traceq thread that handles the callbacks
- redundant sync_timestamp calls and the sync_gpu_ts tracking are dropped
- there is no need to reset next_clock_sync_ns when tracing is disabled,
  since a snapshot is always emitted right after the initial interned data
  emit upon tracing start
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The object name is part of the VkDebugUtilsObjectName event messages.
When the trace buffer is full and the ring buffer fill policy is chosen,
the debug obj events can be overwritten (lost), which is why we need the
RefreshSetDebugUtilsObjectNameEXT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
In the bi_emit_load_attr call site, we can use the imm_index value even
if the function returns false. The bifrost path handles this correctly.
Fixes: 652e1c2e13 ("pan/bi: Rework indices for attributes on Valhall")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37464>
Previously we were passing the original compile inputs, rather than the
variant-specific inputs. No actual bugs are caused by this because we
don't use the variant infrastructure for anything yet.
Fixes: ff9907927f ("panvk: Add basic infrastructure for shader variants")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37463>
This is all dead code since we weren't even setting the cap in iris/crocus!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
Ideally we'd have no stage switching, but this is just a cleanup for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
I see no point, we allocate for every shader stage anyway. This is a bit
simpler.
I'm not a fan of the brw_compiler singleton at all but torching that is not on
today's agenda. Flattening it a little bit very much is.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
We do not support VK_EXT_shader_object so far, but the vk_shader layer
depends on those values, so we should fill them.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37452>
We now set fb_fetch_output and fb_fetch_output_coherent to be consistent
with nir_lower_io.
This has no impact in general unless some generic pass depends on that
info.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37452>
We were not printing IO info properly for those intrinsics.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37452>
The main challenge is handling tile status (TS) correctly. Full clears
simply mark tiles as "cleared" in TS metadata without touching pixels.
Scissored clears must first decompress existing TS tiles using the
current clear color, then apply the new color to the scissor region.
The implementation maintains the original surface clear color for TS
decompression operations while using the new color for actual clearing.
This prevents rendering artifacts when mixing BLT and 3D operations.
The BLT engine operates directly with pixel positions and handles all TS
tile complexity automatically.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35956>
We should not count the last draw command stride padding in
the indirect buffer.
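
A minimal sketch of the intended size computation, assuming a
hypothetical helper (the name and parameters are not taken from the
Mesa code): only the gaps between consecutive commands contribute the
stride, while the last command contributes just its own size.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of the indirect buffer actually read by draw_count commands of
 * cmd_size bytes each, laid out stride bytes apart. The padding after
 * the last command is not counted. */
static inline size_t toy_indirect_read_size(size_t draw_count, size_t stride,
                                            size_t cmd_size)
{
   if (draw_count == 0)
      return 0;

   assert(stride >= cmd_size);
   return (draw_count - 1) * stride + cmd_size;
}
```

For example, three 12-byte commands with a 16-byte stride read 44 bytes,
not 48.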
Fixes: 176740c26f4 ("mesa: implement mesh shader draw calls")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37392>
It does not look like our custom version has anything special to offer.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37421>
Advertising the extensions without VK_TIME_DOMAIN_DEVICE_KHR support is
not very useful.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37421>
Thanks to Nanley Chery for pointing out this possibility.
v2: Make it simpler (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
When the auxiliary surface is handled by the hardware directly,
there's nothing to bind besides the main pixels, so we can allow
sparse without doing anything else. We can't do this in the exact same
way with DG2 (which has_flat_ccs) because it uses the
aux_state_tracking_buffer.
v2: Fix spelling (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
We want to be a little more granular than just "aux surfaces are
completely incompatible with sparse!", so have each of
isl_surf_get_*_surf disable itself when sparse is used.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
On r600 ternary operations can't use the fabs source modifier, so
converting "fadd(fabs(fmul(a, b)), c)" to "ffma(fabs(a), fabs(b), c)"
adds one more instruction in the backend. Hence avoid this.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37440>
Add late optimization to fuse f2i32 and fround_even operations into a
single f2i32_rtne instruction when the intermediate fround_even result
is only used once. This eliminates redundant rounding since f2i32_rtne
performs round-to-nearest-even conversion directly.
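
The shape of the rewrite can be sketched on a toy IR. This is a
simplified illustration with made-up types and names, not the NIR-based
pass added here; dead-code elimination of the now-unused rounding
instruction is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_INPUT, TOY_FROUND_EVEN, TOY_F2I32, TOY_F2I32_RTNE };

struct toy_instr {
   enum toy_op op;
   struct toy_instr *src;  /* single source operand, NULL for inputs */
   unsigned use_count;     /* number of instructions reading the result */
};

/* Rewrite f2i32(fround_even(x)) into f2i32_rtne(x), but only when the
 * rounded value has no other user. Returns true if the instruction was
 * changed. */
static inline bool toy_fuse_f2i32_rtne(struct toy_instr *instr)
{
   if (instr->op != TOY_F2I32 || instr->src == NULL)
      return false;

   struct toy_instr *rnd = instr->src;
   if (rnd->op != TOY_FROUND_EVEN || rnd->use_count != 1)
      return false;

   assert(rnd->src != NULL);
   instr->op = TOY_F2I32_RTNE;
   instr->src = rnd->src;
   rnd->src->use_count++;  /* the conversion now reads x directly */
   rnd->use_count = 0;     /* the rounding result is now dead */
   return true;
}
```
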
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37426>
Fixes this error during Shader.cpp build:
..\src\util/format/u_formats.h(33): fatal error C1083: Cannot open include file: 'util/format/u_format_gen.h': No such file or directory
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37316>
Again, instrs don't get freed as we go, so the linear gc context saves us
5 pointers per instr.
Fossil replay time for deadspace3 on a debugoptimized build -4.85258% +/-
3.04009% (n=10)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37316>
Since we don't free registers as we go, we can just allocate them in a
linear gc context that gets freed at ralloc destroy. Saves 5 pointers of
memory per register for the ralloc overhead.
Fossil replay time for deadspace3 on a debugoptimized build -4.30353% +/-
1.80078% (n=10).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37316>