The debug archive files are regular tar files, so they can be
inspected with tar and opened directly by file managers and editors.
However, a few common tasks are worth having already set up in the
repository.
This tool adds convenience to some of those tasks, including:
- Printing the last version of a shader representation;
- Printing a `git-log`-like view of the changes of a shader;
- Comparing two shaders, e.g. SIMD8 and SIMD16 shaders in
Intel;
- Comparing two specific versions of any shader.
See the "manual" inside the commit for more details.
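The comparison tasks listed above amount to a text diff between two dumps. A minimal sketch using Python's standard `difflib` follows; the function name and the dump contents are placeholders for illustration, not the tool's real interface or output.

```python
import difflib


def compare_versions(old_text, new_text, old_name, new_name):
    """Return a unified diff between two text dumps as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))


# Placeholder dumps, not real compiler output.
simd8 = "block 0\n  op_a\n  op_b\n"
simd16 = "block 0\n  op_a\n  op_c\n"

diff = compare_versions(simd8, simd16, "SIMD8", "SIMD16")
print("\n".join(diff))
```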
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29146>
This doesn't replace existing support for INTEL_DEBUG=shaders -- so both
`shaders` and `mda` can be used.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29146>
Instead of dumping multiple files with the optimizer passes, write a single
archive file with all the contents. The actual file is created
by the drivers, so later commits will actually enable the feature in
anv and iris.
This removes the use of INTEL_DEBUG=optimizer (and the corresponding
enum value) in brw. That environment variable is still used by ELK --
which currently doesn't support mda.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29146>
Uses the tar format to collect multiple output files. It can
be inspected using the regular UNIX tools, but a later commit
will add a specialized tool to perform common tasks.
The tar implementation is enough to fulfill the current needs
without adding a dependency. There's also a small test mostly
to ensure scaffolding is there in case we need to expand the
implementation.
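To illustrate why a plain tar container is convenient, the sketch below uses Python's standard `tarfile` module to pack a few text dumps into an in-memory archive and read them back. The member names are hypothetical and do not reflect the layout the drivers actually write.

```python
import io
import tarfile


def build_archive(files):
    """Pack a {name: text} mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(blob):
    """Return the member names in archive order."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return tar.getnames()


def read_member(blob, name):
    """Return the text content of one archive member."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return tar.extractfile(name).read().decode("utf-8")


# Hypothetical member names, one per optimizer pass.
blob = build_archive({
    "shader/000-initial": "pass 0 dump\n",
    "shader/001-optimized": "pass 1 dump\n",
})
print(list_members(blob))
```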
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29146>
If the FS writes to multiple color outputs, but there are not enough
color attachments for them all, we may optimize out the excess ones.
With VK_KHR_dynamic_rendering_local_read, we were not respecting the
mapping from output to attachment set by the application, and the wrong
writes were getting eliminated.
Fixes future CTS tests: dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.local_read.remap_single_attachment*
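The failure mode can be sketched as follows. This is a simplified model, assuming each shader output is either unmapped or mapped to one attachment index; the names are hypothetical and this is not the driver code.

```python
UNMAPPED = None  # hypothetical marker: output is not routed to any attachment


def live_outputs(output_to_attachment, attachment_count):
    """Keep only outputs whose mapped attachment actually exists."""
    return [
        out for out, att in enumerate(output_to_attachment)
        if att is not UNMAPPED and att < attachment_count
    ]


def live_outputs_identity(num_outputs, attachment_count):
    """Buggy variant: ignores the mapping and assumes output i -> attachment i."""
    return [out for out in range(num_outputs) if out < attachment_count]


# The shader writes outputs 0..2, the application routes output 2 to the
# single attachment 0 and leaves the others unmapped.
output_map = [UNMAPPED, UNMAPPED, 0]
print(live_outputs(output_map, attachment_count=1))
print(live_outputs_identity(len(output_map), attachment_count=1))
```

With the identity assumption the write to output 2 is dropped and the write to output 0 is kept, which is the opposite of what the application asked for.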
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37531>
The arrays are first memset to OUTPUT_DISABLED, but if we iterate over
MAX_RTS instead of the actual attachment count, we end up resetting any
values not set by the application to the (probably identity) mapping that
comes from the state.
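A simplified model of the loop-bound problem, assuming the state holds an identity mapping and using a made-up sentinel value and array size; only the choice of loop bound differs between the two calls.

```python
MAX_RTS = 8            # assumed array size for this sketch
OUTPUT_DISABLED = -1   # hypothetical sentinel value


def resolve_mapping(state_map, loop_bound):
    """Start fully disabled, then copy `loop_bound` entries from the state."""
    mapping = [OUTPUT_DISABLED] * MAX_RTS  # the memset step
    for i in range(loop_bound):
        mapping[i] = state_map[i]
    return mapping


identity = list(range(MAX_RTS))
attachment_count = 2

correct = resolve_mapping(identity, attachment_count)
buggy = resolve_mapping(identity, MAX_RTS)
print(correct)
print(buggy)
```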
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37531>
srnd_edge_cases.lua checks edge cases.
srnd_randomized.lua was shared by Caio and serves as a good example for
understanding the randomness and probability of rounding.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36529>
Based on git history, these appear to be a subset of
`anv_batch_emit_batch`, so I've structured the code similarly: if
`anv_batch_emit_dwords` returns `nullptr`, we just move on without
copying the memory.
CID: 1665339
CID: 1664814
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37534>
For 3-component RGB images with OPTIMAL tiling, we need to create the
surface as RGBX or RGBA. When a host image copy to/from this image
happens, we calculate sizes and offsets based on the 4-component surface
and blow past the end of the 3-component, API-provided buffer.
Hilarity^WSegfault ensues.
Ideally we'd calculate the right sizes and have the tiled copy functions
handle the conversion, but they are format unaware and expect to just
copy bytes in blocks of equal sizes from both sides.
Handle this case by making an intermediate copy to/from linear RGB
from/to linear RGBX, and pass that intermediate slice to the tiled copy
functions.
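The intermediate conversion can be sketched as below, assuming 8-bit channels and tightly packed linear data; the helper names and the padding value are illustrative only.

```python
def rgb_to_rgbx(rgb, pad=0xFF):
    """Expand tightly packed 3-byte pixels into 4-byte pixels."""
    if len(rgb) % 3:
        raise ValueError("RGB buffer length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        out += rgb[i:i + 3]
        out.append(pad)  # filler for the unused fourth component
    return bytes(out)


def rgbx_to_rgb(rgbx):
    """Drop the fourth byte of every 4-byte pixel."""
    if len(rgbx) % 4:
        raise ValueError("RGBX buffer length must be a multiple of 4")
    return bytes(b for i, b in enumerate(rgbx) if i % 4 != 3)


api_buffer = bytes([1, 2, 3, 4, 5, 6])  # two RGB pixels, 6 bytes
staging = rgb_to_rgbx(api_buffer)       # 8 bytes, safe for a 4-byte-per-pixel copy
print(len(api_buffer), len(staging))
```

The staging buffer is the only thing the 4-component copy path touches, so the 3-component buffer is never read or written past its end.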
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36453>
Per VUID-VkCopyImageToImageInfo-srcImage-09069,
srcImage and dstImage must have been created with identical image
creation parameters, so we are not going to have copies from color <->
depth/stencil, but we can copy both D/S aspects of an image at the same
time.
Nothing says that we can't copy from one plane of a multiplanar image to
another, so handle that case too (though nothing is currently testing
it).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36453>
Only if required. I somehow misunderstood that those would need to be
independent too, not just the vertex slots.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dee4813b0 ("brw: add ability to compute VUE map for separate tcs/tes")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37251>
We had two sets of surface/sampler sources for no good reason. Just add a
boolean to tell whether they are bindless or not.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>
One advantage here of moving a bunch of stuff to NIR is that we can
now have consistent payload types straight from the NIR conversion to
BRW.
This massively simplifies the BRW lowering code and avoids type errors
that are quite common to make in the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>
In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields. Some structure
components are rearranged to promote better packing.
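The overflow is plain modular truncation. A small sketch, using an instruction index chosen only for illustration:

```python
def store(value, bits):
    """Model storing an unsigned value into a field `bits` wide."""
    return value & ((1 << bits) - 1)


ip = 70000  # an instruction index beyond the 16-bit range
print(store(ip, 16))  # wraps around
print(store(ip, 32))  # preserved
```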
Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields.
used_in_single_block is moved earlier in the structure to promote better
packing.
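Why field order matters for packing can be shown with `ctypes`, which follows the platform's C layout rules. The two structures below are hypothetical layouts that reuse some field names from this message; they are not the real structure definition.

```python
import ctypes


class Interleaved(ctypes.Structure):
    # Small fields between 32-bit fields force padding before each of them.
    _fields_ = [
        ("used_in_single_block", ctypes.c_uint8),
        ("first_use_ip", ctypes.c_uint32),
        ("other_flag", ctypes.c_uint8),  # hypothetical field
        ("last_use_ip", ctypes.c_uint32),
    ]


class Grouped(ctypes.Structure):
    # Same fields, with the small ones placed next to each other.
    _fields_ = [
        ("first_use_ip", ctypes.c_uint32),
        ("last_use_ip", ctypes.c_uint32),
        ("used_in_single_block", ctypes.c_uint8),
        ("other_flag", ctypes.c_uint8),  # hypothetical field
    ]


print(ctypes.sizeof(Interleaved), ctypes.sizeof(Grouped))
```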
Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Closes: #9489
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: @joostruis
Tested-by: @Snoucher
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
Intel HW does not support separate destination and reference output pictures
when decoding AV1 video. The only exception is film grain, which the Vulkan
spec already includes a caveat for.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37351>
Use the FD20 macro, which accounts for the implicit LSB zero value and is
already used for sources. For the new macro we need to use the entire
bit-range of the field (55-51), so remove the adjustments we used to
do prior to encoding and decoding.
Fixes an assertion in vkpeak (https://github.com/nihui/vkpeak) when running
bf16 tests on BMG. The code now also correctly applies the subreg_nr
to the destination. For example, when a mad(32) gets split into two
pieces, the generated code would not fill out the upper part of the register:
```
mad(16) g13<1>BF g10<8,8,1>BF g12<8,8,1>BF g56<1,1,1>F { align1 1H A@5 };
-mad(16) g13<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
+mad(16) g13.16<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
```
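A sketch of what an implicit zero LSB means for a 5-bit field. It assumes the encoded value is a byte offset whose lowest bit is always zero, so only the upper bits are stored; the helper names are hypothetical and this is not the macro's actual definition.

```python
FIELD_BITS = 5  # width of the bit-range 55-51


def encode_subreg(byte_offset):
    """Store an even byte offset without its always-zero lowest bit."""
    if byte_offset & 1:
        raise ValueError("lowest bit must be zero")
    field = byte_offset >> 1
    if field >= (1 << FIELD_BITS):
        raise ValueError("offset does not fit in the field")
    return field


def decode_subreg(field):
    """Restore the byte offset by re-adding the implicit zero bit."""
    return field << 1


# Assumption: .16 on a 2-byte BF type corresponds to byte offset 32.
upper_half = 16 * 2
print(encode_subreg(upper_half), decode_subreg(encode_subreg(upper_half)))
```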
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37236>
When the CPU clock is the same as the authoritative trace clock (which
normally defaults to CLOCK_BOOTTIME), perfetto drops the non-monotonic
snapshots to ensure validity of the global source clock in the resolution
graph. When they are different, the clocks are marked invalid and the rest
of the clock syncs will fail during trace processing.
There's no central daemon emitting consistent snapshots for
synchronization between CPU and GPU clocks on behalf of renderstages and
counters producers. The sequence-scoped clock (64 <= ID < 128) is unique
per producer + writer pair within the tracing session. So we can use a
sequence-scoped clock for the GPU clock whenever applicable, and fall back
to a global clock for dynamically allocated minors >= 192.
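The ID ranges can be summarized in a small classifier. Only the sequence-scoped range (64 <= ID < 128) is stated in this message; treating IDs below 64 as built-in and IDs from 128 up as global is my reading of Perfetto's clock documentation and should be checked against it.

```python
def clock_scope(clock_id):
    """Classify a Perfetto clock ID by the range it falls into."""
    if clock_id < 0:
        raise ValueError("clock IDs are non-negative")
    if clock_id < 64:
        return "builtin"          # assumption, see lead-in
    if clock_id < 128:
        return "sequence-scoped"  # range given in the message
    return "global"               # assumption, see lead-in


for cid in (6, 64, 127, 128):
    print(cid, clock_scope(cid))
```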
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
With this change:
- renderstage event requires clock sync, so we'd only emit clock
snapshots on the traceq thread that handles the callbacks
- drops redundant sync_timestamp calls as well as sync_gpu_ts tracking
- no need to reset next_clock_sync_ns when tracing is disabled, since a
snapshot is always emitted right after the initial interned data emit
upon tracing start
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The object name is part of the VkDebugUtilsObjectName event messages.
When the trace buffer is full and the ring buffer fill policy is chosen,
the debug obj events can be overwritten (lost), which is why we need the
RefreshSetDebugUtilsObjectNameEXT.
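A toy model of the problem, using a bounded `deque` as the ring buffer. The event tuples, class and function names are made up for illustration.

```python
from collections import deque


class RingTrace:
    """Trace buffer that overwrites its oldest events once full."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)

    def emit(self, event):
        self.events.append(event)


def refresh_object_names(trace, names):
    """Re-emit every known object name so it is present in the buffer again."""
    for handle, name in names.items():
        trace.emit(("name", handle, name))


names = {1: "shadow_pass"}
trace = RingTrace(capacity=3)
trace.emit(("name", 1, "shadow_pass"))
for frame in range(3):
    trace.emit(("draw", 1, frame))  # pushes the name event out of the ring

lost = ("name", 1, "shadow_pass") not in trace.events
refresh_object_names(trace, names)
restored = ("name", 1, "shadow_pass") in trace.events
print(lost, restored)
```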
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
This is all dead code since we weren't even setting the cap in iris/crocus!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
Ideally we'd have no stage switching, but this is just a cleanup for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
I see no point, we allocate for every shader stage anyway. This is a bit
simpler.
I'm not a fan of the brw_compiler singleton at all but torching that is not on
today's agenda. Flattening it a little bit very much is.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>