In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields. Some structure
components are rearranged to promote better packing.
Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields.
used_in_single_block is moved earlier in the structure to promote better
packing.
Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Closes: #9489
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: @joostruis
Tested-by: @Snoucher
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
Intel HW does not support separate destination and reference output pictures
when decoding AV1 video. The only exception is film grain, which the Vulkan
spec already includes a caveat for.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37351>
Use FD20 macro that will account for the implicit LSB zero value and is
already used for sources. For the new macro we need to use the entire
bit-range of the field (55-51), so remove the adjustments we used to
do prior to encoding and decoding.
Fixes assertion in vkpeak (https://github.com/nihui/vkpeak) when running
bf16 tests on BMG. And the code now will correctly apply the subreg_nr
to the destination, e.g. a mad(32) gets splitted into two pieces, the
generation would not fill out the upper-part of the register
```
mad(16) g13<1>BF g10<8,8,1>BF g12<8,8,1>BF g56<1,1,1>F { align1 1H A@5 };
-mad(16) g13<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
+mad(16) g13.16<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
```
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37236>
When CPU clock is the same with the authoritative trace clock (normally
default to CLOCK_BOOTTIME), perfetto drops the non-monotonic snapshots
to ensure validity of the global source clock in the resolution graph.
When they are different, the clocks are marked invalid and the rest of
the clock syncs will fail during trace processing.
There's no central daemon emitting consistent snapshots for
synchronization between CPU and GPU clocks on behalf of renderstages and
counters producers. The sequence-scoped clock (64 <= ID < 128) is unique
per producer + writer pair within the tracing session. So we can use
sequence-scoped clock for gpu clock whenever applicable, and fallback to
use global clock for dynamic minor allocated >= 192.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
With this change:
- renderstage event requires clock sync, so we'd only emit clock
snapshots on the traceq thread that handles the callbacks
- drops redundant sync_timestamp calls as well as sync_gpu_ts tracking
- no need to reset next_clock_sync_ns when tracing is disabled, since a
snapshot is always emitted right after the initial interned data emit
upon tracing start
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The object name is part of the VkDebugUtilsObjectName event messages.
When the trace buffer is full and the ring buffer fill policy is chosen,
the debug obj events can be overwritten (lost), which is why we need the
RefreshSetDebugUtilsObjectNameEXT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
This is all dead code since we weren't even seting the cap in iris/crocus!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
ideally we'd have no stage switching, but this is just a cleanup for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
I see no point, we allocate for every shader stage anyway. This is a bit
simpler.
I'm not a fan of the brw_compiler singleton at all but torching that is not on
today's agenda. Flattening it a little bit very much is.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
Thanks to Nanley Chery for pointing out this possibility.
v2: Make it simpler (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
When the auxiliary surface is handled by the hardware directly,
there's nothing to bind besides the main pixels, so we can allow
sparse without doing anything else. We can't do this in the exact same
way with DG2 (which has_flat_ccs) because it uses the
aux_state_tracking_buffer.
v2: Fix spelling (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
We want to be a little more granular than just "aux surfaces are
completely incompatible with sparse!", so have each of
isl_surf_get_*_surf disable itself when sparse is used.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37419>
This seems to be equal assert with febe90e109 as we hit this when
launching Quake II RTX.
Fixes: 69b6b4cb28 ("anv: add shader instruction emission")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37429>
Having a list of all enabled/used extensions in meson allows us to get
rid of a lot of boilerplate in every bvh build shader.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35326>
Instead of hardcoding 4096-byte page size in bo mapping/unmapping logic,
use os_get_page_size() to determine the correct alignment for munmap()
offset adjustments and address assertions.
Signed-off-by: Zhou Qiankang <wszqkzqk@qq.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37389>
Avoid treating any warnings as errors in the third-party imgui code, and
use Wno-error=stringop-overflow for code in Mesa.
Suggested-by: @eric
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35853>
The following Vulkan CTS tests that emit massive shaders were
regressing after "intel/brw/xe3+: Select scheduler heuristic with best
trade-off between register pressure and latency.":
dEQP-VK.graphicsfuzz.cov-nested-loops-set-struct-data-verify-in-function
dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops
The reason is that they have so many nested loops that they cause the
performance analysis utilization estimates to overflow the 32-bit
floating-point variables used to calculate them, which causes our
throughput estimate to underflow and equal zero for those shaders,
which breaks the logic introduced in brw_allocate_registers() to
select the scheduling variant with highest throughput, since none of
the scheduling modes tried has better throughput than the initial
value equal to zero of "best_perf". Instead use -INFINITY as initial
value for "best_perf" so we always select a scheduling mode.
This should have been caught by CI but oddly the tests above are
showing up as "not run" on my last baseline runs, so this wasn't
flagged as a regression for me.
v2: Use -INFINITY instead of previous approach that used NaN (Ian).
Fixes: 531a34c7dd ("intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13884
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13885
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37322>
When sparse binding functions submit batches, they may modify the
exec_obj_index field of anv_bo structs. This field is used to ensure a
unique list of buffers is sent to the kernel (i915). Add a lock in these
functions to prevent multiple threads from modifying this field during
the batch submission process. To avoid creating a deadlock, also rework
the locking done in anv_queue_submit().
When playing the Monster Hunter Wilds Benchmark on a mesa build which
enables slab allocation of batch buffers (6f7a32ec92), this avoids a
sporadic assert failure:
nsterHunterWilds.exe:
../../src/intel/vulkan/i915/anv_batch_chain.c:489:
setup_execbuf_for_cmd_buffers:
Assertion `execbuf->bos[idx] == first_batch_bo_real' failed.
This issue was seemingly first introduced in 04bfe828db
("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Backport-to: 25.2
Ref: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12582
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37307>
The Vulkan spec requires that access to the queue parameter be
externally synchronized for vkQueueSubmit(). So, each submit call to a
specific queue will have a unique ID.
Backport-to: 25.2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37307>
This code cannot be reached, since we already checked for
`!valid_samples` and returned `VK_ERROR_FEATURE_NOT_PRESET` in that case
above, and have not altered `valid_samples` since.
Fixes: d5da6980d3 ("anv/sparse: don't support depth/stencil with sparse")
CID: 1662063
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37341>
Renamed brw_nir_trig_workarounds.py to brw_nir_workarounds.py to reflect
its expanded scope beyond just trignometric workarounds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>
Added a workaround for a known shader in Horizon Forbidden West that causes
visual corruption on Intel anv driver. The fix clamps fsqrt inputs using
fmax(x, 1e-12) to avoid invalid values. Integrated the workaround via
brw_nir_apply_sqrt_workarounds() and applied it conditionally in the Vulkan
pipeline based on the shader's BLAKE3 hash.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12555
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>
Move from a single ralloc allocation per instruction to contiguous
blocks of allocations. Still use ralloc for those large blocks.
Each ralloc allocation has at least 5 pointers of overhead, which would
be about a third of the current brw_inst, and get worse as we try to
pack brw_inst better.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
In Release build, goes from 72 to 64 bytes, and now fits
in a single cacheline.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
Now that all the other kinds were added, all transforms to SEND will
come from non-BASE kinds, so we don't need overallocate for BASE
instructions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
This kind of instruction doesn't have a special struct but
will still be always allocated so that it can be lowered to SEND.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
Fixed the types in brw_inst::bits so the struct is packed correctly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
Move all the SEND specific fields from brw_inst into brw_send_inst.
This new instruction kind will contain all variants of SENDs plus the
virtual opcodes that were already relying on those SEND fields.
Use the `as_send()` helper to go from a brw_inst into the brw_send_inst
when applicable. Some of the code was changed to use the brw_send_inst
type directly.
Until other kinds are added, all the instructions are allocated the same
amount of space as brw_send_inst. This ensures that all
brw_transform_inst() calls are still valid. This will change after
a few patches so that BASE instructions can use less memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
Prepare code for supporting subclasses of brw_inst for certain
specialized kinds of instructions. This will allow
- Move certain fields from brw_inst to the specialized one,
reducing its size and making it easy to understand what applies
to which instruction;
- Move certain control sources into the specialized inst type, which
currently take a full brw_reg to encode small integers. Reducing
the overall sources we walk and care also might help the code
in general.
Next commits will add the new instruction kinds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>