This flush was introduced with the following commits:
8949d27bb8 ("anv: implement gen9 post sync pipe control workaround")
bcb611361b ("anv: implement gen12 post sync pipe control workaround")
The flush was unsued with the following commit:
e79e1ca304 ("intel: Drop Tigerlake revision 0 workarounds")
This prevents some extra pipecontrols caused by a following patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>
Replicating what we do in genX_cmd_compute.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7ca5c84804 ("anv: add support for simple internal compute shaders")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>
16-bit SIMD8 sampler writeback messages come with a bit of padding in
them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into
the padding-free format expected by NIR. Additionally, we may reduce
the response length on the sampler messages based on which components
of the (always vec4) NIR destination are actually in use. When we do
that, dest_size > read_size, and the trailing components are all empty
BAD_FILE registers, indicating the contents are undefined.
Unfortunately, we can't ignore those trailing components entirely.
In the past, we left them default-initialized, giving us a BAD_FILE
register with UD type (which didn't matter, since all sampler returns
were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD.
For example, writing RGB and skipping A (without sparse) would produce
read_size = 3 and dest_size = 4 and nir_dest[5] containing:
nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud>
We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see
3 HF's and a UD, and try to copy the full 32-bit value at the end,
instead of 16-bits of pad like we intended. This meant it would
overflow the destination register's size, triggering validation errors.
Thanks to Ian Romanick for noticing this, writing a test, and also
coming up with a nearly identical fix.
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617
References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>
on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in
this pass is redundant. and would cause a problem since we'd then have to lower
the "is helper inv?" flag late. so just skip the extra lowering code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.
For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it. If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF. Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.
For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE. If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks. So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.
However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions). In affected
shaders, it was -0.08% for instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
UBO loads with a non-indirect buffer index should be safe to perform
speculatively. With a direct offset, we may sometimes turn them into
push constants, at which point it's just reading a register with no
cost at all. Otherwise, we access them via messages that use surface
state, and automatically perform bounds checking. So we shouldn't have
any issues with reading out of bounds and page faulting, for example.
This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics,
so we can turn simple if's with loads on both sides to bcsels. In some
cases this can collapse a surprising amount of control flow, allowing
other optimizations to work better.
The i965 OpenGL driver used load_uniform intrinsics, which are allowed
in NIR's peephole select pass. But iris uses the Gallium NIR pass that
translates uniforms to loads from UBO 0, so we haven't been able to take
advantage of NIR's peephole select pass there. The backend pass was
still able to handle this to some extent, however.
fossil-db results on Alchemist:
Totals:
Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00%
Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00%
Send messages: 7416330 -> 7416261 (-0.00%)
Spill count: 52471 -> 52473 (+0.00%)
Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00%
Scratch Memory Size: 3197952 -> 3198976 (+0.03%)
Totals from 1848 (0.29% of 630003) affected shaders:
Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02%
Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03%
Send messages: 59829 -> 59760 (-0.12%)
Spill count: 3870 -> 3872 (+0.05%)
Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02%
Scratch Memory Size: 174080 -> 175104 (+0.59%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
INTEL_MEASURE currently does not support measuring events in
parallel from reused command buffers. When this case is detected,
warn user and disallow. Fixes observed segfaults in such apps.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30523>
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
The current code is not handling the potential NULL pointer in
VkDebugUtilsObjectNameInfoEXT::pObjectName
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e1b9a6e4f3 ("anv: initial RMV support")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30516>
Like AHB, we don't know the layout for an image backed by gralloc
swapchain memory until bind when gralloc information is passed by the
platform.
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29850>
Refactor out shared code for the u_gralloc tiling query so it can also
be used by ahw and later anb resolves.
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29850>
Dota 2 and Counter-Strike 2 really want to be able to allocate memory
for both VkImages and VkBuffers from the same memory type. Xe2's
special compression-only memory type does not support buffers, which
makes these games crash. Disable CCS on these games as a workaround.
This is a temporary workaround as we're still working towards a
long-term solution (either by fixing the engine or finding a way
better expose our memory types).
Backport-to: 24.2
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11520
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11521
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
These memory types are useless when CCS is disabled, don't leave them
there so they don't confuse applications.
Backport-to: 24.2
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
Xe KMD will now report the different topology mask types based on the
type of the EU of running platform.
With this we don't need to divide the EU count by 2 in intel_perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30127>
Allows a driver to declare indirect arguments for its tracepoints and
pass an address. u_trace will request a copy of the data which should
be implemented on the command processor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
We want to reduce the buffer allocations for other type of data than
timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
We're about to add indirect arguments, having a better way to describe
arguments (as capture/storage) will be useful.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
Our validation code doesn't need to know which bytes are accessed. It
only needs to know which grfs were accessed by an element. This also
helps to easily handle the Xe2 register size change.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
Previously this check would create a mask of the bytes used in the
grf, and then shift the mask. This worked well when there was 32 bytes
in the register because a 64-bit uint64_t could easily detect that
bytes were used in the next regiter. (The next register was the high
32-bits of the `access_mask` variable.)
With Xe2, the register size becomes 64 bytes, meaning this strategy
doesn't work. Instead of a mask, we can just check to see if more than
1 grfs are used during each loop iteration. (Suggested by Ken.) This
will make it easier to extend for Xe2 in a follow on commit.
Verified this with
dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize
on Xe2, which otherwise would cause the program to fail to validate
because it assumed a grf was 32 bytes.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
Also add a stub for VkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>
Anv supports VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR
and VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR.
Also add to handle the VK_QUERY_RESULT_WITH_STATUS_BIT_KHR flag.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>