Improves performance of Phasmophobia with the "Eye Adaptation" video
setting enabled on Arc B570 by about 9.5%.
fossil-db results on Battlemage:
Totals:
Instrs: 148797922 -> 148797865 (-0.00%)
Send messages: 7066341 -> 7066317 (-0.00%)
Cycle count: 21459978352 -> 21459975048 (-0.00%)
Totals from 8 (0.00% of 574410) affected shaders:
Instrs: 4633 -> 4576 (-1.23%)
Send messages: 479 -> 455 (-5.01%)
Cycle count: 611886 -> 608582 (-0.54%)
Observed to cut 15% of sends in a Phasmophobia shader, 8.3% in a Far Cry
New Dawn shader, 7% in a Borderlands 3 DX11 shader, and 3.4-3.7% of
sends in a few Witcher 3 and Dark Souls 3 shaders.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>
The issues preventing it to be enabled were fixed so now we can enable
it but we need also to enable workaround 16013994831 back again.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
Description of this workaround are not clear but looking at Iris
implementation we need to emit all 3DSTATE_PUSH_CONSTANT_ALLOC_XS if
any 3DSTATE_PUSH_CONSTANT_ALLOC_XS is emitted.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
As other video memories for AV1 are already allocated for the maximum
sizes, now it does the same for MV buffers too.
This fixes a bunch of artifacts of AV1 playing.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>
Refactor the scale factors to highlight the 16-tile width requirement on
Tile4. The fast-clear simulator code associated with HSD 1407682962
also contains a 16-tile requirement for Tile4 surfaces (for the pitch).
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
According to HSD 1407682962 and the associated simulator code,
fast-clear performance can be affected by: image alignment, tiling,
dimensionality, and row pitch. Redescribe surfaces in order avoid
fast-clearing at a slower rate.
Also, benchmarking the main patch in the performance CI (hw=A750)
shows that some traces are helped significantly:
* TotalWarWarhammer3 +5.58% (n=2)
* Factorio +3.75% (n=1)
* TerminatorResistance +3.3% (n=2)
* Borderlands3 +3.23% (n=2)
We could additionally increase the alignment requirements of surfaces in
order to deterministically increase fast-clear performance. That's left
out of this patch in order to avoid any functional pitfalls that can
arise with increased memory consumption. As a result, performance will
continue to be affected by how ISL/drivers/apps configure main surface
memory alignments (directly or indirectly).
Thanks to Lionel Landwerlin for pointing me to the relevant simulator
code.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11168
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11418
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
Unify code which creates surfaces from buffers. The behavior is slightly
changed to use array layers to enable arrayed buffer clears (as needed).
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
18bbcf9a triggered the following building error in Android,
simple fix is to use ffsll() as it was done before 18bbcf9a
to process uint64_t generics argument.
Fixes the following building error:
FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o
...
../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaratio
n]
const int first_generic_output = ffsl(generics) - 1;
^
../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'?
/home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here
int ffs(int __i) __INTRODUCED_IN_X86(18);
^
1 error generated.
Fixes: 18bbcf9a ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>
Gfx12.5 and later allow the use of two 16-bit immediate values in
integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD,
but that is not helpful where.
v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the
check around using integer MAD is unnecessary.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17119962 -> 17118441 (<.01%)
instructions in affected programs: 65398 -> 63877 (-2.33%)
helped: 32 / HURT: 0
total cycles in shared programs: 895433316 -> 895425578 (<.01%)
cycles in affected programs: 13437376 -> 13429638 (-0.06%)
helped: 30 / HURT: 2
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210052706 -> 209550074 (-0.24%)
Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00%
Totals from 7081 (1.00% of 707082) affected shaders:
Instrs: 16864614 -> 16361982 (-2.98%)
Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
Re-associate the calculation. The current calcuation is
((lane + zero_or_8) << 2) + offset
The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to
((lane << 2) + offset) + zero_or_32
All operations are SIMD8.
The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.
v2: Switch to alternate method. Update for SIMD32 on Xe2.
No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.
total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%
Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
The spec says:
Certain rendering commands can be executed conditionally based on a
value in buffer memory. These rendering commands are limited to
drawing commands, dispatching commands, and clearing attachments with
vkCmdClearAttachments within a conditional rendering block which is
defined by commands vkCmdBeginConditionalRenderingEXT and
vkCmdEndConditionalRenderingEXT. Other rendering commands remain
unaffected by conditional rendering.
It would seem that vkCmdTraceRays* are not covered by that.
Fixes new tests dEQP-VK.conditional_rendering.conditional_ignore.trace_rays*
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34864>
I always thought there was a massive issue with pipeline libraries &
mesh shaders. Indeed recent CTS tests have exposed a number of issues.
Some values delivered to the fragment shader are coming from different
places depending on whether the preceding shader is Mesh or not. For
example PrimitiveID is delivered in the per-primitive block in Mesh
pipelines whereas for other pipelines it's coming as a VUE slot (which
is per-vertex). Those are 2 different locations in the payload.
We have to find a layout for fragment shaders that is compatible with
everything. Leaving gaps here and there in the thread payload.
Fixes the following test pattern :
dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Mesh shaders have per vertex block in URB pretty much identical to the
VUE format. Let's just reuse that concept to do all of our layout in
the payload attribute registers. This will ensure that we have
consistent VUE layout between Mesh & non-Mesh pipelines.
We need a new way of laying out the VUE though as we have to
accomodate a HW constraint of maximum (per-primitive + per-vertex) of
32 varying. This means we cannot have 2 locations in the payload for
things like PrimitiveID which can come from either the per-primitive
or the per-vertex block. The new layout places the PrimitiveID at the
end of the per-vertex attributes and shrinks the delivery dynamically
if the mesh stage is active. The shader is compiled with a
MOV_INDIRECT to read the PrimitiveID from the right location in the
attributes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
In a case like this :
block_0:
%5 = ...
%6 = ...
block_1:
%7 = load_interpolated_input %5, %6
The current logic would move load_interpolated_input to block_0 before
%5 but not move %5 & %6 which are sources of that instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Moving it to is related functions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Will allow the driver to know if the vertices count is dynamic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Take the opportunity to reuse the backend pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
Already handled by having a special range created
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
This means the perfetto UI will have a non-zero/NULL handle/name in the UI
on these renderstages. Unfortunately, intel/ds is outside of vulkan so
unless we pull in anv headers, we can't just pass in the anv_cmd_buffer.
This also means it would be much more painful to pass the cmd buffer to
the rest of the events, so they'll still have unset command buffers.
Still, being able to see the name of the command buffer in at least one of
the events should be useful once that's glued together.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22350>
Some intrinsics are implemented by reading memory location that could
be rewritten by a further tracing calls. So we need to move those
reads prior to tracing operations in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
Consider the flag from PPS when setting tc/beta offset.
This fixes some artifacts when decoding a hevc video,
hevc_scaling_list4.mkv from Lynne.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34782>