image_aspect_to_binding() converts aspect to index by subrracting
VK_IMAGE_ASPECT_MEMORY_PLANE_0_BIT_EXT, however these enum values
are bitfields, not consecutive numbers, so comparing and subtracting
them won't work.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20269>
(cherry picked from commit 7642f3b99c)
I did not properly understood that we cannot access the views written
to the descriptor sets because they might have been destroyed after
the write operation and the copy operation is allowed to copy what is
invalid data. The shader just can't access it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 03e1e19246 ("anv: Refactor descriptor copy")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20222>
(cherry picked from commit a0991c7c79)
The first instruction of any kernel should have non-zero emask. This
restriction needs to be obeyed to avoid GPU hangs.
Patch adds a function to insert dummy mov as first instruction
to make sure this requirement is fulfilled.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20194>
(cherry picked from commit bc4b7de0d0)
This is trying to clear a bit in the control register. However, it's
executing with whatever channel mask happens to be active. Typically
this is the one at the start of the program, so at least some channels
will be active. Typically the first channel will be active due to
packed dispatch, but that's not always guaranteed. Without NoMask,
the float controls writes may randomly not happen.
Recent GPUs also seem to have a hang issue when the first instruction in
the shader doesn't have any active channels. Having an instruction with
NoMask at the start of the program works around the issue. See HSD bug
14017989577. In our case, the float controls preamble was breaking that
restriction every time, causing us to run into this problem frequently.
Thanks to Tapani Pälli for finding this hang issue, and Francisco
Jerez and Lionel Landwerlin for helping pinpoint this issue during
review of a workaround patch in !20194.
Fixes GPU hangs in Elder Scrolls Online, Witcher 3, and likely more.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7639
Fixes: 9da56ffc52 ("i965/fs: add emit_shader_float_controls_execution_mode() and aux functions")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20214>
(cherry picked from commit bafbe7c23a)
On cmd_buffer_emit_scissor(), if VkViewport height or width are set to
a value lower than 1.0, y_max or x_max can be attributed negative values,
causing an overflow. That leads to ScissorRectangleYMax or
ScissorRectangleXMax to be set to values on an unsupported range.
Clamping x_max and y_max in the valid range solves the problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7471
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20200>
(cherry picked from commit 2e775b8bdb)
All data written by the user are offset by TUE header size.
Without this patch we copy the correct amount of user data, but both
"from" and "to" offsets are wrong.
Fixes: 37e78803d7 ("intel/compiler: use nir_lower_task_shader pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
(cherry picked from commit db0e6f9a07)
This reverts commit 768238fdc0 which
is not only leading to memory leaks, but also reportedly breaks KDE
pretty badly.
Fixes: #7674, #7435
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19972>
(cherry picked from commit 0cee008fee)
Fixes a number of CTS patterns on DG2 :
- dEQP-VK.dynamic_rendering.primary_cmd_buff.random*
- dEQP-VK.draw.*secondary_cmd*
- dEQP-VK.dynamic_rendering.*secondary_cmd*
- dEQP-VK.geometry.*secondary_cmd_buffer
- dEQP-VK.multiview.*secondary_cmd*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9c1c1888d9 ("intel/fs: put scratch surface in the surface state heap")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19946>
(cherry picked from commit 9bb055ff5d)
The initial implementation is a pretty big hammer. Implement the HW
recommendation to minimize cases in which we need a fence.
This improves by 10FPS on some of the Sascha Willems RT demos.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6031ad4bf6 ("intel/fs: Add Wa_22013689345")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19322>
(cherry picked from commit 945637514e)
We need to set CPS_MODE_NONE when no per coarse pixel dispatch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 231651fd89 ("anv: implement VK_KHR_fragment_shading_rate")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19867>
(cherry picked from commit 507a86e131)
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that it doesn't uses slots in the application's expected 1M
descriptors (especially with vkd3d-proton).
But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).
The additional issue is that there is only 26bits of surface offset
available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
(cherry picked from commit 9c1c1888d9)
When we're not using queries, all the counters from the
MI_REPORT_PERF_COUNT are available. This is the case when using
perfetto with the global pps datasource that capture global counter
values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8750f43a90 ("intel/perf: add performance query layout using MI_SRM")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893>
(cherry picked from commit 61fef1ed72)
This array of structure needs to be initialized to 0 as it contains a
bitset we don't explicitly clear.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3144bc1d33 ("intel/perf: move query_mask and location out of gen_perf_query_counter")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893>
(cherry picked from commit e754bf6be4)
__builtin_clz(value - 1) is undefined for with value=1 (because
__builtin_clz(0) is undefined).
Because we set rt_pipeline->stack_size = 1 when a ray tracing pipeline
doesn't need any stack allocation to differentiate from a dynamic size
(rt_pipeline->stack_size = 0) we can run into this undefinied behavior
issue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f68d64dac0 ("anv: Add support for vkCmdSetRayTracingPipelineStackSizeKHR")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19781>
(cherry picked from commit 440da44a84)
We don't need to login anymore, but we can't use plain minio commands
now. `ci-fairy` got a helper as `s3cp` to keep an almost identical
API.
Solved Conflicts:
.gitlab-ci/common/init-stage2.sh
.gitlab-ci/container/lava_build.sh
.gitlab-ci/prepare-artifacts.sh
src/amd/ci/traces-amd.yml
src/freedreno/ci/traces-freedreno.yml
src/gallium/frontends/lavapipe/ci/traces-lavapipe.yml
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
(cherry picked from commit 67cee534a8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19734>
The code builds up the dynamic array of objects (spirv_objs) and
collect pointers to each of them into another dynamic
array (spirv_ptr_objs).
If the growth of the first array cause a reallocation, it is
possible that the previous pointers end up invalid.
Fixes: 77e929a527 ("intel/clc: allow multiple CL files to be compiled together")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19730>
(cherry picked from commit 9fd1d47aa0)
Age of Empire IV generates a shader of ~2.3Mb on DG2 which is above
the limit we currently have.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19669>
(cherry picked from commit ae76bba34a)
On Intel HW we use the same mechanism for internal operations surfaces
as well as application surfaces (VkDescriptor).
This change splits the surface pool in 2, one part dedicated to
internal allocations, the other to application VkDescriptors.
To do so, the STATE_BASE_ADDRESS::SurfaceStateBaseAddress points to a
4Gb area, with the following layout :
- 1Gb of binding table pool
- 2Gb of internal surface states
- 1Gb of bindless surface states
That way any entry from the binding table can refer to both internal &
bindless surface states but none of the driver allocations interfere
with the allocation of the application.
Based off a change from Sviatoslav Peleshko.
v2: Allocate image view null surface state from bindless heap (Sviatoslav)
Removed debug stuff (Sviatoslav)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7110
Cc: mesa-stable
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19275>
(cherry picked from commit 4ceaed7839)
The back-end swizzles dwords so that our indirect scratch messages match
the memory layout of spill/fill messages for better cache coherency.
The swizzle happens at a DWORD granularity. If a read or write crosses
a DWORD boundary, the first bit will get correctly swizzled but whatever
piece lands in the next dword will not because the scatter instructions
assume sequential addresses for all bytes. For DWORD writes, this is
handled naturally as part of scalarizing. For smaller writes, we need
to be sure that a single write never escapes a dword.
Fixes: fd04f858b0 ("intel/nir: Don't try to emit vector load_scratch instructions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7364
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
(cherry picked from commit 25c180b509)
Because dup_mem_intrinsic() retains the SSA offset from the original
intrinsic and only modifies it by adding a constant, we can compute the
alignment based on the original alignment and the constant offset. This
is both easier and more accurate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
(cherry picked from commit 85685cf932)
Don't copy propagate the constant in situations like
mov(8) g8<1>D 0x7fffffffD
mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to
mul(8) g15<1>D g14<8,8,1>D 0xffffUW
mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in
mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
Volume 2a of the Skylake PRM says:
When multiplying a DW and any lower precision integer, the
DW operand must on src0.
See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.
Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur. I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits. The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.
No shader-db or fossil-db changes on any Intel platform.
Fixes: de6c0f8487 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
(cherry picked from commit db20412168)
Implement Wa_1508744258:
Disable RHWO by setting 0x7010[14] by default except during resolve
pass.
Disable the RCC RHWO optimization at all times except when resolving
single sampled color surfaces.
v2: Move stalling to genX(cmd_buffer_apply_pipe_flushes) for clarity (Mark)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19450>
(cherry picked from commit ba0336ab3f)
Documentation is worded in a confusing way, which may be understood that
we don't have to set this field to get good results.
MESH part of this commit improves performance of vk_meshlet_cadscene
by a factor of 2 on A380.
Fixes: ef04caea9b ("anv: Implement Mesh Shading pipeline")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19412>
nir_opt_large_constants balks at seeing a store_deref of a variable
where the source is a vecN operation of multiple load_consts, and thinks
that isn't a constant, so it should not bother promoting it.
Unfortunately, we were running nir_lower_load_const_to_scalar before
nir_opt_large_constants, so this prevented a ton of constant promotion.
This commit /used to help/ some shaders in shader-db. Presumably since
!16770 landed, those shaders were already helped. Currently ther are
no shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141998227 -> 141421756 (-0.4%)
Instructions helped: 12515
Instructions hurt: 237
SENDs in all programs: 7437925 -> 7468033 (+0.4%)
SENDs hurt: 12806
Cycles in all programs: 9161655753 -> 9132869800 (-0.3%)
Cycles helped: 10163
Cycles hurt: 2637
Spills in all programs: 19977 -> 18678 (-6.5%)
Spills helped: 384
Spills hurt: 40
Fills in all programs: 32863 -> 31396 (-4.5%)
Fills helped: 385
Fills hurt: 42
Lost: 1
Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham
Origins vertex shaders were hurt for SENDs in this commit. A couple
Aztec Ruins compute shaders and Spaceship shaders (multiple stages)
were also hurt.
All of the shaders hurt for spills or fills were Spaceship compute
shaders. Nearly all of the shaders helped were Shadow of the Tomb
Raider fragmenet shaders. One Spaceship shader was reall, REALLY helped:
Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%)
Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%)
Overall this seems like an improvement, but we may want to actually
run these few benchmarks before landing.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>