Previously we had special ops doing data-model-breaking things on GPRs. But
there's no real reason for that, we can calculate lane IDs as UGPR vectors
within the Jay data model just fine. Adjust jay_ir/jay_validate to define packed
16-bit UGPR vectors, giving them the natural semantics, then use that to
calculate lane IDs, peeling back all the hacks we added along the way.
This also unfortunately pessimizes inverse_ballot(), but only in a corner case
that could be revisited later. Stats are net positive.
In addition to the code cleanup, this has three other benefits:
* Now that we can rematerialize the lane ID code anywhere we want, we could
theoretically reduce register pressure in some scenarios. Stats show this
doesn't help in the current implementation, though.
* Now that we can calculate lane IDs in control flow, the issues with divergent
function calls all go away. (Well, the lane ID issue. There are other issues.)
* Now that we use UGPRs for this, we don't need a stride=16 GRF in shaders that
don't actually use 16-bit math, meaning less shuffling from bad partitions.
That's reflected in the positive stats here.
SIMD16:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2227750 -> 2221032 (-0.30%); split: -0.44%, +0.14%
CodeSize: 33138416 -> 33034224 (-0.31%); split: -0.52%, +0.20%
SIMD32:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2864583 -> 2806217 (-2.04%); split: -2.22%, +0.19%
CodeSize: 43088064 -> 42171504 (-2.13%); split: -2.29%, +0.17%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
look at what the program actually does instead of hardcoding a worst-case.
SIMD16:
Totals from 1965 (74.23% of 2647) affected shaders:
Instrs: 2603230 -> 2539932 (-2.43%); split: -3.44%, +1.01%
CodeSize: 38826160 -> 37811904 (-2.61%); split: -3.59%, +0.97%
Number of spill instructions: 1206 -> 555 (-53.98%)
Number of fill instructions: 1194 -> 551 (-53.85%)
SIMD32:
Totals from 1974 (74.57% of 2647) affected shaders:
Instrs: 3998126 -> 3033333 (-24.13%); split: -24.18%, +0.05%
CodeSize: 59563952 -> 45580448 (-23.48%); split: -23.52%, +0.05%
Number of spill instructions: 43534 -> 37471 (-13.93%); split: -13.97%, +0.04%
Number of fill instructions: 43118 -> 36412 (-15.55%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
These queries need to be used for partitioning too. This also degunks the
core RA logic in jay_register_allocate.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
panvk previously reported DRM format modifiers only through
VkDrmFormatModifierPropertiesListEXT.
Report them through VkDrmFormatModifierPropertiesList2EXT as well.
Cc: mesa-stable
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41772>
VkFormatProperties uses VkFormatFeatureFlags, whose valid bits are limited to
VK_ALL_FORMAT_FEATURE_FLAG_BITS (0x7fffffffu).
Without this mask, bit 31 of the flags2 value leaks into the legacy field.
Use the vk_format_features2_to_features() helper when filling
VkFormatProperties so flags2-only bits are not leaked through legacy feature
fields.
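A minimal sketch of the masking described above, in plain C. The typedefs and
function names here are hypothetical stand-ins rather than the Vulkan or Mesa
definitions; only the 0x7fffffffu mask value is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: the legacy field is 32 bits wide, the flags2
 * field is 64 bits wide. */
typedef uint32_t legacy_features_t;
typedef uint64_t features2_t;

/* Mask value quoted in the commit message: bits valid in the legacy field. */
#define ALL_LEGACY_FEATURE_BITS 0x7fffffffu

/* Keep only the bits that have a meaning in the legacy field. */
static inline legacy_features_t
features2_to_features(features2_t features2)
{
   return (legacy_features_t)(features2 & ALL_LEGACY_FEATURE_BITS);
}

/* For contrast: a plain truncating cast drops bits 32..63 but lets
 * bit 31 through. */
static inline legacy_features_t
features2_truncate(features2_t features2)
{
   return (legacy_features_t)features2;
}
```

With an input that only has bit 31 set, the masked conversion yields 0 while
the truncating cast still reports 0x80000000.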
Cc: mesa-stable
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41772>
This test has been marked as flaking on G925, but I've also seen it
flaking on G610 recently. Let's just move it to the common flake-file
instead. Also drop it from the fails-file on G925, as having it in both
isn't really needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
This only fails sometimes, and it doesn't seem to take the whole system
down with it. Let's mark it as a flake instead of skipping it.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
Skips accumulate over time, but rarely get reevaluated to see if
they're still relevant. To combat this problem, I've dropped all skips,
and added back those that actually serve a practical use.
The result might be a bit more instability in the short term. But
hopefully this pays off in the long term.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
These are known fails due to CTS bugs. Patches are on the way. Let's
skip them, like we do with the other ones in the same category.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
When zink translates glCopyImageSubData with a 2d image view of a 3d
image as the destination into a draw call, the shader generated for the
load-op did not handle this correctly, leading to the wrong z-slice
being loaded.
The fix is to mark which attachments within the load op are 2d image
views of a 3d image; then, when generating the load-op shader, convert
the sample to a 3d sample and update the coords to load the required
z-slice value from the tex metadata.
Fix for dEQP-GLES31.functional.copy_image.non_compressed.*_to_texture3d
Fixes: 7b28b6c43d ("pvr, pco: implement VK_EXT_image_2d_view_of_3d")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41955>
With a decode-only build, si_init_gfx_context is stubbed and returns
false, which causes si_create_context to fail when a decoder is created
(since si_dec_init_decode sets the PIPE_CONTEXT_COMPUTE_ONLY flag).
With this change, the stubbed si_init_gfx_context function returns true,
which allows si_create_context to continue, and a decoder to be
successfully created.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41973>
Since shader support is not built when with_gfx_compute is false, libelf
is not needed.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41973>
The fragment state is stored just before ZS_CRC_EXTENSION, so move the
pointer accordingly.
Fixes: af35fc44a7 ("pan/desc: Implement pan_emit_fbd for v14+")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41643>
Implement the v14+ paths needed to copy IR framebuffer layer state and
re-emit it from the tiler OOM exception handler.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41643>
Postpone emitting the fragment layer state in the fragment job issue path
until just before RUN_FRAGMENT2 is emitted.
This state is loaded from FBD_POINTER, which might change due to IR, so
postponing is required.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41643>
The cs_builder.h helper that records RUN_FRAGMENT[2] also records a flush,
so there's no need for flushing again before calling it.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41643>
This is implemented in common code in d8ef386f98 ("vulkan: add support
for VK_KHR_internally_synchronized_queues").
Passes dEQP-VK.synchronization2.internally_synchronized_queues.*
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41926>
This will be used for CmdDrawIndirectByteCountEXT on v13, which requires
dividing the byte count by the vertex stride to get the number of
vertices in the draw.
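As a rough illustration of the arithmetic only (not the command-stream
implementation), the vertex count is the byte count divided by the stride,
rounding down. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole vertices contained in byte_count
 * bytes when each vertex occupies vertex_stride bytes. */
static inline uint32_t
vertex_count_from_bytes(uint32_t byte_count, uint32_t vertex_stride)
{
   assert(vertex_stride != 0); /* division by zero is not meaningful here */
   return byte_count / vertex_stride;
}
```

For example, 100 bytes at a stride of 12 gives 8 vertices; the trailing 4
bytes do not form a full vertex.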
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41655>
All we really need for udiv32 is a 32x32->64 multiply, but the most
efficient way to implement that is to move the 32-bit reg into a 64-bit
reg anyway. So I figured it was simpler to just have the caller do that
than to pass a scratch reg into the helper.
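For readers unfamiliar with the term, a 32x32->64 multiply can be sketched in
plain C as below. This is only an illustration of the operation, assuming
unsigned operands; it is not the cs_builder helper, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: multiply two 32-bit values and keep the full
 * 64-bit product. */
static inline uint64_t
mul_32x32_64(uint32_t a, uint32_t b)
{
   /* Zero-extend one operand first, mirroring the "move the 32-bit reg
    * into a 64-bit reg" step, so the multiply happens at 64 bits. */
   uint64_t a64 = a;
   return a64 * (uint64_t)b;
}
```

Without the widening step, the product of two large 32-bit values would wrap
and lose its upper half.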
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41655>
v13 adds some register add/sub instructions, and I'd like to use
cs_{add,sub}{32,64} for those to match the naming convention for other
reg/reg instructions. So the existing immediate functions are renamed to
cs_add_imm{32,64}, matching the name of the actual instruction.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41655>
Not sure if any workload uses this. This mostly allows us to document
the functionality of HSD 22011236099 on gfx20+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41712>
These platforms don't support CCS on MCS/HIZ/STC. There's nothing we can
do about this. So, stop warning about it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41712>