Loop peeling decrements the calculated trip count, which might
result in a known trip-count of 0 for single-iteration loops.
Thus, also unroll loops if max_trip_count == 0 and exact_trip_count_known.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39778>
Explain why the driver uses demote instead of an immediate jump to the
end of the shader for OpTerminate, noting that the jump approach showed
no performance gains.
Reference: !38381
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39703>
The entire array is always initialized to zero and never modified.
Cuts the size of brw_wm_prog_data by 32%.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
On the ppc64le architecture error log fail to compile with error:
../src/virtio/vulkan/vn_renderer_virtgpu.c: In function ‘virtgpu_ioctl_map’:
../src/virtio/vulkan/vn_renderer_virtgpu.c:751:66: error: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long unsigned int’} [-Werror=format=]
751 | "mmap failed: gpu_fd=%d, handle=%u, size=%zu, offset=%llu, err=%s",
| ~~~^
| |
| long long unsigned int
| %lu
752 | gpu->fd, gem_handle, size, args.offset, strerror(errno));
| ~~~~~~~~~~~
| |
| __u64 {aka long unsigned int}
cc1: some warnings being treated as errors
Parse the parameters to fix the failure.
Fixes: a49b7adad8 ("venus: add error log coverage for virtgpu backend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39775>
Some parts of the logs can use $CI_JOB_STARTED_AT, while other parts use
the uptime (time since kernel boot) instead.
Printing both here allows syncing them and being able to tell how much
time actually elapsed between log lines.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39395>
Instead of asserting, let's simply not enumerate any configuration if
cooperative matrix is disabled. This can happen for example when
neither systolic nor software lowering is being used.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39728>
We don't need that much to make sure we're testing on the right driver,
and shortening it to only the "PowerVR" part means that it will work
with no modifications when the next pvr device is added to CI.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39787>
Lower booleans to their bit size before we lower ALU widths so the ALU
widths will use the final size of the boolean, not the bool size. But
also, if we do have a boolean vector (the order in which these passes
get called isn't as obvious as you'd like), we want to let it stay
max-width and then lower it after we've converted it to a wide boolean.
Otherwise, we'll end up scalarizing lots of boolean stuff we don't mean
to just because it hits the else case.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
This will get rid of ball_iequal and bany_inequal early instead of
waiting for it to happen as a side-effect of lowering ALU widths. This
also enables some funky lowerings for fall_equal and friends but we'll
never see those because they're only ever generated by the bool-to-float
pass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
We've got vectorized comparison instructions going all the way back to
Bifrost. We should probably stop splitting them.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
There might be cases under which we can make this work but they're
tricky at best. For now, don't even try.
Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
This is no longer necessary now that we implement them all. Also, it
was screwing up vectorization in some cases so it's better to just not
have it in the compile flow at all.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
Instead of blindly taking the first source, take the first source that
isn't a constant. That way we won't accidentally expand things to
32-bit just because a constant came first.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
There's a nice little comment here saying we use the same write mask (an
out of date term in NIR) and swizzle but we're no longer actually doing
that. Depending on nir_builder magic, we may actually generate a scalar
when we really want a vector. The fix is to use more builder helpers
and just eat the potential copy.
Fixes: 3180656bbc ("nir: don't use nir_build_alu() with incomplete sources")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39725>
the lowerings for e.g. f2f16_rtp have carefully written sequences using
Infinity. nir_opt_algebraic will stomp right through this. `feq x, inf`
without an exact flag is basically always a bug. Disable fast math here.
Fixes OpenCL CTS test_half on Iris.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39740>
The passed flags is always zero on the import paths:
- panfrost_bo_import
- panvk_AllocateMemory
- panvk_GetMemoryFdPropertiesKHR
Fixes: 1c7793ea0b ("panvk: Advertise a HOST_CACHED memory type if we have WC maps")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39723>
nir_lower_multiview creates a loop that loops over the view mask and
uses indirect stores. Since we only support direct
store_per_view_output, we have to make sure these indirects are gone.
Lowering the IO vars to temporaries will replace them with indirects on
registers.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 6a1c8d3a0c ("ir3, freedreno, turnip: Lower io earlier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14787
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39694>
When sampling a subpass input attachment when multiview is been used
the the view index is required.
This becomes an extra output from the vertex shader which is then
iterated into the fragment shader input where it is then used as the
array layer index when sampling the subpass input attachment.
However this extra output was not having its interpolation mode
configured correctly leading to incorrect instructions being added
causing the view index to always be zero and thus sampling the
subpass input attachment incorrectly.
Fix is to make sure the view index interpolation mode is set to flat.
Fix:
dEQP-VK.multiview.input_attachments.no_queries.1_2_4_8_16_32
dEQP-VK.multiview.input_attachments.no_queries.1_2_4_8
dEQP-VK.multiview.input_attachments.no_queries.15_15_15_15
dEQP-VK.multiview.input_attachments.no_queries.15
dEQP-VK.multiview.input_attachments.no_queries.5_10_5_10
dEQP-VK.multiview.input_attachments.no_queries.8_1_1_8
dEQP-VK.multiview.input_attachments.no_queries.8
dEQP-VK.multiview.input_attachments.no_queries.max_multi_view_view_count
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.1_2_4_8_16_32
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.1_2_4_8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.15_15_15_15
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.15
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.5_10_5_10
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.8_1_1_8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.max_multi_view_view_count
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39715>
Fix compiler error:
../src/freedreno/decode/cffdec.c:580:7: error: assigning to 'char *'
from 'const char *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
580 | p = strstr(name, "CONST");
| ^ ~~~~~~~~~~~~~~~~~~~~~
glibc now provides C23-style type-generic string functions. strstr
returns const char * when passed a const char * argument. Update p
declaration to const since it's only used for offset calculation.
Fixes: 1ea4ef0d3b ("freedreno: slurp in decode tools")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39770>