Color varyings must be properly annotated, so they don't get interpolated
when the rasterizer is configured for flatshading. For whatever reason
the etnaviv NIR compiler failed to do so from its inception.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32922>
We have various clang-format issues around some common code macros.
This should fix them in panvk at least.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
In C, defining NDEBUG disables the assert macro; let's follow this
behaviour.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
This stops it from leaking shader binary blob definitions and is
required for panfrost clc.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
We emulate roundf and llroundf for compatibility.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
This fixes issues with LLVM on OpenCL C failing to represent 128-bit
integers.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32939>
Same as 7ca01506c9 ("panvk: hack to improve depth clipping with
small viewport depth range") but applied to the JM backend.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32905>
vk_rasterization_state_depth_clip_enable() checks the clip and clamp
modes, not the cull mode. RS_DEPTH_CLIP_ENABLE got confused with
RS_CULL_MODE in 7ca01506c9 ("panvk: hack to improve depth clipping
with small viewport depth range").
Fixes: 7ca01506c9 ("panvk: hack to improve depth clipping with small viewport depth range")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32905>
The loop checking if exec is overwritten didn't check for NULL
instructions, and didn't fix up reg write indices after inserting
instructions.
Fixes: fcd94a8c ("aco: move try_optimize_branching_sequence() to postRA optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32746>
We need to add variants of these instructions, which are used with a shadow
sampler and are passed the shadow reference value via src2.
Fixes: abe5bd35 ("etnaviv: Switch to isa_assemble_instruction(..)")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32926>
We need to add a variant of the texld instruction, which is used with a shadow
sampler and is passed the shadow reference value via src2.
Blob generates such texld's for deqp's GLES3.functional.texture.shadow.2d.* (GC3000).
Fixes spec@arb_depth_texture@texdepth.
Fixes: abe5bd35 ("etnaviv: Switch to isa_assemble_instruction(..)")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32926>
Marek recently changed hole_size to be signed, rather than unsigned.
A negative hole_size means that the two loads overlap - and thus are
prime candidates to be combined.
My original hole_size handling was:
   if hole_size > 4 * (8 - low->num_components) then don't vectorize
For non-overlapping loads, this worked: NIR's largest vector is vec16,
and if low was already a vec16, combining it with anything would exceed
that, so it'd never be considered. That meant low would always be a
vec8 or less, so (8 - low->num_components) was a positive number.
Now that we see overlapping loads, we can see a vec16 low, vec4 high,
and also a negative hole size, giving us fun comparisons like:
   -16 > 4 * (8 - 16) => -16 > -32 => true, don't vectorize
Which is absolutely the wrong thing to do, because the high load's data
is entirely included within the former load's data.
The idea here was to make sure the second load would be able to pack at
least one component into the first's V8 result. But even this isn't the
best, because...even if it's simply adjacent, doing one V16 load is more
efficient than requesting two back to back V8 loads.
So, we just simplify down to a static check: if there's an entire V8 of
hole, don't vectorize. This already won't happen because the core pass
has max_hole set to 28 bytes (7 32-bit components), but that could
change based on the needs of other drivers, so let's be defensive.
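The difference between the two checks can be sketched in isolation as
follows. This is only an illustration: the function names, the int64_t
type for hole_size and the use of >= for "an entire V8 of hole" are
assumptions, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original heuristic, returning true when vectorization is rejected.
 * low_components is the component count of the lower-addressed load.
 * It implicitly assumed hole_size >= 0 and low_components <= 8. */
static bool
old_rejects(int64_t hole_size, unsigned low_components)
{
   return hole_size > 4 * (8 - (int64_t)low_components);
}

/* Simplified static check: reject only when the hole covers a whole
 * vec8 of 32-bit components (8 * 4 = 32 bytes). */
static bool
new_rejects(int64_t hole_size)
{
   return hole_size >= 8 * 4;
}
```

With a vec16 low load and a hole size of -16, old_rejects() refuses to
vectorize, while new_rejects() accepts any overlapping (negative) hole.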
fossil-db results on Alchemist:
Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05%
Subgroup size: 8092544 -> 8092568 (+0.00%)
Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05%
Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35%
Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29%
Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80%
Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05%
Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01%
Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26%
Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03%
Fixes: c21bc65ba7 ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>
3DSTATE_CPS_POINTERS is deprecated on PTL, so let's switch to
3DSTATE_COARSE_PIXEL to deliver CPS state as pipelined state.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
This change adds the new CPS-related state instruction, structure and
enum.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
When a SEND instruction is an EOT, the scoreboard lowering will not
allocate a new SBID for it, since nothing needs to wait for it. In
Gfx12 this allowed the SEND to get out-of-order $.dst or $.src
dependencies.
Starting on Xe2+ this is not supported anymore, in favor of supporting
more combined modes.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>
Now that the offset unit is correctly scaled depending on
the depth buffer format, this test can be expected to pass.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32756>
Currently we scale the polygon offset units with a fixed factor,
matching the MRD (minimal resolvable distance) for a 16bpp depth
buffer. This wastes a lot of precision when a 24bpp depth buffer
is used.
Apply the correct MRD scale, depending on the format of the
currently bound depth buffer.
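A rough sketch of a format-dependent scale, assuming the MRD of an
N-bit normalized depth buffer can be approximated as 2^-N. The helper
names and the plain bit-count parameter are hypothetical, and the actual
register programming is not shown.

```c
#include <assert.h>

/* Approximate smallest resolvable step of an N-bit unorm depth value.
 * Assumption: MRD ~= 2^-N. */
static float
mrd_for_depth_bits(unsigned depth_bits)
{
   return 1.0f / (float)(1ull << depth_bits);
}

/* Scale the API-level polygon offset units by the MRD of the bound
 * depth buffer instead of a fixed 16bpp factor. */
static float
scale_offset_units(float units, unsigned depth_bits)
{
   return units * mrd_for_depth_bits(depth_bits);
}
```

Under this assumption a 24bpp buffer gets a step 256 times smaller than
the fixed 16bpp factor would give.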
Fixes piglit spec@!opengl 1.4@gl-1.4-polygon-offset.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32756>
We failed to set the wrmask of movmsk expanded from ballot.macro. This
caused legalization to miss the need for (ss) when a component other
than the first is used.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 1a78604d20 ("ir3: Add support for subgroup arithmetic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32918>
All places are fine with getting a false negative as long as buffer_wait
returns quickly. This can improve performance.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32877>
It sometimes takes 1 ms to return with timeout=0, which is unacceptable.
Fixes: 4194774edf - radeonsi: move barriers out of si_launch_grid_internal_ssbos
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32877>
This is similar to what link_intrastage_shaders is doing and it
fixes the following test:
KHR-Single-GL46.subgroups.builtin_var.compute.subgroupsize_compute
Which was failing with SPIRV but passing with GLSL, the diff being:
- SPIRV: "subgroup_size: 1"
- GLSL: "subgroup_size: 2"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32698>
The blitter VS expects coords to fit in a signed int16. When this
is not the case, use util_blitter_draw_rectangle instead.
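The range check that selects the fallback could look roughly like the
sketch below. The helper names and the (x0, y0, x1, y1) parameter layout
are hypothetical and only illustrate the int16 limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a single coordinate can be packed into a signed 16-bit value. */
static bool
fits_int16(int v)
{
   return v >= INT16_MIN && v <= INT16_MAX;
}

/* True if all four rectangle coordinates fit in int16, i.e. the packed
 * rectangle path can be used; otherwise the caller would fall back. */
static bool
rect_fits_int16(int x0, int y0, int x1, int y1)
{
   return fits_int16(x0) && fits_int16(y0) &&
          fits_int16(x1) && fits_int16(y1);
}
```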
Since util_blitter_draw_rectangle sets vertex elements, we need
to make sure they're properly restored.
The alternative to this fallback would be to pass coordinates
unpacked (so 4 SGPRs instead of 2), but this doesn't fix the
fbo-blit-check-limits test because of a UV interpolation precision
issue.
Using 2 triangles instead of a rectangle + disabling
window_space_position helps but then this breaks some GLES3 tests,
like dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_x
(which doesn't pass either if u_blitter is used for all cases).
Using a single triangle covering the whole rectangle fixes all
cases, but it then requires setting up scissors to avoid writing too
many pixels...
So, instead of adding so much complexity, let's use u_blitter
for the "large coordinates" fallback, and keep the rectangle blit
for the other cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32698>
GFX11 allowed only one swizzle mode for the VRS image but GFX12 allows
all 2D non-linear swizzle modes and PC_SC_VRS_INFO needs to be
configured.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32914>
If the format does not support COLOR_ATTACHMENT or DEPTH_STENCIL
features then it can't be used as an input attachment.
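The rule can be sketched as a simple feature-mask test. The flag names
below are local stand-ins for the corresponding Vulkan format feature
bits and the helper name is hypothetical; this is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the relevant format feature bits. */
enum {
   FEAT_SAMPLED_IMAGE            = 1u << 0,
   FEAT_COLOR_ATTACHMENT         = 1u << 1,
   FEAT_DEPTH_STENCIL_ATTACHMENT = 1u << 2,
};

/* A format is usable as an input attachment only if it is renderable
 * as either a color or a depth/stencil attachment. */
static bool
can_be_input_attachment(uint32_t format_features)
{
   const uint32_t renderable =
      FEAT_COLOR_ATTACHMENT | FEAT_DEPTH_STENCIL_ATTACHMENT;
   return (format_features & renderable) != 0;
}
```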
Fixes dEQP-VK.api.info.unsupported_image_usage.*.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32790>
While this is not strictly necessary, it fixes an apparent false
positive about reading a garbage value reported by the static analyzer.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32819>
This is actually a false positive reported by the static analyzer, because
it assumes that `device->instance->meta_cache_enabled` can change
between two execution points.
In order to tell the static analyzer this is not the case, we assign it
to a local variable, and do the checks based on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32819>
Declaring a variable-length array (VLA) based on a variable that can be
0 is considered dangerous.
In this case, the variable can't take value 0, so adding an assertion
fixes the issue.
This was detected by the static analyzer.
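The pattern looks roughly like the following sketch. The function and
variable names are made up for illustration and do not come from the
driver.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a VLA with 1..count and return the sum of its elements. */
static size_t
sum_with_vla(unsigned count)
{
   /* A zero-length VLA is undefined behaviour in C. The assertion
    * documents that callers never pass 0 and lets the analyzer rely
    * on that invariant. */
   assert(count > 0);

   unsigned values[count];
   size_t total = 0;

   for (unsigned i = 0; i < count; i++) {
      values[i] = i + 1;
      total += values[i];
   }

   return total;
}
```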
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32819>
While making the swapping code generic, the swap-back path was left as
is, causing the wrong sources to be swapped.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 00656526d8 ("ir3/cp: extract common src swapping code")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32920>
It's empty now, so we don't need to include it from the packer headers.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32899>
This is where those macros are used, and those are the last two
definitions preventing us from dropping panfrost-job.h.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32899>
Move MALI_EXTRACT_INDEX to pan_format.h where all format-related macros
live and kill the unused MALI_EXTRACT_TYPE and MALI_FORMAT_COMPRESSED
macros.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32899>