Most compositors work with damage in the buffer local coordinate space.
This change spares the compositors some work converting the provided
INT32_MAX x INT32_MAX damage region to the buffer local coordinate
space. It has no significant performance impact, but it'd still be nice
to use wl_surface_damage_buffer() if possible.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34227>
Most compositors work with damage in the buffer local coordinate space.
This change spares the compositors some work converting the provided
INT32_MAX x INT32_MAX damage region to the buffer local coordinate
space. It has no significant performance impact, but it'd still be nice
to use wl_surface_damage_buffer() if possible.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34227>
Ensure s3_upload correctly validates the S3 folder url by requiring it
to end with /. This prevents wrong uploads to invalid paths, such as
file urls. Also improve the error message.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34255>
When running under DXVK or vkd3d, the texture coordinate rounding behavior
should match D3D expectations. On Adreno, this behavior can be toggled
through the SP_TP_MODE_CNTL register.
A driconf-based option is introduced to help set the relevant register flag
that enables this behavior.
This fixes the cause of test_sampler_rounding test case failure in vkd3d on
Turnip's side, but a small change in vkd3d is also required, so the test
failure expectation isn't removed yet.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33987>
Add additional bitfields for the A6XX_SP_TP_MODE_CNTL registers, ones that
we already use and the texcoord rounding mode bitfield that we'll need for
D3D-over-Vulkan implementations.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33987>
Previous code assumed that the caller of utrace_clone_init_builder would
fill some parameters of the builder config, but we were not. Instead,
initialize these from the csif props the same as all the other builder
instances.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Fixes: 3096cf2a5d ("panvk/csf: flush and process trace events for all cmdbufs")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34270>
This commit makes radv use vk_video_derive_h265_scaling_list, which properly
applies default scaling lists whenever they're needed. It also simplifies
update_h265_scaling function into a simple memcpy. The firmware interface
struct and Vulkan's StdVideoH265ScalingLists struct both have identical memory
layouts, so it's not neccessary divide it into multiple copies with offsets.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34096>
H265 specification defines default scaling lists to use whenever scaling lists
are not specified in neither sps nor pps. Currently drivers ignore this
requirement and set the lists to zero. This commits adds a helper function
vk_video_derive_h265_scaling_list (similar to its h264 counterpart) that
selects either sps or pps lists and falls back to default values if neither
were specified. The default values were taken from ITU-T H265 specification
(revision 8), section 7.4.5.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34096>
H264 specification defines this field to force usage of the default
scaling lists even if they are specified in ScalingList4x4 and
ScalingList8x8.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34096>
The intervening_saturating_copy test is removed. The defs version of the
pass does not handle this case. It should not occur often in practice
anyway. Copy propagation and brw_nir_opt_fsat should prevent this
scenario from happening.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 212677275 -> 212677278 (+0.00%)
Cycle count: 30466062848 -> 30466056040 (-0.00%)
Totals from 1 (0.00% of 706300) affected shaders:
Instrs: 1343 -> 1346 (+0.22%)
Cycle count: 411664 -> 404856 (-1.65%)
v2: Stop counting ip. The non-defs part of the pass was the only thing
that used it.
v3: Also delete "if (block != def->block) continue;" code. I noticed
this while working on some other changes to this function. It's the last
thing in the loop, so it's totally useless. Delete some other spurious
continues too.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.
v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.
v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Elminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->srouces loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.
v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.
v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call the brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.
v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.
v8: Add some unit tests. Suggested by Caio.
shader-db:
Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377
total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082
total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0
total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0
LOST: 82
GAINED: 78
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403
total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911
total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0
total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0
total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0
LOST: 138
GAINED: 53
Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474
total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686
total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 56
GAINED: 143
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459
total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980
total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2
total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2
total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0
LOST: 117
GAINED: 37
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)
Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
This prevents assertion failures in brw_eu_emit in a later commit in
this MR. Even though they have not been previously observed, these
assertion failures could happen even without that commit.
No shader-db or fossil-db changes on any Intel platform.
Fixes: 04e1783278 ("brw: Call brw_fs_opt_algebraic less often")
v2: Add SHUFFLE. Suggested by Ken. Fixed indentation.
v3: Update BROADCAST exec_size after rebasing on "brw/build: Use SIMD8
temporaries in emit_uniformize".
v4: Explain why munging the exec_size is correct.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
The changes to try_copy_propagate will be removed later in the series.
v2: Fix up some comments to note that offset != 0 is allowed only when
stride == 0. Apply same offset=0 restriction in try_copy_propagate_def
too. Allow copy propagation if the source is either a def or
UNIFORM. Don't copy prop a load_reg through a non-def value.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
load_reg is something like load_payload except it has a single
source. It copies the entire source to the destination. Its purpose is
to convert a non-SSA VGRF into an SSA value. This copy is marked as
volatile so that it will act as a scheduling barrier.
v2: Fix some typos in the commit message. Eliminate the
brw_builder::LOAD_REG overload that returns a brw_inst*. This is
unlikely to ever be used. Add some checks to brw_validate. All
suggested by Caio.
v3: Force the source and destination types of the LOAD_REG to by
integer. This will (eventually) simplify the creating of unit tests for
the pass that adds LOAD_REG instructions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
v2: Move to immediately before the main optimization loop. Most
importantly, this is after the first call to DCE.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Non SSA regs after NIR: 237045283 -> 100183460 (-57.74%); split: -58.12%, +0.39%
Totals from 701423 (99.26% of 706657) affected shaders:
Non SSA regs after NIR: 236868848 -> 100007025 (-57.78%); split: -58.17%, +0.39%
Suggested-by: Ken
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
device-select-layer needs to obtain the display server's preferred
display device, and has so far relied on wl_drm for this. wl_drm is
superseded by linux-dmabuf with some Wayland servers having dropped
support for wl_drm entirely.
Implement linux-dmabuf as preferred mechanism for obtaining the main
device, with wl_drm support retained as a fallback for now.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34219>
Starting with sRGB meant we would refcount to -1 if an application
chooses PASS_THROUGH. Instead, just initialize with PASS_THROUGH so the
initial refcount of 0 reflects reality.
Previously, we would segfault if an application chose PASS_THROUGH at
swapchain initialization then switched to a color managed colorspace
later in the runtime, because we would increment refcount from -1 -> 0
and this would result in not creating a new color managed surface.
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Signed-off-by: llyyr <llyyr.public@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34353>
Calling dri2_x11_setup_swap_interval with swap_available = false
sets the min/max/default swap interval values to zero.
EGL_MIN/MAX_SWAP_INTERVAL is always reported as 0 and the interval
value set by eglSwapInterval gets clamped to 0.
Set swap_available to true before calling dri2_x11_setup_swap_interval,
as was done before.
Fixes: c00701c83a ("egl/x11: unify swrast/kopper/dri3 paths a bit")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34235>
This field was never used for determining the number of outputs,
just for determining whether streamout was enabled, which makes
it unnecessary. We can use enabled_stream_buffers_mask for that.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
Shaders may have undefined output stores after nir_opt_varyings.
These must be optimized out, otherwise they hit an assertion.
Fixes: 17f6ab28cc
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
We need to enable these buffers regardless of whether or not the
shader actually writes any outputs to them, otherwise we break
XFB queries.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
We need to remember which streamout buffers and streams were enabled,
even if the shader doesn't actually write any outputs to them,
because the API requires that we count vertices created by this shader
towards queries against those streams.
That information can be gathered by nir_gather_xfb_info_with_varyings
from the original NIR I/O variables that we get from the frontend,
but it isn't included in any intrinsics so would be otherwise lost here.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34317>
On i686, where VK_USE_64_BIT_PTR_DEFINES is unset and Vulkan handles are
represented as 64-bit integers instead, the code used the wrong format
specifier, causing a build error.
Fixes: 7fb31361f4 ("Handle external fences in vkGetFenceStatus()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34124>
Per spec, logic operations between fragment values and color attachments
should be disabled when attachments are using float or sRGB formats.
Regardless of attachment's format, enabled logic operations should keep
blending disabled.
Fixes: dEQP-VK.pipeline.*.logic_op_na_formats.*
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34212>
Context flushes can be caused by all kinds of operations that aren't
obvious to a GL API user. As those are quite heavy-weight operations
it is nice to have some insight into how many of those are happening
per frame. Add a sw query to make this information easily accessible.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34350>
For the pReferenceSlots.slotIndex, the max
value should the maxDpbSlots which is
h264: 16 + 1
h265 : 15 + 2
av1: 7+2
Fixing SVA_CL1_E test vector in JVT-AVC_V1
fluster test suite.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33094>
Since shared RA happens after creating merge sets, newly inserted
splits/collects did not have merge sets created for them. Fix this by
creating merge sets for new instructions after shared RA.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33319>
Shared RA might insert new defs to be handled by regular RA (e.g.,
shared spills). However, their interval offsets were not initialized
which caused their intervals to sometimes be mistakenly matched with
those containing offset 0. Fix this by calling index_merge_sets after
shared RA and modifying that function to only index new defs in that
case.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33319>
This command isn't supposed to be affected by conditional rendering.
This fixes new VKCTS coverage
dEQP-VK.conditional_rendering.conditional_ignore.resolve_image*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34338>
shpe is a bit of a special instruction: it's not really a terminator
(i.e., it does not perform a jump) but it does have to stay at the end
of its block. Up to now, we tried to enforce this by creating const
write barriers on shpe; the assumption being that everything that
happens in the preamble ends in a write to the const file so shpe stays
at the end. Alas, it turns out this is not true: things like sampler
prefetches do not write the const file and nothing was preventing those
from being scheduled after shpe.
Instead of trying to create even more barrier dependencies, fix this by
making shpe a terminator. Both sched and postsched treat terminators
specially to make sure they always stay at the end of their block.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34290>
Shader may have zero instructions and no prefetches but have inputs
that without modifications are used as output.
Fixed vkd3d test:
test_depth_bias_behaviour
Fixes: b0a98d3b13
("ir3: Detect empty fragment shaders")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34348>