Update the gallium driver's use of gpu_id to support 64 bit gpu_ids.
Reviewed-by: Marc Alcala Prieto <marc.alcalaprieto@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40921>
Some code in gallium was making assumptions of how the gpu_id is laid
out, which will not work for 64 bit gpu_ids.
Expose pan_prod_id and pan_rev from the model to collect this logic in a
single place.
Reviewed-by: Marc Alcala Prieto <marc.alcalaprieto@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40921>
remove_trivial_phi() mostly does nothing for non-array phis, but it
rewrites sources if their definining instruction are trivial phis.
In the case of trivial phis in the loop continue block (for loops with
divergent non-trivial continues), we might need to keep those if they
write a shared register, because the source of the trivial phi will not be
reachable from the loop header phi.
In this example, the predecessors of the continue block should be block2,
but the physical predecessors are block2 and block3, requiring a phi in
the continue block which will then be lowered by ir3_lower_shared_phis.
loop {
block1:
a = phi 0, b
if (divergent) {
block2:
b = a + 1
continue;
}
block3:
break;
}
Fixes RA validation error when compiling blackmythwukong/5645a84e669a6179
from radv_fossils.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40480>
With OpLd now having a predicate, we forgot to update legalize_ext_instr
to allow predicates for it.
We should really get ride of those functions but for now let's keep it
simple and sync the implementation to what SM20 backend have.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9d90cbc314 ("nak: add input predicate to load_global_nv and OpLd")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40934>
Promote DBG() failure paths in enumerate_sysfs_metrics() to mesa_logw()
so users see why OA metrics are unavailable without needing INTEL_DEBUG.
Also log in PPS when no OA queries are available after initialization.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40898>
When observation_paranoid (xe) or perf_stream_paranoid (i915) prevents
unprivileged access to OA metrics, the existing code silently returns no
OA queries. PPS then fails with just a segfault.
This patch adds INTEL_PERF_FEATURE_OA_BLOCKED_BY_POLICY to
intel_perf_features, set by both KMD backends when the paranoid sysctl
exists but lacks sufficent privilage. PPS checks this flag immediately
after initialising intel_perf and returns an error before attempting
metric-set selection.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40898>
VkPipelineRenderingCreateInfo is only required in the fragment output
interface lib. For pre-rasterization shaders and fragment shader state
libs, only the view mask is used but it's optional.
If the attachments info isn't marked invalid merging renderpass info
during lib imports wouldn't work because it would assume that the first
lib has attachment info (eg. the pre-rasterization lib).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15241
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40870>
Instead of inferring constlen from the usage of const registers by
various instructions, we can calculate it directly from the const file
allocations. This greatly simplifies the calculation of constlen.
Note that the increase in constlen comes from a few binning variants.
This doesn't matter as the constlen of the corresponding non-binning
variant is used for those anyway.
Totals from 73 (0.04% of 176258) affected shaders:
Constlen: 3428 -> 3720 (+8.52%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40929>
The spec also mentions "output Location decorated color attachments".
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 564b061981 ("hk: Increase maxFragmentCombinedOutputResources to HK_MAX_DESCRIPTORS")
Reviewed-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40925>
We already have a nice script, let's add the only missing option here.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40932>
We only need to deal with the fixed function last vertex stage case,
for prior stages nir_opt_varyings is enough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40907>
Remove the allocate_tile_state_now parameter from v3dv_job_start_frame().
So v3dv_job_allocate_tile_state() is explicitly called after
job_emit_binning_flush() as we know the value of job->draw_count instead
of using always 0.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
Replace the inline tile_alloc/TSDA sizing in v3dv_job_allocate_tile_state()
with a call to the new v3d_tile_alloc_sizes() helper. This switches from
64B to 128B initial tile alloc blocks (avoiding overflow for simple draws)
and from a flat 512KB headroom to a draw-proportional formula.
Set tile_allocation_initial_block_size and tile_allocation_block_size
in all TILE_BINNING_MODE_CFG emissions and update the
TILE_LIST_INITIAL_BLOCK_SIZE packets to match.
Benchmarked on RPi5 (V3D 7.1) with GfxBench Vulkan Aztec Ruins at
1920x1040. Average tile_alloc BO size dropped 75% (535 KB to 132 KB)
with 20% fewer OOM events (521 to 417) and no FPS regression.
This avoids exhausting GPU memory when multiple blit or fill jobs
are batched in the same command buffer, with a huge reduction of
the memory footprint avoiding the 512 KB of the tile_alloc per batched
job.
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
Replace the inline tile_alloc/TSDA sizing in alloc_tile_state() with a
call to the new v3d_tile_alloc_sizes() helper. This switches from 64B
to 128B initial tile alloc blocks (avoiding overflow for simple draws)
and from a flat 512KB headroom to a draw-proportional formula.
Set tile_allocation_initial_block_size and tile_allocation_block_size
in all TILE_BINNING_MODE_CFG emissions and update the
TILE_LIST_INITIAL_BLOCK_SIZE packet to match.
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
Add V3D_TILE_ALLOC_INITIAL_BLOCK_SIZE = 128 and
V3D_TILE_ALLOC_OVERFLOW_BLOCK_SIZE = 64 to v3d_limits.h.
Corresponding _ENUM macros provide the 2-bit hardware encoding for the
TILE_BINNING_MODE_CFG packets.
The previous implicit 64B initial blocks were too small: a single draw
call emits ~88 bytes of per-tile BCL state, immediately overflowing
into continuation blocks. 128B initial blocks avoid the first
continuation allocation for simple single-draw passes.
Add v3d_tile_alloc_sizes() to v3d_util with the full tile alloc BO and
TSDA sizing logic. This uses the 128B initial blocks and tile_alloc
becomes proportional to the number of draws and size of the initial
blocks allocation with the cap of the previous fixed allocation. So
jobs with 0 or 1 drawcalls (blits/fills) reduce their headroom
dramatically.
The draw-proportional formula replaces a flat 512 KB continuation pool:
headroom = MIN2((tiles_size * draw_count) / 2, 512 KB)
Benchmarked on RPi5 (V3D 7.1) against GfxBench GL tests and
apitrace replays at 1080p. Tile-alloc memory reduction versus the
flat 512 KB headroom (taking into account 256kb kernel alloc per OOM):
GfxBench (5 benchmarks): -45% to -70% reduction, OOM at or below baseline
Apitrace (19 traces): -4% to -77% reduction on 20/24 traces
No FPS regressions observed on any workload.
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
For Valhall, use SHADDX instruction for 64-bit integer addition instead
of lowering it to 32-bit operations. The instruction sequence for doing
it in 32-bit costs 3 cycles but SHADDX only takes 2 cycles to perform.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40841>
va_mark_last currently expects 64-bit registers to always be split in
two, this commit changes it to check first if a 64-bit register is split
or not.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40841>
According to the GL_EXT_multisampled_render_to_texture specification,
copy operations should be allowed when the extension is supported.
Previously, glCopyTexImage* would unconditionally fail with
GL_INVALID_OPERATION when copying from any multisampled framebuffer
(samples > 0), even when using render-to-texture attachments.
Fixes: d7b9da2673 ("mesa/main: fix artifacts with GL_EXT_multisampled_render_to_texture")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Wujian Sun <wujian.sun_1@nxp.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40863>
When round_to_nearest_even is enabled with NEAREST filtering, texture
coordinates near texel boundaries (e.g. 0.9999999404) can be incorrectly
rounded up to the next texel instead of being floor()'d.
According to OpenCL spec section 8.2, for CLK_FILTER_NEAREST:
i = address_mode((int)floor(u))
Backport-to: *
Signed-off-by: Eric Guo <eric.guo@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40912>
Allocate to hold 256 wait fences. Since there is only one queue per
per ip per process, the idea is that there won't be app or windowing system
that would have large number of job dependencies / wait fences.
If there is an app that has wait fences greater than 256, there won't
be corruption issues since kernel will wait for the extra fences.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40698>