Xe3P will allow us to reduce the number of anv_state_pool instances in use.
This will improve performance, as it results in fewer uAPI calls to allocate
memory and less memory wasted in anv_state_pools that see little use.
As this will be a run-time decision, here I'm adding a function to get each
anv_state_pool. Then we can just change the function and all the callers will
use the correct anv_state_pool.
The next patches will replace direct accesses to each anv_state_pool with
a function call.
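A minimal sketch of the accessor idea, not the actual anv code: the struct
layout, the share_pools flag and the function names are hypothetical
stand-ins, and which pools end up merged on Xe3P is an assumption made only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real anv structures. */
struct state_pool {
   int id;
};

struct device {
   bool share_pools; /* assumed run-time decision flag */
   struct state_pool general_state_pool;
   struct state_pool dynamic_state_pool;
};

/* Callers use accessors instead of touching the fields directly, so the
 * pool selection can later change in exactly one place. */
static inline struct state_pool *
device_get_general_state_pool(struct device *dev)
{
   assert(dev != NULL);
   return &dev->general_state_pool;
}

static inline struct state_pool *
device_get_dynamic_state_pool(struct device *dev)
{
   assert(dev != NULL);
   /* When pools are shared, hand out the general pool instead. */
   return dev->share_pools ? &dev->general_state_pool
                           : &dev->dynamic_state_pool;
}
```

With this shape, flipping the flag changes what every caller receives
without touching any call site.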
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42133>
If a GLXPixmap is created without GLX_TEXTURE_TARGET_EXT,
textureTarget remains 0. Calling glXBindTexImageEXT on such a
drawable would pass 0 to _mesa_get_current_tex_object(), triggering
an internal implementation error and a null-pointer segfault.
Return early when textureTarget is 0 - the drawable was never set
up for texturing, so bind is a no-op.
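The guard can be sketched roughly as follows. This is a simplified model,
not the GLX code: struct drawable, lookup_and_bind() and bind_tex_image()
are hypothetical names, and lookup_and_bind() only stands in for the lookup
that rejects a target of 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical drawable: texture_target stays 0 when the pixmap was
 * created without GLX_TEXTURE_TARGET_EXT. */
struct drawable {
   unsigned texture_target;
};

static int bind_calls;

/* Stand-in for the texture object lookup, which must never see 0. */
static void
lookup_and_bind(unsigned target)
{
   assert(target != 0);
   bind_calls++;
}

/* Returns true if a texture was bound, false if the call was a no-op. */
static bool
bind_tex_image(const struct drawable *draw)
{
   if (draw->texture_target == 0)
      return false; /* never set up for texturing: nothing to bind */

   lookup_and_bind(draw->texture_target);
   return true;
}
```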
Reviewed-by: Adam Jackson <ajax@redhat.com>
Assisted-by: DeepSeek V4 Flash
Closes: #58
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42093>
VCN supports one tile only, but with multiple slice segments.
Cc: mesa-stable
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
The spec requires us to follow the constantQp/base_q_idx from the app,
which is constrained by the caps. Report the more accurate caps.
Cc: mesa-stable
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
vtn lowers OpFmaKHR to nir_op_ffma and every Mali has a native fused
multiply-add, so there is nothing to do in the backend.
fp16 is gated on shaderFloat16. A 16-bit OpFmaKHR also needs the Float16
capability and only shaderFloat16 turns that on, so without it the bit
would not be usable. Mali has no fp64, so that one stays off.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42075>
Emulates SSBOs via global memory. The real SSBO size and global base address
are stored in the descriptor. The size can be accessed using resbase;
the base address is parsed manually from the descriptor by passing the
bindless base address into the shader via a driver UBO or const file.
nir_lower_ssbo is used to lower SSBO accesses to global memory when the
buffer size exceeds the limit. We also use it to insert bounds checks on
global memory. The final code for SSBO accesses looks like this:
    if (@get_ssbo_size >= max_storage_buffer_range_bytes) {
       if (offset < @get_ssbo_size) {
          // global memory access using base (from resbase) + offset
       } else {
          // do nothing (stores) or return 0 (loads)
       }
    } else {
       // original SSBO access
    }
A new pass is added to lower @load_ssbo_address generated by
nir_lower_ssbo. We set native_offset=true for nir_lower_ssbo to make
sure it doesn't generate 64 bit address math. The new pass then
transforms @load/store_global into @load/store_global_ir3 passing the 32
bit offset from @load_ssbo_address.
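The control flow of the lowered access can be modelled on the CPU as below.
This is only a sketch of the semantics for a one-byte load: struct
ssbo_desc, native_ssbo_load_u8() and load_ssbo_u8() are hypothetical names,
and the native path is assumed to bounds-check in HW.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor contents: real size and global base address. */
struct ssbo_desc {
   const uint8_t *global_base;
   uint32_t size;
};

/* Stand-in for the original SSBO instruction, assumed to return 0 for
 * out-of-bounds loads. */
static uint8_t
native_ssbo_load_u8(const struct ssbo_desc *d, uint32_t offset)
{
   return offset < d->size ? d->global_base[offset] : 0;
}

/* Model of the lowered load: large buffers go through global memory
 * with an explicit bounds check, small ones keep the native access. */
static uint8_t
load_ssbo_u8(const struct ssbo_desc *d, uint32_t offset, uint32_t max_range)
{
   assert(d != 0);
   if (d->size >= max_range) {
      if (offset < d->size)
         return d->global_base[offset]; /* base + 32-bit offset */
      return 0;                         /* out-of-bounds load */
   }
   return native_ssbo_load_u8(d, offset);
}
```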
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
Emulates texel buffers via 3D image access. The real texel buffer
size and start offset (due to image alignment requirements) are
stored in the descriptor and accessed via resbase.
- Read-only access: isam.a.1d to read as 3d image.
- RW access: stib.b.typed.3d/ldib.b.typed.3d to access as 3d image.
Verified that the proprietary D3D12 driver uses the same workaround.
The only difference is that the proprietary driver uses an arrayed 2d load
for read-only access instead of a 3d load, but the benefits are not verified.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
From tests, A8XX seems to fix the limits that were incompatible with D3D12.
However, the proprietary driver still exposes the old texel buffer element
limit.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
This is mostly useful in combination with `min_ssbo_size` when the
native SSBO access instructions do the bounds check in HW so we don't
want to add bounds checks for all SSBO accesses.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
Some HW may have native SSBO instructions that only support a limited
buffer size. It may be beneficial to use those instructions for small
SSBOs and only fall back to global memory accesses for large ones.
This commit adds an option (min_ssbo_size) that, if non-zero, will cause
code like this to be emitted:
    if (@get_ssbo_size >= min_ssbo_size) {
       // global memory access
    } else {
       // original SSBO access
    }
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
Even if the front buffer isn't locked yet, it will normally get locked,
so we can't reuse it as a back buffer.
Pointed out by Daniel Stone.
Fixes: 4a976b60b1 ("egl_dri2: use gbm_surface as the native window type in drm platform")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
Exposes higher limits than the ones supported by the HW and several
ArrayDynamicIndexing features not yet implemented so the Dawn WebGPU
implementation can be used while it doesn't exercise these limits or
features.
The override is enabled using the V3D_WEBGPU_OVERRIDE=1 envvar. When
it is enabled it:
- Increases the framebuffer dimension limit from the real HW value
(4096 on RPi4, 7680 on RPi5) to 8192.
- Bumps the advertised maxMipLevels reported per format from 13 to
14 to match the bumped 8192-wide images and 15 for non-2D images.
The TMU HW already supports that for sampling.
- Increases max_varying_components from 64 to 72 (HW limit is 64).
- Exposes features that are not actually implemented; CTS tests that
  exercise them will hit asserts in debug builds:
  - shaderUniformBufferArrayDynamicIndexing
  - shaderSampledImageArrayDynamicIndexing
  - shaderStorageBufferArrayDynamicIndexing
  - shaderStorageImageArrayDynamicIndexing
- Increases maxImageDimension1D from 4096 to 16384
- Increases maxImageDimension2D from 4096 to 8192
- Increases maxImageDimension3D from 4096 to 16384
- Increases maxImageDimensionCube from 4096 to 16384
When V3D_WEBGPU_OVERRIDE is unset (the default), the driver
advertises the real HW limits already set up by the preceding
"use real HW framebuffer limit" change, so Vulkan CTS conformance
is unaffected.
To help diagnose applications that hit the over-advertised paths,
mesa_loge errors are emitted from three places:
- lower_vulkan_resource_index() warns before the existing UNREACHABLE
for dynamic descriptor indexing, so the cause is visible in release
builds where the assertion is compiled out.
- create_image() warns when vkCreateImage is called with attachment
usage and dimensions above the real HW framebuffer limit. Storage
and sampled-only images above that limit work fine via the TMU.
- job_compute_frame_tiling() errors when a render job width/height
exceeds the real HW framebuffer limit.
The per-plane slices[] array in struct v3dv_image is sized at
V3D_MAX_MIP_LEVELS + 2 so the override case (which advertises 14/15
mip levels for the bumped 8192-wide 2D images and 16384 for 1D/3D images)
still fits without enlarging the default array.
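The mip level arithmetic behind this sizing can be checked with a small
sketch. It assumes a full mip chain (one level per halving down to 1) and
that V3D_MAX_MIP_LEVELS is 13, matching the 4096 limit; the helper name
mip_levels_for_size() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of levels in a full mip chain for the given largest dimension:
 * floor(log2(max_dim)) + 1. */
static unsigned
mip_levels_for_size(uint32_t max_dim)
{
   assert(max_dim > 0);
   unsigned levels = 0;
   while (max_dim) {
      levels++;
      max_dim >>= 1;
   }
   return levels;
}
```

Under those assumptions 4096 gives 13 levels, 8192 gives 14 and 16384
gives 15, so 13 + 2 slots cover the largest overridden case.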
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
Add a new max_framebuffer_size to devinfo so V3D 4.2 and V3D 7.1 can
expose different framebuffer dimensions: 4096 on RPi4 and 7680 on RPi5.
This is bounded by the maximum clip size supported by the framebuffer.
Take advantage of this to also raise maxImageDimensions* to
max_framebuffer_size.
A non-power-of-two framebuffer means framebuffer_size_for_pixel_count can
compute a height larger than max_framebuffer_size. Clamp the height to the
maximum and recompute the width from the division so w * h <= num_pixels.
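The clamp step can be sketched as below. This is not the body of
framebuffer_size_for_pixel_count: clamp_fb_size() is a hypothetical helper
that only shows the clamp and the width recomputation, and it assumes the
recomputed width already fits within the limit.

```c
#include <assert.h>
#include <stdint.h>

/* If the computed height exceeds the limit, clamp it and recompute the
 * width with integer division, which rounds down and therefore keeps
 * w * h <= num_pixels. */
static void
clamp_fb_size(uint32_t num_pixels, uint32_t max_size,
              uint32_t *w, uint32_t *h)
{
   assert(max_size > 0);
   if (*h > max_size) {
      *h = max_size;
      *w = num_pixels / *h;
   }
}
```

For example, a 4096x8192 candidate with a 7680 limit becomes 4369x7680,
which covers slightly fewer pixels than the original count.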
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
Use the new devinfo value instead of the V3D_MAX_RENDER_TARGETS
macro.
We only keep using the macro in the devinfo initialization
and in the V3D versioned file src/gallium/drivers/v3d/v3dx_state.c
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
pack_null_texture_state(), introduced to support VK_KHR_robustness2
nullDescriptor for image bindings, left the TEXTURE_SHADER_STATE
"Array Stride (64-byte aligned)" field at 0.
On real V3D HW this is fine: a TMU read against a null descriptor
returns zero regardless of the descriptor contents. The V3D simulator,
however, validates the TMU array stride before issuing the read.
Setting array_stride_64_byte_aligned = 1 (64 bytes raw) fixes the failing
dEQP-VK.robustness.robustness2.bind.*.null_descriptor.samples_1.3d.*
test cases under the simulator.
Fixes: 990d76eae6 ("v3dv: Implement and enable nullDescriptor support")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42112>
bindgen up to at least 0.72.1 generates invalid code (see below) and
that function is not used, so simply skip it.
src/gallium/frontends/rusticl/rusticl_mesa_bindings.c:795:81: error: duplicate ‘const’ declaration specifier [-Werror=duplicate-decl-specifier]
795 | void pipe_shader_state_from_tgsi__extern(struct pipe_shader_state *state, const const struct tgsi_token *tokens) { pipe_shader_state_from_tgsi(state, tokens); }
| ^~~~~
Backport-to: *
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
Just use an existing flag to increase the bo size slightly.
Fixes a ring gfx timeout with
dEQP-VK.spirv_assembly.instruction.compute.opfma.fp32.vec3.undef.denorm_flush.directed
on vega10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: *
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41937>
This delays the waitcnt for has_attr_ring_wait_bug by a few instructions.
fossil-db (gfx1201):
Totals from 9 (0.00% of 208640) affected shaders:
Instrs: 19352 -> 19506 (+0.80%)
CodeSize: 101180 -> 101716 (+0.53%)
Latency: 660221 -> 678782 (+2.81%); split: -0.00%, +2.81%
InvThroughput: 95106 -> 97398 (+2.41%)
fossil-db (navi33):
Totals from 58834 (28.20% of 208626) affected shaders:
Instrs: 22424304 -> 22424571 (+0.00%)
CodeSize: 110198112 -> 110199184 (+0.00%)
Latency: 115894319 -> 126491124 (+9.14%); split: -0.00%, +9.14%
InvThroughput: 19424631 -> 19754358 (+1.70%); split: -0.00%, +1.70%
I don't think the stats are very accurate. This seems to often move the
s_waitcnt down into a divergent branch, but the wait still happens later
if the branch isn't taken, so the wait is counted twice.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
This shouldn't fix anything, because event_vmem_bvh was never used here.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>