This allows using drm-shim for an emulated driver on an AMD GPU host.
Otherwise, MESA_LOADER_DRIVER_OVERRIDE has to be set to the emulated
driver to make it work.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
Also, fix the texel rate for G1-Pro variant 1, and mention G1-Ultra,
G1-Premium and G1-Pro in the release notes.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41639>
Both RRA and RMV used the PCI bus slot index in the trace device_id
field. On a typical single-GPU system, this resulted in "Device ID =
0000" being displayed in RRA and RMV when traces were opened.
Match the RGP dump, which reports device IDs correctly.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41788>
The game tries to use anisotropic filtering deep inside some control flow
while updating a procedural displacement map. Our sampling hardware
does not check the channel enable mask before calculating the
derivatives for each subspan, so it gets garbage for any subspan
that has partially disabled lanes.
This workaround converts any sample messages in fragment shaders that
are in divergent control flow into sample_d messages whose derivatives
are zeroed in software if some of the lanes are disabled.
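Here is a minimal C sketch of the subspan problem. It models one 2x2 subspan as four lanes plus an enable mask. It first computes coarse derivatives the way the hardware is described above (from the raw payload, ignoring the mask). It then applies the workaround, which zeroes those derivatives when a lane they depend on is disabled. The struct layout, lane numbering and function names are hypothetical. They are not taken from the Intel backend.

```c
#include <stdint.h>

/* One 2x2 subspan: lanes 0 1 on the top row, 2 3 on the bottom row.
 * Hypothetical model, not the real sampler payload layout. */
struct subspan {
   float u[4];          /* texture coordinate held by each lane */
   uint8_t enable_mask; /* bit i set if lane i is enabled */
};

/* Hardware-like behavior: coarse derivatives from the raw payload,
 * ignoring the enable mask, so disabled lanes leak garbage. */
static void hw_derivs(const struct subspan *s, float *ddx, float *ddy)
{
   *ddx = s->u[1] - s->u[0];
   *ddy = s->u[2] - s->u[0];
}

/* Workaround: compute the derivatives in software and zero them if any
 * lane they read is disabled; the result would feed a sample_d message. */
static void sw_derivs(const struct subspan *s, float *ddx, float *ddy)
{
   const uint8_t needed = 0x7; /* coarse derivatives read lanes 0, 1, 2 */
   if ((s->enable_mask & needed) != needed) {
      *ddx = 0.0f;
      *ddy = 0.0f;
      return;
   }
   hw_derivs(s, ddx, ddy);
}
```

With lane 2 disabled and holding garbage, `hw_derivs()` returns a huge ddy. `sw_derivs()` returns zero derivatives instead.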
Closes: #12796
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41716>
Using derivatives in control flow that is not uniform across a subspan
results in undefined behavior in GLSL.
On Intel hardware, this means the sampler always computes the
derivatives from whatever values are in each lane of a subspan in the
raw payload, regardless of whether some lanes have been disabled and
contain garbage.
Unfortunately, some applications seem to expect the sampler to ignore
disabled lanes in these cases instead of computing their derivatives
from garbage anyway. For those applications we need a pass that finds any sample
messages in divergent control flow and converts them to sample_d with
the derivatives zeroed in software if one or more lanes required to
calculate them have been disabled.
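The pass described above can be sketched over a toy instruction list. The sketch assumes a prior divergence analysis has already marked which instructions sit in divergent control flow. The opcode enum, `struct instr` and `lower_divergent_samples()` are hypothetical stand-ins, not Mesa's real IR or pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified IR. */
enum opcode { OP_ALU, OP_SAMPLE, OP_SAMPLE_D };
enum stage { STAGE_VERTEX, STAGE_FRAGMENT };

struct instr {
   enum opcode op;
   bool in_divergent_cf; /* set by an earlier divergence analysis */
   bool sw_derivs;       /* derivatives computed (and maybe zeroed) in SW */
};

/* Rewrite every implicit-derivative sample in divergent control flow into
 * a sample_d with software derivatives. Returns the number of changes. */
static size_t lower_divergent_samples(enum stage stage, struct instr *instrs,
                                      size_t count)
{
   if (stage != STAGE_FRAGMENT)
      return 0; /* this sketch, like the workaround, only targets FS */

   size_t changed = 0;
   for (size_t i = 0; i < count; i++) {
      if (instrs[i].op == OP_SAMPLE && instrs[i].in_divergent_cf) {
         instrs[i].op = OP_SAMPLE_D;
         instrs[i].sw_derivs = true;
         changed++;
      }
   }
   return changed;
}
```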
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41716>
The previous implementation assumed that VS would have exactly two
executables, the HW variant position shader and the HW variant varying
shader, and that non-VS shaders would only have one executable per
variant. These assumptions are violated by avalon (which only has one
IDVS executable), by SW VS (which is a second VS variant, for three
total variants) and by GS rast variants (which execute as a VS on the
hardware and so have two executables pre-avalon).
The new logic allows VS-staged variants to occur as a variant in any API
shader stage, and gives them either one or two executable indices
depending on whether the secondary executable is used.
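Here is a minimal sketch of the new assignment. It assumes each variant records whether it is VS-staged and whether its secondary executable is used. The struct and function names are hypothetical. The point is that indices are handed out per variant rather than inferred from the API stage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one shader variant. */
struct variant {
   bool vs_staged;     /* runs as a VS on the HW (VS, SW VS, GS rast) */
   bool has_secondary; /* secondary executable in use (e.g. pre-avalon) */
   int first_exe;      /* output: index of the first executable */
   int num_exes;       /* output: 1 or 2 */
};

/* Assign consecutive executable indices to all variants of one API shader,
 * regardless of which API stage it belongs to. Returns the total count. */
static int assign_executables(struct variant *v, size_t count)
{
   int next = 0;
   for (size_t i = 0; i < count; i++) {
      v[i].first_exe = next;
      v[i].num_exes = (v[i].vs_staged && v[i].has_secondary) ? 2 : 1;
      next += v[i].num_exes;
   }
   return next;
}
```

A pre-avalon HW VS variant (two executables) plus a SW VS variant (one executable) gives three executables. The SW VS variant starts at index 2.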
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: ff9907927f (panvk: Add basic infrastructure for shader variants)
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41631>
The source and destination clipping were performed in the wrong order.
We should first clip the source rectangle against the source buffer
bounds, then clip the destination rectangle against the destination
buffer bounds (including the scissor).
This fixes the WebGL 2.0.0 test case:
conformance2/rendering/blitframebuffer-filter-outofbounds.html
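Here is a one-axis sketch of the corrected order. It assumes non-flipped half-open spans and a destination range that already has the scissor intersected into it. Each clip shrinks the other rectangle by the same fraction, so the blit scale is preserved. `struct span`, `clip_pair()` and `clip_blit_axis()` are hypothetical names, not Mesa's blit clipping helpers.

```c
/* One axis of a blit as a half-open [x0, x1) span, assumed non-flipped. */
struct span { double x0, x1; };

/* Clip `a` to [lo, hi) and shrink `b` by the same fraction so the
 * a-to-b scale stays intact. */
static void clip_pair(struct span *a, struct span *b, double lo, double hi)
{
   if (a->x1 <= a->x0)
      return; /* empty span: nothing to clip */

   double scale = (b->x1 - b->x0) / (a->x1 - a->x0);
   if (a->x0 < lo) {
      b->x0 += (lo - a->x0) * scale;
      a->x0 = lo;
   }
   if (a->x1 > hi) {
      b->x1 -= (a->x1 - hi) * scale;
      a->x1 = hi;
   }
}

/* Source against source bounds first, then destination against
 * destination bounds (scissor already folded into dst_lo/dst_hi). */
static void clip_blit_axis(struct span *src, struct span *dst,
                           double src_size, double dst_lo, double dst_hi)
{
   clip_pair(src, dst, 0.0, src_size);
   clip_pair(dst, src, dst_lo, dst_hi);
}
```

For example, blitting source [-10, 10) into destination [0, 40) from a 10-texel-wide buffer trims the source to [0, 10). Because of the 2x scale, the destination shrinks to [20, 40).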
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Wujian Sun <wujian.sun_1@nxp.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41718>
v2: - remove a few useless helpers
- rename some variables
- use more of the NIR *_imm helpers (Emma)
v3: - use 64 bit integer ops
- optimize generated code
v4: - fix typo (Emma)
- Use boolean for available (Emma)
- simplify some calculations (Emma)
- replace "if" in timestamp code and bool conversion
with "bcsel" (Emma)
- clean up some variable names
v5: - remove iadd3 (Konstantin)
Assisted-by: Copilot (Auto mode)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41328>
We don't actually need to copy the vertex attributes: if the user
provides no TCS, the TES is simply pointed at the VS output in LDS,
which has the same layout a TCS would provide.
v2: with the lowering of the relevant intrinsics in place,
use nir_create_passthrough_tcs_impl to create the passthrough
shader (as suggested by Mareco and Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41328>
With memcpy lowering and the fix for the infinite optimization loop on
4x16 packs, this now passes
`dEQP-VK.spirv_assembly.instruction.compute.untyped_pointers.*`.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41695>
The Zink implementation splits `maxPerStageDescriptorStorageBuffers` between
atomic buffers and `MaxShaderStorageBlocks`, causing CTS tests to fail
because there are not enough SSBO blocks.
Also update `maxPerStageResources` for the current limits.
Fixes the following tests:
* KHR-GLES31.core.program_interface_query.ssb-types
* KHR-GLES31.core.compute_shader.pipeline-compute-chain
* KHR-GLES31.core.shader_storage_buffer_object.advanced-indirectAddressing-case1-cs
* KHR-GLES31.core.shader_storage_buffer_object.advanced-usage-sync-cs
* KHR-GLES31.core.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41708>
Changes are the result of two issues:
- In library form, workgroup size is not lowered. Only once the
pre-compiles are distinct variants with entry-points can we
lower uses of the workgroup size input.
- Some unimplemented instructions like `ufind_msb` would make their
way through to the final shader if they were generated by other
algebraic optimizations. `nir_opt_algebraic` needs to be run in a
loop to ensure they are eliminated.
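A toy rewriter shows why a single pass is not enough. One rule turns a hypothetical `OP_A` into `OP_UFIND_MSB`, another lowers `OP_UFIND_MSB`, and each instruction is visited once per pass. In Mesa drivers, the real loop is usually a do/while around `NIR_PASS(progress, nir, nir_opt_algebraic)`. The opcodes and helpers below are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes: OP_A is rewritten into OP_UFIND_MSB, which the
 * backend cannot handle and must be lowered to OP_LOWERED. */
enum op { OP_A, OP_UFIND_MSB, OP_LOWERED };

/* One pass: each instruction is visited once and at most one rule fires,
 * so an OP_UFIND_MSB created in this pass survives until the next one. */
static bool opt_algebraic_once(enum op *ops, size_t n)
{
   bool progress = false;
   for (size_t i = 0; i < n; i++) {
      if (ops[i] == OP_UFIND_MSB) {
         ops[i] = OP_LOWERED;
         progress = true;
      } else if (ops[i] == OP_A) {
         ops[i] = OP_UFIND_MSB;
         progress = true;
      }
   }
   return progress;
}

/* Run the pass to a fixed point; returns the number of passes that
 * made progress. */
static unsigned opt_algebraic_loop(enum op *ops, size_t n)
{
   unsigned iterations = 0;
   while (opt_algebraic_once(ops, n))
      iterations++;
   return iterations;
}
```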
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41568>
Define a more general-purpose version of `poly_unroll_restart`
named `poly_unroll_geometry`, which allows unrolling without an
input index buffer by separating the input and output index sizes.
This allows it to be used for additional use cases, such as
unrolling triangle fans or changing index types, where the draw
may not necessarily be indexed or the input and output index types
are not the same.
`poly_unroll_restart` remains as an alias with the same declaration
as before.
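Here is a sketch of separate input and output index sizes, using triangle-fan unrolling as the example. An input index size of 0 stands for a non-indexed draw, where the index is the vertex number itself. The function names and the fan vertex order (0, i, i + 1) are assumptions for illustration. This is not the `poly_unroll_geometry` signature, and primitive restart is not handled.

```c
#include <stdint.h>

/* Read index i; size is bytes per index, 0 means non-indexed. */
static uint32_t read_index(const void *buf, unsigned size, unsigned i)
{
   switch (size) {
   case 1: return ((const uint8_t *)buf)[i];
   case 2: return ((const uint16_t *)buf)[i];
   case 4: return ((const uint32_t *)buf)[i];
   default: return i; /* non-indexed: vertex number is the index */
   }
}

/* Write index i with an output size of 1, 2 or 4 bytes. */
static void write_index(void *buf, unsigned size, unsigned i, uint32_t v)
{
   switch (size) {
   case 1: ((uint8_t *)buf)[i] = (uint8_t)v; break;
   case 2: ((uint16_t *)buf)[i] = (uint16_t)v; break;
   default: ((uint32_t *)buf)[i] = v; break;
   }
}

/* Unroll a fan of `count` vertices into a triangle list. The input and
 * output index sizes are independent. Returns indices written. */
static unsigned unroll_fan(const void *in, unsigned in_size, unsigned count,
                           void *out, unsigned out_size)
{
   unsigned n = 0;
   for (unsigned i = 1; i + 1 < count; i++) {
      write_index(out, out_size, n++, read_index(in, in_size, 0));
      write_index(out, out_size, n++, read_index(in, in_size, i));
      write_index(out, out_size, n++, read_index(in, in_size, i + 1));
   }
   return n;
}
```

A non-indexed 5-vertex fan unrolls into the 16-bit list 0 1 2, 0 2 3, 0 3 4. A 32-bit indexed fan can be narrowed to 16-bit output in the same call.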
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41568>
Until now the BLT engine only handled same-format blits. Teach it to
convert between the UNORM formats it can represent, so format-converting
blits no longer fall back to the 3D blitter.
The supported formats are A8R8G8B8, X8R8G8B8, A4R4G4B4, A1R5G5B5,
R5G6B5, R8G8, R8 and A2R10G10B10.
The BLT format names are BGRA-based, matching the PE-internal byte order,
so an identity swizzle is correct for all of these except A2R10G10B10.
The PE keeps that one in RGBA order, so pipe R lands in the BLT B position
and vice versa. blt_conversion_needs_channel_swap() captures this and
find_blt_conversion() derives the per-image swizzle.
sRGB variants share the BLT format of their UNORM sibling. For example,
R8G8B8A8_UNORM and R8G8B8A8_SRGB both map to BLT_FORMAT_A8R8G8B8, and
the sRGB handling is carried separately via img->srgb.
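Here is a minimal sketch of the mapping and swizzle derivation described above. Only the R8G8B8A8_UNORM/SRGB pairing comes from the text. The pipe format chosen for A2R10G10B10, the enum values and the helper names are assumptions. The real logic lives in `find_blt_conversion()` and `blt_conversion_needs_channel_swap()`, whose signatures are not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of BLT formats. */
enum blt_format { BLT_A8R8G8B8, BLT_A2R10G10B10 };

struct blt_mapping {
   const char *pipe_format;
   enum blt_format blt;
   bool srgb;         /* carried separately, like img->srgb */
   bool channel_swap; /* PE keeps this format in RGBA, not BGRA, order */
};

static const struct blt_mapping mappings[] = {
   { "R8G8B8A8_UNORM",    BLT_A8R8G8B8,    false, false },
   { "R8G8B8A8_SRGB",     BLT_A8R8G8B8,    true,  false },
   { "R10G10B10A2_UNORM", BLT_A2R10G10B10, false, true  }, /* assumed */
};

/* NULL means the format is not representable: use the 3D blitter. */
static const struct blt_mapping *find_mapping(const char *pipe_format)
{
   for (size_t i = 0; i < sizeof(mappings) / sizeof(mappings[0]); i++)
      if (strcmp(mappings[i].pipe_format, pipe_format) == 0)
         return &mappings[i];
   return NULL;
}

/* Per-image swizzle: identity, or R and B exchanged. */
static void derive_swizzle(const struct blt_mapping *m, unsigned swz[4])
{
   swz[0] = m->channel_swap ? 2 : 0;
   swz[1] = 1;
   swz[2] = m->channel_swap ? 0 : 2;
   swz[3] = 3;
}
```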
No regressions in dEQP-GLES3.functional.fbo.blit.conversion.*.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39879>
Add an sRGB field to blt_imginfo and use it to conditionally set the
BLT_SRC_IMAGE_CONFIG_SRGB and BLT_DEST_IMAGE_CONFIG_SRGB bits in
the BLT config register setup.
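Here is a sketch of the conditional bit setup. Placeholder bit values stand in for BLT_SRC_IMAGE_CONFIG_SRGB and BLT_DEST_IMAGE_CONFIG_SRGB, whose real values are not given here. `struct imginfo` and `emit_configs()` are hypothetical reductions of blt_imginfo and the register setup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits, not the real etnaviv register layout. */
#define SRC_SRGB_BIT (1u << 16)
#define DST_SRGB_BIT (1u << 17)

struct imginfo {
   uint32_t config; /* base config derived from format, tiling, ... */
   bool srgb;       /* the new field */
};

/* OR in the sRGB bit for each side only when that image is sRGB. */
static void emit_configs(const struct imginfo *src, const struct imginfo *dst,
                         uint32_t *src_cfg, uint32_t *dst_cfg)
{
   *src_cfg = src->config | (src->srgb ? SRC_SRGB_BIT : 0u);
   *dst_cfg = dst->config | (dst->srgb ? DST_SRGB_BIT : 0u);
}
```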
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39879>
Mesa core pre-seeds the texture image unit limit of every stage
(VS/TCS/TES/GS/FS) with MAX_TEXTURE_IMAGE_UNITS in _mesa_init_constants().
When a driver does not expose a stage, this seed leaks into the
GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS sum. Drivers that only expose VS+FS
(like etnaviv) overcounted by 96. Zero the field for unexposed stages so
the sum reflects only the stages the driver advertises.
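The overcount of 96 is three unexposed stages (TCS, TES, GS) times a pre-seed of 32. The sketch below reproduces the arithmetic. It assumes MAX_TEXTURE_IMAGE_UNITS is 32, which is consistent with that figure. It also assumes a driver limit of 16 per exposed stage, an arbitrary example value rather than etnaviv's. The function name is hypothetical.

```c
#include <stdbool.h>

#define MAX_TEXTURE_IMAGE_UNITS 32 /* assumed, see lead-in */

enum stage { STAGE_VS, STAGE_TCS, STAGE_TES, STAGE_GS, STAGE_FS, NUM_STAGES };

/* Sum the per-stage limits after the core pre-seed and the driver's
 * override for the stages it exposes. */
static unsigned combined_units(const bool exposed[NUM_STAGES], unsigned units,
                               bool zero_unexposed)
{
   unsigned sum = 0;
   for (int s = 0; s < NUM_STAGES; s++) {
      unsigned limit = MAX_TEXTURE_IMAGE_UNITS; /* core pre-seed */
      if (exposed[s])
         limit = units;                         /* driver override */
      else if (zero_unexposed)
         limit = 0;                             /* the fix */
      sum += limit;
   }
   return sum;
}
```

For a VS+FS-only driver with 16 units per stage, the sum drops from 16 + 3 * 32 + 16 = 128 to 32. That is a difference of 96.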
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41746>
The TFU stride-0 fill path allocates a 64 KiB staging BO
(V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the
pattern, and caches it on the command buffer. For non-zero patterns,
the per-cmd-buffer cache works well, but WebGPU/Dawn workloads
issue many zero-fills (lazy buffer init) across separate command
buffers, so the cache misses almost every time and each fill pays
for a fresh alloc + mmap + memcpy.
Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero,
lazily allocated under meta.mtx and used whenever data == 0. The BO
is read-only after init so it can be shared across queues without
extra synchronization, and it is freed in destroy_device_meta.
Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60
meta_fill_buffer calls, ~218 MiB total, all zero-fills):
before: TFU branch total 7.328 ms, avg 115.55 us/call
after: TFU branch total 0.296 ms, avg 4.78 us/call (~24x)
Non-zero patterns continue to use the per-cmd-buffer cache.
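Here is a minimal sketch of the lazily allocated, device-wide zero BO. A pthread mutex and a calloc'd buffer stand in for meta.mtx and the real BO allocation and mmap. `struct device_meta`, `get_zero_fill_bo()` and `destroy_meta()` are hypothetical names. Taking the lock on every call is a simplification.

```c
#include <pthread.h>
#include <stdlib.h>

#define TFU_MAX_DIM 16384 /* matches V3D_TFU_MAX_DIM in the text */
#define FILL_CPP 4        /* 16384 * 4 = 64 KiB */

/* Hypothetical stand-in for v3dv_device::meta. */
struct device_meta {
   pthread_mutex_t mtx;
   void *tfu_fill_zero; /* NULL until the first zero-fill */
};

/* Return the shared zero buffer, allocating it on first use. After init
 * it is only read, so users need no further synchronization. */
static void *get_zero_fill_bo(struct device_meta *meta)
{
   pthread_mutex_lock(&meta->mtx);
   if (!meta->tfu_fill_zero)
      meta->tfu_fill_zero = calloc(TFU_MAX_DIM, FILL_CPP);
   void *bo = meta->tfu_fill_zero;
   pthread_mutex_unlock(&meta->mtx);
   return bo;
}

/* Counterpart of freeing the BO in destroy_device_meta. */
static void destroy_meta(struct device_meta *meta)
{
   free(meta->tfu_fill_zero);
   meta->tfu_fill_zero = NULL;
   pthread_mutex_destroy(&meta->mtx);
}
```

Every zero-fill after the first returns the same buffer. Zero-fills from different command buffers therefore no longer pay for an allocation each.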
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Base the eligibility check on imageExtent vs. the slice dimensions
rather than on the buffer addressing dimensions. The TFU codepath
here always writes/reads the full slice from its origin, so the
required invariant is 'imageExtent == slice'; bufferRowLength and
bufferImageHeight may be larger than imageExtent (the spec permits
this for non-zero values), in which case the TFU reads/writes at the
buffer's row/layer stride but only touches slice->width pixels per
row and slice->height rows per layer, leaving the trailing padding
untouched.
The previous combined check (width == slice->width && height ==
slice->height applied to the buffer dimensions) would reject any
caller that set bufferRowLength or bufferImageHeight larger than the
image (this is common for buffers shared across mip levels or
for alignment requirements like Dawn aligning bufferRowLength to 2
for 1-pixel-wide textures), forcing those copies through the slower
TLB / blit / compute paths.
For compressed formats, keep the strict equality check since
block-level stride semantics are more complex.
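Here is a sketch of the relaxed check. It assumes the region fields have been flattened into a small struct. As in Vulkan, a value of 0 for bufferRowLength or bufferImageHeight means "tightly packed". `struct copy_region` and `tfu_eligible()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened copy region. */
struct copy_region {
   uint32_t extent_w, extent_h; /* imageExtent.width / height */
   uint32_t row_length;         /* bufferRowLength, 0 = tightly packed */
   uint32_t image_height;       /* bufferImageHeight, 0 = tightly packed */
};

static bool tfu_eligible(const struct copy_region *r, uint32_t slice_w,
                         uint32_t slice_h, bool compressed)
{
   /* The TFU always covers the whole slice from its origin. */
   if (r->extent_w != slice_w || r->extent_h != slice_h)
      return false;

   if (compressed) {
      /* Keep the strict check on the buffer addressing dimensions. */
      uint32_t buf_w = r->row_length ? r->row_length : r->extent_w;
      uint32_t buf_h = r->image_height ? r->image_height : r->extent_h;
      return buf_w == slice_w && buf_h == slice_h;
   }

   /* Larger row length / image height only adds untouched padding. */
   return true;
}
```

Take the Dawn case: a 1-pixel-wide, 4-row slice with bufferRowLength = 2. The uncompressed copy is now accepted. A compressed copy with the same layout still falls back to the slower paths.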
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
Generalize copy_buffer_image_tfu with a to_buffer flag selecting which
side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2
before the TLB path.
The to_buffer=true direction has the same eligibility constraints as
buffer-to-image, except that V3D 4.2 is unsupported: for image-to-buffer
the destination is always a raster buffer, and the V3D 4.2 TFU cannot
produce raster output.
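Here is a sketch of the extra direction-specific constraint. It assumes the V3D version is encoded as major * 10 + minor (42 for V3D 4.2). `tfu_copy_supported()` and its parameters are hypothetical.

```c
#include <stdbool.h>

/* shared_checks_ok stands for the constraints common to both directions. */
static bool tfu_copy_supported(int ver, bool to_buffer, bool shared_checks_ok)
{
   /* Image-to-buffer always writes a raster buffer, which the
    * V3D 4.2 TFU cannot produce. */
   if (to_buffer && ver == 42)
      return false;
   return shared_checks_ok;
}
```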
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>