addrlib has an extra optimization for memcpy with HIC. There are two
modes:
- blockMemcpy: chip-specific layout but better performance overall
- hybridMemcpy: chip-agnostic
Because matching UUIDs doesn't matter on desktop, use the block memcpy
by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41019>
`disk_cache_init` asserts `build_id_len == BUILD_ID_EXPECTED_HASH_LENGTH`
(20, the size of a GNU build-id SHA1 on ELF). Mach-O has no GNU
build-id; the closest equivalent is `LC_UUID`, which is 16 bytes.
`build_id_length()` therefore returns a non-20 value on macOS and the
assert fires as soon as `ENABLE_SHADER_CACHE` is on.
Relax the assertion to `<=` so any non-empty build id of acceptable
length is accepted while still catching impossibly long ones. The hash
only needs *some* unique-per-build identifier; the actual byte count
hashed is whatever `build_id_length()` returned.
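A minimal sketch of the relaxed check, not the actual `disk_cache_init`
code: `EXPECTED_HASH_LENGTH` and `build_id_len_ok()` are hypothetical
stand-ins, and the explicit non-empty test is an assumption drawn from
the wording above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BUILD_ID_EXPECTED_HASH_LENGTH (SHA-1: 20 bytes). */
#define EXPECTED_HASH_LENGTH 20

/* Relaxed rule: accept any non-empty id that is not longer than a SHA-1. */
static bool
build_id_len_ok(size_t build_id_len)
{
   return build_id_len > 0 && build_id_len <= EXPECTED_HASH_LENGTH;
}

/* Assert-style wrapper, mirroring how an init path might use the check. */
static void
check_build_id_len(size_t build_id_len)
{
   assert(build_id_len_ok(build_id_len));
}
```

With this rule a 16-byte `LC_UUID` and a 20-byte GNU build-id both pass,
while a 21-byte value is still rejected.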
Cc: mesa-stable
Suggested-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Louis Montagne <louis@askem.eu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41361>
../src/mesa/state_tracker/st_atom_framebuffer.c:203:27: warning: implicit conversion from 'unsigned int' to 'uint8_t' (aka 'unsigned char') changes value from 4294967295 to 255 [-Wconstant-conversion]
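For illustration only, a sketch of the kind of change that silences this
warning. `struct fb_state` and `set_all_views()` are hypothetical and do
not reproduce the real state tracker code; the point is that an
"all bits set" value for an 8-bit field can be written as `UINT8_MAX`
instead of a 32-bit constant that gets truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a state struct with an 8-bit view mask. */
struct fb_state {
   uint8_t view_mask;
};

/* Truncating 0xffffffff to 8 bits yields the same value as UINT8_MAX. */
static_assert((uint8_t)~0u == UINT8_MAX, "truncated ~0u equals UINT8_MAX");

static void
set_all_views(struct fb_state *fb)
{
   /* Writing the 8-bit constant directly avoids the implicit conversion. */
   fb->view_mask = UINT8_MAX;
}
```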
Fixes: 2b37f23314 ("gallium: fix pipe_framebuffer_state::view_mask")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41326>
The vertex input state can be NULL if rasterization is disabled with
dynamic vertex inputs.
The input assembly state can be NULL if rasterization is disabled
and both states are dynamic (primitive topology and primitive restart
enable).
This fixes a segfault with gpu-ratemeter vk_dyn.prim.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41335>
Before a7xx, ldg.a/stg.a use an offset in units of their type size,
while on a7xx and later the offset is always in bytes. Currently,
@load/store_global_ir3 take their offset in dwords (32 bits). This has a
few downsides: offsets need an extra shl during codegen on a7xx and
addressing sub-dword-aligned addresses is only possible by doing 64-bit
math on the base address.
Improve the situation by always using a byte offset for
@load/store_global_ir3 and adding the offset_shift index to support type
units pre-a7xx. While we're at it, add the base index as well to support
all ldg/stg.g features in @load/store_global_ir3.
Supporting these updated intrinsics consists of two parts:
- ir3_nir_lower_io_offsets legalizes the offset_shift on a6xx: for
ldg.a/stg.a, the offset has to be in units of the type size so extra
shifts are inserted to accomplish this if necessary. On a7xx, offsets
are always in bytes so nothing needs to be done.
- The intrinsics are emitted as ldg/stg if the offset is a small enough
constant and as ldg.a/stg.a otherwise. a6xx supports an extra shift
for ldg.a/stg.a that only applies to the GPR offset (not the immediate
base); NIR is pattern matched at this point to extract this if
possible.
All users of @load/store_global_ir3 are updated to generate the offset
in units of bytes. ir3_nir_analyze_ubo_ranges is updated to take the new
offset_shift into account.
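The a6xx legalization step can be sketched as follows. This is a toy
model, not the ir3 code: `struct global_offset` and
`legalize_to_type_units()` are hypothetical, and it assumes that
offset_shift means "byte offset = offset << offset_shift", which the
text above does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the byte offset is offset << offset_shift. */
struct global_offset {
   uint32_t offset;
   unsigned offset_shift;
};

static uint32_t
byte_offset(struct global_offset o)
{
   return o.offset << o.offset_shift;
}

/* Rewrite the offset so its shift equals log2(type size), i.e. so the
 * offset is expressed in units of the accessed type. */
static struct global_offset
legalize_to_type_units(struct global_offset o, unsigned type_shift)
{
   if (o.offset_shift > type_shift) {
      /* Offset is in coarser units: an extra left shift is needed. */
      o.offset <<= o.offset_shift - type_shift;
   } else if (o.offset_shift < type_shift) {
      unsigned shr = type_shift - o.offset_shift;
      /* Only exact when the access is aligned to the type size. */
      assert((o.offset & ((1u << shr) - 1)) == 0);
      o.offset >>= shr;
   }
   o.offset_shift = type_shift;
   return o;
}
```

In this model a byte offset of 16 for a 32-bit access becomes an offset
of 4 with a shift of 2, and the addressed byte stays the same. On a7xx
no such rewrite would be needed because the hardware takes bytes.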
Totals from 2029 (1.15% of 176266) affected shaders:
MaxWaves: 26728 -> 26660 (-0.25%); split: +0.01%, -0.26%
Instrs: 1314089 -> 1278603 (-2.70%); split: -2.72%, +0.02%
CodeSize: 2739108 -> 2633236 (-3.87%); split: -3.87%, +0.01%
NOPs: 197537 -> 200843 (+1.67%); split: -1.62%, +3.30%
MOVs: 43771 -> 44025 (+0.58%); split: -1.11%, +1.69%
Full: 31849 -> 31948 (+0.31%); split: -0.03%, +0.34%
(ss): 37965 -> 42027 (+10.70%); split: -3.47%, +14.17%
(sy): 13752 -> 13566 (-1.35%); split: -4.04%, +2.68%
(ss)-stall: 154238 -> 170353 (+10.45%); split: -1.72%, +12.16%
(sy)-stall: 804442 -> 806518 (+0.26%); split: -4.65%, +4.91%
Preamble Instrs: 326728 -> 293488 (-10.17%)
Cat0: 217926 -> 220947 (+1.39%); split: -1.58%, +2.96%
Cat1: 50182 -> 50446 (+0.53%); split: -0.97%, +1.49%
Cat2: 460987 -> 452101 (-1.93%); split: -2.26%, +0.33%
Cat3: 390696 -> 361271 (-7.53%)
Cat7: 39148 -> 38688 (-1.18%); split: -1.24%, +0.06%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41342>
These are funky enough that they make more sense as intrinsics than
texture opcodes.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
This uprev:
- brings in vrend fixes with virgl ci expectation updated
- enables new venus extensions support
- drops render-server-worker since process isolation is the default
- updates venus ci expectations
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41331>
This has a bit of sorting overhead, but can significantly increase BVH
quality especially in big BVHs. gfx12 is faster at intersecting, so only
enable for gfx11 and earlier right now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41300>
64-bit morton codes are required for decent lbvh tlas builds since the
scene bounds are usually much bigger than the area that is actually
important.
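To illustrate why the extra bits matter, here is a generic Morton code
sketch. It is not the lbvh shader code: the function names and the
21-bit and 10-bit per-axis splits are assumptions chosen for the
example. Coordinates are taken as already quantized to 21 bits relative
to the scene bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low nbits of v so that bit i lands at bit 3 * i. */
static uint64_t
spread_bits(uint32_t v, unsigned nbits)
{
   uint64_t out = 0;
   for (unsigned i = 0; i < nbits; i++)
      out |= (uint64_t)((v >> i) & 1u) << (3 * i);
   return out;
}

/* 63-bit Morton code from three 21-bit quantized coordinates. */
static uint64_t
morton64(uint32_t x, uint32_t y, uint32_t z)
{
   assert(x < (1u << 21) && y < (1u << 21) && z < (1u << 21));
   return spread_bits(x, 21) | (spread_bits(y, 21) << 1) |
          (spread_bits(z, 21) << 2);
}

/* 30-bit Morton code that keeps only the top 10 bits of each coordinate. */
static uint32_t
morton30(uint32_t x, uint32_t y, uint32_t z)
{
   assert(x < (1u << 21) && y < (1u << 21) && z < (1u << 21));
   return (uint32_t)(spread_bits(x >> 11, 10) |
                     (spread_bits(y >> 11, 10) << 1) |
                     (spread_bits(z >> 11, 10) << 2));
}
```

Two instances that sit 32 quantization steps apart (about 1/65536 of
the scene extent) get the same 30-bit code but distinct 63-bit codes,
so only the wider code can order them spatially.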
The changes were done without understanding the code but they seem to
work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41300>