(('fadd', a, 0.0), a) used to match both a + 0 and a + -0, but after
0d255011ae the -0 case requires an explicit pattern.
Only noticed because it pushed a few dEQP tests over the instruction limit.
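As background, and not taken from the commit itself: the two constants differ only in the sign of zero. The sketch below assumes IEEE 754 round-to-nearest and no fast-math flags. It shows that a + -0.0 returns a for every non-NaN input, while a + 0.0 turns -0.0 into +0.0. The helper names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* True if x and y are the same value, including the sign of zero.
 * NaN inputs always compare as different here. */
static bool same_float(float x, float y)
{
    return x == y && !signbit(x) == !signbit(y);
}

/* Does "a + 0.0 -> a" preserve the exact result for this input? */
static bool add_pos_zero_is_identity(float a)
{
    return same_float(a + 0.0f, a);
}

/* Does "a + -0.0 -> a" preserve the exact result for this input? */
static bool add_neg_zero_is_identity(float a)
{
    return same_float(a + -0.0f, a);
}
```

Only the input -0.0 separates the two: add_pos_zero_is_identity(-0.0f) is false, while add_neg_zero_is_identity(-0.0f) is true.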
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39151>
Fix compiler error:
../src/util/blob.c:344:8: error: assigning to 'uint8_t *' from
'const void *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
glibc now provides C23-style type-generic string functions. memchr
returns const void * when passed a const void * argument. Declare nul
as const, since it is only used to find the position of the null
terminator and to calculate a size.
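A minimal sketch of the pattern, assuming a hypothetical helper rather than the actual code in blob.c: keeping the memchr result in a const-qualified pointer compiles with both the classic prototype and the C23-style type-generic one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the function from blob.c: returns the size of
 * the NUL-terminated string at buf including its terminator, or 0 if no
 * terminator is found within the first avail bytes. */
static size_t bounded_string_size(const void *buf, size_t avail)
{
    /* const-qualified, so no qualifier is discarded even when memchr
     * returns const void * for a const void * argument. */
    const uint8_t *nul = memchr(buf, 0, avail);

    if (nul == NULL)
        return 0;

    return (size_t)(nul - (const uint8_t *)buf) + 1;
}
```

The pointer is only compared and subtracted, never written through, so the const qualifier costs nothing.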
Fixes: 1c9877327e ("glsl: Add blob.c---a simple interface for serializing data")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39048>
Add ASTC HDR (float) formats. They are treated identically to the LDR
formats, since the blocks carry enough metadata to be decoded as float.
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38859>
The wayland helper used uninitialized data, causing undefined
behaviour, because the feedback data was allocated but not fully
initialized.
After the fix, SDL's vulkantest runs cleanly under valgrind. Before,
it did not.
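The commit text does not say how the data is now initialized. One common way to avoid this class of bug is to zero-fill at allocation time, sketched here with a hypothetical struct that is not the real feedback structure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the feedback state; field names are
 * illustrative and not taken from Mesa. */
struct feedback_data {
    uint32_t format_count;
    uint64_t *modifiers;
    int main_device_set;
};

/* Allocate the struct zero-filled, so fields that are only assigned
 * later never hold indeterminate values. On mainstream platforms
 * all-zero bits also read back as a NULL pointer. */
static struct feedback_data *feedback_data_create(void)
{
    return calloc(1, sizeof(struct feedback_data));
}
```

With malloc instead of calloc, reading any field before assigning it would be the kind of access valgrind reports.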
Signed-off-by: Bram Stolk <b.stolk@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39133>
Implement a mitigation for VM faults caused by SMEM reading
out of bounds when using robust buffer access.
- Pad uniform and storage buffer allocations with a read-only VM page
- Clamp SMEM offsets that can potentially read past the next page
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Implement a mitigation for VM faults caused by SMEM reading
from NULL descriptors.
To satisfy VKD3D-Proton's expectations on mutable
descriptors, we must do this in shader code. It is not
sufficient to use the address of a mapped BO when writing
null descriptors, and it is not feasible to mitigate this
in VKD3D-Proton.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Map the first page of the same BO as read-only after the BO itself
in order to pad each BO with an extra page. This doesn't require
us to allocate any memory.
This is going to be used for a HW bug mitigation.
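A rough sketch of the resulting address-space layout, assuming a 4096-byte page and a BO size that is a non-zero multiple of the page size. The struct and function names are hypothetical and do not correspond to the winsys API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAD_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical description of one VA range mapped onto a BO. */
struct va_mapping {
    uint64_t va;        /* start of the virtual address range */
    uint64_t bo_offset; /* offset into the backing BO */
    uint64_t size;      /* length of the range in bytes */
    bool read_only;
};

/* Plan the two mappings: the BO itself, then its first page again,
 * mapped read-only directly after it as padding. */
static void plan_padded_mappings(uint64_t va, uint64_t bo_size,
                                 struct va_mapping out[2])
{
    out[0] = (struct va_mapping){
        .va = va, .bo_offset = 0, .size = bo_size, .read_only = false,
    };
    out[1] = (struct va_mapping){
        .va = va + bo_size, .bo_offset = 0, .size = PAD_PAGE_SIZE,
        .read_only = true,
    };
}
```

Both ranges are backed by the same BO, which is why the padding needs no extra memory.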
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
The pass implements two mitigations for the GFX6-7 SMEM bug:
1. To mitigate VM faults caused by NULL descriptors:
   Make sure that SMEM buffer loads always access a mapped BO.
   Use either the descriptor BO (or compute scratch BO),
   or otherwise use the zero-filled BO in their place.
2. To mitigate VM faults caused by OOB robust buffer access:
   Add an instruction to clamp the offset source to the
   num_records field of the descriptor. The access will still be
   out of bounds, but the VM fault can be completely mitigated
   if the driver adds padding to each memory allocation.
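The second mitigation can be sketched in scalar C as follows. This assumes num_records is a byte count, that the buffer does not extend past the end of its allocation, and that the driver pads each allocation with 4096 bytes; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAD_BYTES 4096u /* assumed size of the driver-added padding */

/* Clamp the offset to num_records. A clamped access still starts at or
 * past the end of the buffer, so it is still out of bounds. */
static uint32_t clamp_smem_offset(uint32_t offset, uint32_t num_records)
{
    return offset < num_records ? offset : num_records;
}

/* True if a load of load_bytes at the clamped offset ends inside the
 * buffer plus its padding, i.e. inside mapped memory. */
static bool load_stays_mapped(uint32_t offset, uint32_t num_records,
                              uint32_t load_bytes)
{
    uint64_t end = (uint64_t)clamp_smem_offset(offset, num_records) +
                   load_bytes;
    return end <= (uint64_t)num_records + PAD_BYTES;
}
```

Under these assumptions any load no larger than the padding stays mapped, whatever the original offset was.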
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
On GFX6-7, SMEM instructions access memory when num_records == 0
or offset >= num_records, which causes VM faults when reading a
page that isn't mapped.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Lowers a shader to use a smaller workgroup to do the same work,
while it will still appear as a bigger workgroup to applications.
To achieve this, the pass augments the CF of the shader
so that each real subgroup will execute two or more logical
subgroups. A logical subgroup represents what the application
can observe as a subgroup.
The size of a logical subgroup is the same as a real subgroup.
Only one logical subgroup may be executed per real subgroup
at the same time. This ensures that all subgroup operations
keep working and the subgroup invocation ID stays the same.
- When the CF contains barriers, we can't just repeat
the code and we need to augment each CF node individually
so that they are aware of logical subgroups.
- In case parts of the CF don't contain any barriers, we can simply
repeat and predicate that CF for each logical subgroup.
It is technically not necessary to implement this strategy, but
in practice it helps reduce the number of branches in the shader
and therefore improves compile times.
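The barrier-free case can be illustrated with a CPU-side simulation. This is only a sketch: the subgroup size of 4, the strided assignment of logical subgroups to real subgroups, and all names are assumptions made for the example, not details of the pass.

```c
#include <stdint.h>

#define SUBGROUP_SIZE 4u /* illustrative; real hardware uses 32 or 64 */

/* Stand-in for a barrier-free shader body, run per logical invocation. */
static void shader_body(uint32_t logical_id, uint32_t *out)
{
    out[logical_id] = logical_id * 2u;
}

/* Simulate one real subgroup executing several logical subgroups, one
 * at a time. The inner loop stands in for the SIMD lanes, so the lane
 * (subgroup invocation ID) is the same in every logical subgroup. */
static void run_real_subgroup(uint32_t real_subgroup,
                              uint32_t real_subgroups,
                              uint32_t logical_wg_size, uint32_t *out)
{
    uint32_t logical_subgroups =
        (logical_wg_size + SUBGROUP_SIZE - 1) / SUBGROUP_SIZE;

    for (uint32_t ls = real_subgroup; ls < logical_subgroups;
         ls += real_subgroups) {
        for (uint32_t lane = 0; lane < SUBGROUP_SIZE; lane++) {
            uint32_t logical_id = ls * SUBGROUP_SIZE + lane;

            /* Predicate: skip lanes past the logical workgroup size. */
            if (logical_id < logical_wg_size)
                shader_body(logical_id, out);
        }
    }
}
```

Calling run_real_subgroup for real subgroups 0 and 1 of 2 with a logical workgroup size of 10 covers all ten logical invocations, even though only 8 real invocations exist.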
The pass is mainly intended for working around HW limitations,
for example when the HW has an upper limit on the workgroup size
or doesn't support workgroups at all, but the API requires a
certain minimum.
Notes:
- Only applicable to shader stages that use workgroups
- Hits an assertion when called on smaller workgroups
- Always flattens workgroup size to 1D
- Creates local variables
- Does not change subgroup size
- Variable workgroup size not supported yet, maybe later
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Anna Maniscalco <anna.maniscalco2000@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37985>
The current narrowing patterns fail to match in many cases. For example,
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
misses code like this:
32x3 %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000)
32x4 %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0)
32 %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz
or after some later transforms:
32x2 %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000)
32x2 %6 = vec2 %5, %1 (0x0)
32 %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy
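For reference, the identity behind the narrowing can be checked in scalar C. The sketch assumes finite inputs and reads b in the replacement as the first component of b; the function names are hypothetical.

```c
/* Full three-component dot product. */
static float fdot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Narrowed form: when a is (a.x, 0, 0), only the first component of b
 * contributes, so a single multiply is enough. */
static float fdot3_narrowed(float ax, const float b[3])
{
    return ax * b[0];
}
```

The two forms agree only when the other components of b are finite, because 0 * inf is NaN in the full form.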
This patch is heavily based on an old branch by Ian Romanick from 2019.
r300 RV530 shader-db:
total instructions in shared programs: 128900 -> 128882 (-0.01%)
instructions in affected programs: 621 -> 603 (-2.90%)
helped: 10
HURT: 1
total cycles in shared programs: 191837 -> 191828 (<.01%)
cycles in affected programs: 799 -> 790 (-1.13%)
helped: 7
HURT: 1
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>
They have the same rules for placement as (eq).
The blob places them right after the last cat5/cat6 instruction if
possible; we do the same for now.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Co-authored-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31885>
The new a7xx NOP flags (eolm) and (eogm) have constraints similar to
(eq), so helper_sched can be used for them with minimal
modifications.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Co-authored-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31885>
Much faster because CB is optimal with 2D swizzle modes. This isn't
applied for storage images because it depends on the access pattern,
and benchmark results are very different.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38084>