It's mildly tempting to reuse the src0_alpha source for color1 since
the two features should never overlap, but for now we add an extra
optional source.
We require SIMD16 for now as we only have SIMD16 messages. Eventually,
we're likely to want to support SIMD32 with 2x16 sends, but this gets
us going for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
ffloor(a) is lowered as a - ffract(a). dEQP expects that for example
ffloor(a) == 1.0 for every a in between 1.0 a 2.0. This worked fine,
but the new ffract(a + b(is_integral)) -> ffract(a) rule broke this.
Specifically, dEQP-GLES2.functional.shaders.struct.uniform.equal_fragment
checks that ffloor(a + 1.0) == 1.0 for every a between 0.0 and 1.0.
However this is not exactly true once the ffract(a + 1.0) is lowered
to ffract(a).
Prevent this by marking ffract from ffloor lowering as exact so that
the recently introduced ffract(a + b(is_integral)) -> ffract(a) rule
does not trigger.
Fixes: c6aaafa3 ("nir: add lowering for ffloor")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15562
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41882>
NVIDIA hardware have an instruction allowering you to retrive the mask
of active threads matching the same source value as the current
invocation.
This is going to be used by shared memory lowering for mesh / task
stages on NVK.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
Fix
"dEQP-VK.api.copy_and_blit.*.image_to_image.all_formats.color.2d_to_1d.*.e5b9g9r9_ufloat_pack32.*"
on HK.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 5f5f4474f6 ("nir: Add a format unpack helper and tests")
Reviewed-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41929>
When both inputs are denorms, the bcsel picks the integer min/max result,
which does not flush denorms and therefore might return the wrong result.
Fixes OpenCL fmin/fmax on asahi.
Fixes: d238d766c6 ("nir: add lower_fminmax_signed_zero")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41386>
ends_program calls into nir_cf_node_get_function repeadtly to fetch the
same function and to check whether we are inside an entry point or not.
But we already got the information higher up the chain so use that
instead.
nir_cf_node_get_function is quite expensive, because it follows pointers
through the tree.
Speeds up compilation of more complex shaders by quite a bit. I am seeing
a 66% cut of compilation time spent in e.g. llama-bench.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41891>
src[1] or src[2] would mean that the atomic uses the deref as data for the
op, we only want to allow address source uses.
Fixes: bb311ce370 ("nir: Allow atomics as non-complex uses for var-splitting passes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41818>
These are SGPR inputs, so they are uniform in subgroups but may
have different values in different subgroups.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
These intrinsics are generally divergent between different
subgroups, but they can be uniform when all their sources
are also uniform.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
AMD SMEM instructions are always uniform within a subgroup,
but they may be divergent across subgroups, ie. each subgroup
may have a different value from the same SMEM instruction.
This needs to be considered for divergence across subgroups
as well as for vertex divergence, because vertices of the
same primitive may be split between different waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41584>
There is no hardware which supports ffmaz with denorms.
We also need this to be seperate because there is AMD hardware
with ffma but not ffmaz.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41649>
Suggested by @gurchetansingh.
Android's Soong build system treats several compiler warnings as errors
by default: https://android.googlesource.com/platform/build/soong/+/27f57506/cc/config/global.go/#218
To catch these issues in Mesa, introduce `soong_compat_c_args`
and `soong_compat_cpp_args` with the following flags treated as errors:
-D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
-Werror=date-time
-Werror=gnu-alignof-expression
-Werror=ignored-qualifiers
-Werror=implicit-fallthrough
-Werror=int-conversion
-Werror=missing-prototypes
-Werror=pragma-pack
-Werror=pragma-pack-suspicious-include
-Werror=sizeof-array-div
-Werror=string-plus-int
-Werror=unreachable-code-loop-increment
These compatibility flags are added to the meson configurations
for ANV, Gfxstream, Lavapipe, PanVK, Turnip, and Venus.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41644>
This struct was initially packed to fit in a slot in NIR intrinsics
indices. Nowadays NIR supports larger indices and cooperative matrix
has extensions that allow it to go beyond the existing limit. This
patch changes the struct to be larger and remove the manual bit packing.
The hash table change is to use the specialized version for u64 keys
that's available in src/util.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41691>
jay is trying to use the fragment shader dispatch mask for helper
invocation lowering, but it was using load_sample_mask_in for that
(now load_coverage_mask_intel). But this isn't the MSAA coverage
mask, the two are different payload fields.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Constructing the render target store payload is more complex than we can
reasonably handle at the NIR level. The main reason is that samplemask
and stencil are packed 16-bit and 8-bit parameters, respectively, which
are intermixed with other values that are 32-bit. In SIMD32 mode, the
packed sub-32-bit values take up fewer registers than normal values.
Currently we also don't specialize the NIR for each FS dispatch width,
and we can't construct the message descriptor without knowing it.
So, we alter nir_intrinsic_store_render_target_intel to take each of
the expected parameters - colour, depth, stencil, samplemask,
src0_alpha, and discard predicate. We construct the payloads and
descriptors in the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Adding the zero constants have a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.
Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
The nir_instrs_equal normalizes the some indices but hash_intrinsic
wasn't normalizing them. Reorganize the code so both do it using the
same helper.
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Assisted-by: Pi coding agent (GPT-5.5)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41606>
We skip emitting ffma_weak here, because otherwise we'd require a lowering
loop with opt_algebraic and lower_double_ops and this way it's also
cheaper.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>