there's no requirement in the spec that the geometry for resolves must match,
only that the geometry must be positive (i.e., no flipped extents)
this avoids major perf issues for scaled resolves
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18364>
reads flags field from CurrPic struct in pps for VA_PICTURE_H264_LONG_TERM_REFERENCE. If found, Curr_pic.frame_idx wil be used for the long term reference index
In get_picture_storage, check if current frame is ltr, and whether its ref frame is ltr.
In radeon_enc_slice_header, adds the ref_pic_list_modification_flag_l0 and long_term_reference_flag for ltr
v2: fix code formatting issues
Reviewed-by: Ruijing Dong ruijing.dong@amd.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18219>
This duplicates the lowering from nir_lower_packing. However, nir_lower_packing
also lowers a pile of other instructions that we do implement natively, and this
is easier than adding a bunch of knobs to nir_lower_packing to get just what we
need.
Fixes test-printf address_space_4.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Implement like other workgroup barriers. No subgroup barriers yet, but that
doesn't seem needed yet.
Fixes test_basic.async_copy_global_to_local and a pile of other OpenCL tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Array index always goes in the fourth 16-bit component on Valhall. I'm unsure
whether that should also apply to Bifrost. f256ec2a88 ("pan/bi: Fix 1DArray
image coordinate retrieval") says that it should be in the third component on
Bifrost, but I can't remember why that would be the case.
Fixes OpenCL test image_streams.write.1darray on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Works around LLVM/SPIR-V stupidity. In effect this means we always use typeless
image stores, which is good enough for both CL and GL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Scalarizing phis results in vector constructions (nir_op_vec) of the same size
as the phi, so a wide phi (>128-bit) will result in a wide vector op that the
backend can't handle. These wide vector ops can always be copypropped away, but
that relies on running NIR copy/prop after scalarizing phis, which was not
always happening before. By scalarizing phis before the opt loop instead of
after, we guarantee that copyprop and DCE run to completion and we get
appropriately lowered code in the backend.
Fixes parts of integer_ops.integer_divideAssign with longs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Fixes test_bruteforce.isnormal. We don't implement fisnormal in the backend, but
actually lower_bool_to_bitsize was failing earlier since there's no fisnormal32
to lower to either.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
While we have a POPCOUNT.i32 instruction, we do not have v2i16/v4i8 variants.
The code generated by lower_to_bitsize doesn't seem any better than what we
could do ourselves, so let's use that.
While we're at it, give bitfield_reverse the same treatment as we have only
BITREV.i32. I don't think we can get <32-bit bitfield_reverse in either GL or
CL, but that seems likely to change in the future. (It looks to be valid SPIR-V,
at least.)
Fixes integer_ops.popcount.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
The word offset is already handled by the above code, there's no need to
restrict the further restrict the swizzle. This pattern can come up with OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
The following IR is valid NIR:
vec1 16 ssa_0 = ...
vec1 32 ssa_1 = pack_32_2x16 ssa_0.xx
In this case, pack_32_2x16 takes in a two component vector, but the source
itself ssa_0 has only a single component. This is fine due to the shuffle, but
will fail the assert. Remove the assert and all is well.
Fixes test_relational.shuffle_copy.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
These need a simple two-instruction lowering regardless of the size of float
involved. Fixes integer_ops.integer_divideAssign
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
We don't have an 8-bit CSEL, so this is the best we can do. It's easier to write
the lowering as an algebraic rule since we don't need to do anything clever.
Fixes integer_ops.integer_clamp.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
When we lower swizzles, we move source modifiers (except for the swizzle) after
the swizzle operation. In particular, we change the order of composition for
negates and abs. However, copying the source will copy the modifiers unless we
specifically strip the extra modifiers. That's harmless in practice on Bifrost,
which doesn't check for extraneous modifiers, but is incorrect IR and trips an
assertion in the Valhall packing code.
Fixes test_relations.relational_bitselect.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
We're about to extend this pass to support 8-bit swizzles. That will be a
nontrivial change, so let's get some testing for what's already in the pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
This is only a theoretical bug fix because, for now, we always reemit
everything. But this aligns launch_grid with draw_vbo and makes the intention
explicit, both seem like good things to me.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Fixes math_bruteforce.atan2 and contractions tests.
For OpenCL, we want to flush fp32 and preserve fp16, applying to both inputs and
outputs so F16_TO_F32 acts as preserve, which implements CL spec text:
> Denormalized numbers for the half data type which may be generated when
converting a float to a half using vstore_half and converting a half to a float
using vload_half cannot be flushed to zero
Note that our libclc builds flush denorms and rusticl does not advertise denorms
so we're expected to flush to zero. rusticl correctly sets the desired float
controls, we just have to match to the hardware requirements.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Bump to 2048, the minimum maximum for image support in the full profile of
OpenCL. The relevant hardware limit is 65536 so we have plenty of clearance.
Fixes api.get_image1d_array_info.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
In NIR, txf does not take a sampler. However, in the hardware it does take a
sampler. If there is no sampler bound and we use txf, the hardware will read
back all-0's due to bounds checking. As a workaround, bind a trivial sampler and
use that.
As-is this workaround is Valhall specific, making use of an extra resource
table. I'm punting on generalizing back to Bifrost until I can discuss the issue
in more depth with Jason and Karol and figure out the right fix.
Fixes api.image_properties_query.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
At some point in the future, adding generic variants to libclc will
hopefully no longer be needed. At that point, we don't want the NIR
code adding duplicates. Check if the generic version already exists
and, if it does, don't re-add it.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18675>
v2: Drop old code (Karol)
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18675>
Fix defect reported by Coverity Scan.
Double unlock (LOCK)
double_unlock: dri2_egl_error_unlock unlocks dri2_dpy->lock while it is unlocked.
Fixes: f1efe037df ("egl/dri2: Add display lock")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18655>
..Rather than at backend IR translation time. This is considerably
simpler because we can use the txs lowering instead of special casing
array sizes. Unfortunately it generates worse code, but that gap should
close once nir_opt_preamble is wired in.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18652>
Indirect dispatch does not actually require any dynamic memory allocation, even
with shared memory. We just need to set wls_instances to some (mostly arbitrary)
value, statically allocate memory based on that, and let the hardware throttle
workgroups to fit if needed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18661>
Intel has different Z interpolation float point rounding
than other mesa gpus
For example gl_Position.z = 0.0 will be interpolated to
gl_FragCoord.z = 0.5 for all gpus
gl_FragCoord = -0.00000001 will be interpolated to
gl_FragCoord.z = 0.4999999702 for Intel
and rounded to gl_FragCoord.z = 0.5 for other gpus
Games with LEQUAL depth func will fail depth test on Intel
and will pass it on other gpus in such case
This workaround lowers translated depth range
and several gl_FragCoord.z coords with extra small difference
will be translated to the same UINT16\UINT24\UINT32
value of an integer depth buffer
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7199
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18412>
anv_pipeline_get_last_vue_prog_data (used by emit_3dstate_primitive_replication)
doesn't work for mesh stage.
Fixes: ae57628dd5 ("anv: Drop anv_pipeline::use_primitive_replication")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18495>