In case of texel buffers that are read in the shader, but not bound by
the application, the current implementation would incorrectly try to
read from non-existent buffers.
To ensure this does not happen, this change sets the format for any
unbound attributes to CONST_0000, which will kill any actual
reads/writes and always return zeroes.
This fixes the following two tests:
- spec@arb_shading_language_420pack@active sampler conflict
- spec@arb_texture_buffer_object@render-no-bo
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39431>
(cherry picked from commit aec8132b8b)
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.
Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 6c6b2d8f30)
When changing the clear color without a fast clear, use dirty bits to
ensure that surfaces with inline clear colors are updated and that
partial resolves are done as needed.
Remove the flags at the bottom of fast_clear_color() as
blorp_fast_clear() already sets them for us.
Fixes: 64d861b700 ("iris: Skip some fast-clears even on color changes")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 3b642f7456)
The previous method to calculate imageSize().z was
incorrect for a cubearray view.
This change was tested on palm and cayman. Here is the test fixed:
spec/arb_texture_view/rendering-layers-image/layers rendering of imagecubearray: fail pass
Fixes: 6c1432f0be ("r600/eg: fix cube map array buffer images.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39063>
(cherry picked from commit 0b8d8f2b17)
This change fixes the clamp to max_texel_buffer_elements
issue related to rv770 and older gpus.
Here are the tests fixed on rv770:
spec/arb_texture_buffer_object/texture-buffer-size-clamp/r8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rg8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rgba8ui_texture_buffer_size_via_sampler: fail pass
Fixes: 1a441ad5cb ("r600: clamp to max_texel_buffer_elements")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39385>
(cherry picked from commit afcead9158)
This is a gl4.3 issue very similar to e8fa3b4950.
The mode r10g10b10a2_sscaled processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.
This change fixes this issue by adding two gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.
.w(2-bits) expected palm cypress
0 0 0 0
1 1 1 1
2 -2 2 -2
3 -1 3 -1
w_out = w_in - (w_in > 1. ? 4. : 0.);
0002 00000024 A0040000 ALU 2 @72
0072 801F2C0A 600004C0 1 w: SETGT*4 __.w, R10.w, 1.0
0074 839FCC0A 61400010 2 w: ADD R10.w, R10.w, -PV.w
Note: cypress returns the expected value, and does not need
this correction.
This change was tested on palm, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/vertex_attrib_binding/basic-input-case6: fail pass
khr-gles31/core/vertex_attrib_binding/basic-input-case6: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38849>
(cherry picked from commit 2ed761021f)
Using a PV register which is not PV.x, after a dot4 operation,
does not work on rv770. Anyway, this does work on evergreen
but this is not documented.
This change updates this behavior for all the r600 gpus
which fixes the issue on rv770. It adds max4 which has the
same requirement in the case of max4 being implemented.
Here are some of the affected tests on rv770:
piglit/bin/fp-abs-01 -auto -fbo
glcts --deqp-case=KHR-GL31.buffer_objects.triangles
piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/built-in-functions/fs-distance-vec2-vec2.shader_test -auto -fbo
Fixes: 942e6af40b ("r600/sfn: use PS and PV inline registers when possible")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39101>
(cherry picked from commit da1108dcc4)
The functionality was working properly at glMinSampleShading(0.)
and glMinSampleShading(1.). The issue was with the intermediary
values. This change makes this function compatible with the
evergreen setup.
Note: this was one of the few functionalities which were working
properly on evergreen but not on cayman.
Here are the tests fixed:
spec/arb_sample_shading/samplemask 4 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 4/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.500000 partition: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_8: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_8: fail pass
Fixes: f7796a966d ("radeonsi: add basic code for overrasterization")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38615>
(cherry picked from commit d5d844bfc4)
Otherwise:
gallium/auxiliary/gallivm/lp_bld_nir_soa.c:2394:7:
error: variable 'opname' is used uninitialized whenever switch default is taken
is observed.
Reviewed-by: @LingMan
Fixes: 12bceb228a ("gallivm: let reduce ops use llvm intrinsics")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39418>
(cherry picked from commit 0f582b0268)
now that transient images are a more complete mechanism, this should
in theory be okay and also accounts for the case where
a framebuffer contains mixed msrtt textures and plain multisampled textures
(cherry picked from commit 6474af3b42)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39469>
Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
those have never been enabled ever since RadeonSI started using
the compute queue in a1378639ab - let's keep it that way.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Unfortunately that means that for now RadeonSI won't be able to
support ARB_compute_variable_group_size on these GPUs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
We don't need to take ETNA_DIRTY_SHADER into consideration for pure
updates of the constant states. When the shader is dirty constants
and code will be uploaded together and the update path will be skipped.
The uniform cache in the context has been removed in ee1ed59458
("etnaviv: prep for UBOs"), so the comment referencing this cache
is confusing and can go as well.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
Constant buffers may be changed without the shader changing.
Check the correct dirty bits when marking constant buffers
as read during the draw to ensure proper synchronization.
Fixes: a40a6e551e ("etnaviv: draw: only mark resources as read/written when the state changed")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
These are useful for displaying very high color precision images with
more than 10 bpc color depth, and also more precision than what fp16
can do on a standard dynamic range (SDR) display, where fp16 for values
in the unorm 0.0 - 1.0 range is about equivalent to at most ~11 bpc
linear color depth. This is especially useful for and aimed at scientific
applications, e.g., neuroscience and other bio-medical research cases.
At least current generation AMD gpu's released during the last 10 years
and supported by amdgpu-kms + atomic modesetting do allow for scanout of
such 16 bpc framebuffers and of up to 12 bpc output to suitable HDMI or
DisplayPort high precision displays.
We gate the format behind a new driconf option 'allow_rgb16_configs',
which defaults to true, but allows to disable the formats if any issues
should arise.
Most regular applications won't need the high display precision of
these new 16 bpc 64 bpp formats which have higher memory and bandwidth
requirements, and therefore a potential undesired performance impact
for regular apps. Followup per-platform enablement commits will use
the EGL_EXT_config_select_group extension to put these 16 bpc unorm
formats into a lower priority config select group 1, so they don't get
preferably chosen by default by eglChooseConfig(), but must be explicitely
requested by client applications which really need the high color
precision of these 64 bpp formats and are happy to pay the potential
performance impact. Thanks to Adam Jackson for pointing me to the
EGL_EXT_config_select_group extension.
If the format would be put into the default config select group 0, a
simple EGL eglChooseConfig() call would end up choosing these formats,
which is not what such regular apps would want.
Tested to not cause any change on native X11/EGL and X11/GLX, which only
supports at most 30 bpc / 32 bpp formats.
Followup commits will enable these formats for the EGL/Wayland backend,
and on the EGL/DRM backend.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38588>
PAL always set WD_SWITCH_ON_EOP for pre gfx10 when primitve
restart is enabled to prevent gpu hang.
It only happens when specific index stream with primitive
restart. Since we don't know what's the exact problem,
just follow PAL to disable 4x primitive rate when primitive
restart is enabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14629
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>
Only add the appropriate picture_{h264,h264_enc,vc1,...}.c file when the
corresponding codec is enabled via the -Dvideo-codecs flag.
Add stub functions to va_private.h, so that the code in decode.c and
encode.c remains untouched.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39354>
Add mpeg12dec as a selectable video-codec and add a corresponding check
to vl_codec_supported.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39354>
Add jpeg as a selectable video-codec and add a corresponding check to
vl_codec_supported.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39354>
This replaces all full lisence headers with SPDX identifiers and
generally makes things more consistent. I've also dropped the few
remaining author tags. If someone wants to know who wrote a bit of
code, `git blame` is going to be way more accurate than author tags
anyway.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39397>
Without this, non-dynamically-supported state changes that require a pipeline
change (like blend states without full_ds3) that happen in between drawcalls
get ignored unless another one of the conditions also happened to be true.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39381>
Vulkan spec requires binding flags to be matched with the binding with
the same index, however currently bindings are sorted with flags not
properly sorted, which leads to bindings and flags mismatch.
Resolve this by adding optional flags info to the parameters of
vk_create_sorted_bindings(), and refactoring panvk/pvr (which really
pair bindings and flags instead of only iterating flags) to use sorted
flags.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38967>
aco implements the same logic, and in the future it will make changes to
config->float_mode to avoid unnecessary s_setreg.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38815>
si_sqtt_start / si_sqtt_stop use emit_barrier which clears barriers_flags.
Since these functions are used to build an auxiliary cs which will only
be emitted later (on sqtt enablement/disablement) it shouldn't clear
the global barrier_flags value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>
The pattern:
ctx->barrier_flags |= ...;
si_mark_atom_dirty(sctx, &sctx->atoms.s.barrier);
is used a lot, let's add an inline helper. This prevents
forgetting the call to si_mark_atom_dirty.
si_upload_bindless_descriptors is special because we're
already in the emit phase so we shouldn't dirty barrier
again.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>