This is a gl4.3 issue very similar to e8fa3b4950.
On palm, the hardware does not process the r10g10b10a2_sscaled
vertex format according to the current standard: the 2-bit .w
component is not converted as expected. The table below
describes the situation.
This change fixes this issue by adding two gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.
.w (2 bits)   expected   palm   cypress
     0            0        0        0
     1            1        1        1
     2           -2        2       -2
     3           -1        3       -1
w_out = w_in - (w_in > 1. ? 4. : 0.);
0002 00000024 A0040000 ALU 2 @72
0072 801F2C0A 600004C0 1 w: SETGT*4 __.w, R10.w, 1.0
0074 839FCC0A 61400010 2 w: ADD R10.w, R10.w, -PV.w
Note: cypress returns the expected value, and does not need
this correction.
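
The correction can be checked on the CPU with a small C sketch. It
applies the expression from this message to the four raw 2-bit values
and compares the result against a plain two's-complement sign
extension, which matches the expected column. The names fix_w and
sext2 are illustrative and do not exist in the driver.

```c
/* Illustrative only: fix_w() and sext2() are hypothetical helpers,
 * not functions from the r600 driver. */

/* The correction from the commit message: values above 1.0
 * wrap around to the negative range. */
static float fix_w(float w_in)
{
   return w_in - (w_in > 1.0f ? 4.0f : 0.0f);
}

/* Reference: sign-extend a raw 2-bit field to a signed integer. */
static int sext2(unsigned raw)
{
   raw &= 0x3u;
   return (raw & 0x2u) ? (int)raw - 4 : (int)raw;
}
```

For every raw value in 0..3, fix_w((float)raw) should equal
(float)sext2(raw).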
This change was tested on palm, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/vertex_attrib_binding/basic-input-case6: fail pass
khr-gles31/core/vertex_attrib_binding/basic-input-case6: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38849>
(cherry picked from commit 2ed761021f)
Using a PV register other than PV.x after a dot4 operation
does not work on rv770. It does work on evergreen,
but this behavior is not documented.
This change updates the behavior for all the r600 gpus,
which fixes the issue on rv770. It also handles max4, which has the
same requirement, in case max4 gets implemented.
Here are some of the affected tests on rv770:
piglit/bin/fp-abs-01 -auto -fbo
glcts --deqp-case=KHR-GL31.buffer_objects.triangles
piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/built-in-functions/fs-distance-vec2-vec2.shader_test -auto -fbo
Fixes: 942e6af40b ("r600/sfn: use PS and PV inline registers when possible")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39101>
(cherry picked from commit da1108dcc4)
The functionality worked properly with glMinSampleShading(0.)
and glMinSampleShading(1.). The issue was with the intermediate
values. This change makes this function compatible with the
evergreen setup.
Note: this was one of the few functionalities which were working
properly on evergreen but not on cayman.
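
For context, ARB_sample_shading requires at least
max(ceil(value * samples), 1) unique sample invocations per fragment,
which is why only values strictly between 0 and 1 exercise the
rounding. A minimal sketch of that formula follows. The name
min_shaded_samples is hypothetical, the input is assumed to be already
clamped to [0, 1], and this is not the code changed by this commit.

```c
/* Hypothetical helper, not driver code: minimum number of samples
 * that must be shaded for a given glMinSampleShading value. */
static unsigned min_shaded_samples(float min_sample_shading, unsigned samples)
{
   float v = min_sample_shading * (float)samples;
   unsigned n = (unsigned)v;   /* truncates; v is assumed non-negative */

   if ((float)n < v)           /* round up to get ceil(v) */
      n++;

   return n > 1 ? n : 1;       /* at least one sample is always shaded */
}
```

The 0.25 and 0.5 cases with 4, 6 and 8 samples in the test list below
correspond to such intermediate values.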
Here are the tests fixed:
spec/arb_sample_shading/samplemask 4 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 4/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.500000 partition: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_8: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_8: fail pass
Fixes: f7796a966d ("radeonsi: add basic code for overrasterization")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38615>
(cherry picked from commit d5d844bfc4)
now that transient images are a more complete mechanism, this should
in theory be okay and also accounts for the case where
a framebuffer contains mixed msrtt textures and plain multisampled textures
(cherry picked from commit 6474af3b42)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39469>
Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
those have never been enabled ever since RadeonSI started using
the compute queue in a1378639ab - let's keep it that way.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use larger than 256 sized workgroups on GPUs that
are affected by the regalloc hang.
Unfortunately that means that for now RadeonSI won't be able to
support ARB_compute_variable_group_size on these GPUs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
We don't need to take ETNA_DIRTY_SHADER into consideration for pure
updates of the constant states. When the shader is dirty, constants
and code will be uploaded together and the update path will be skipped.
The uniform cache in the context has been removed in ee1ed59458
("etnaviv: prep for UBOs"), so the comment referencing this cache
is confusing and can go as well.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
Constant buffers may be changed without the shader changing.
Check the correct dirty bits when marking constant buffers
as read during the draw to ensure proper synchronization.
Fixes: a40a6e551e ("etnaviv: draw: only mark resources as read/written when the state changed")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39422>
PAL always sets WD_SWITCH_ON_EOP on pre-gfx10 when primitive
restart is enabled to prevent a gpu hang.
The hang only happens with specific index streams that use primitive
restart. Since we don't know what the exact problem is,
just follow PAL and disable the 4x primitive rate when primitive
restart is enabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14629
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>
This replaces all full license headers with SPDX identifiers and
generally makes things more consistent. I've also dropped the few
remaining author tags. If someone wants to know who wrote a bit of
code, `git blame` is going to be way more accurate than author tags
anyway.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39397>
Without this, non-dynamically-supported state changes that require a pipeline
change (like blend states without full_ds3) that happen in between drawcalls
get ignored unless another one of the conditions also happened to be true.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39381>
aco implements the same logic, and in the future it will make changes to
config->float_mode to avoid unnecessary s_setreg.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38815>
si_sqtt_start / si_sqtt_stop use emit_barrier which clears barrier_flags.
Since these functions are used to build an auxiliary cs which will only
be emitted later (on sqtt enablement/disablement) it shouldn't clear
the global barrier_flags value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>
The pattern:
   sctx->barrier_flags |= ...;
   si_mark_atom_dirty(sctx, &sctx->atoms.s.barrier);
is used a lot, so let's add an inline helper. This prevents
forgetting the call to si_mark_atom_dirty.
si_upload_bindless_descriptors is special because we're
already in the emit phase so we shouldn't dirty barrier
again.
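
The idea can be sketched with toy types. Everything below is
hypothetical (struct toy_context, toy_mark_atom_dirty and
toy_set_barrier_flags are stand-ins invented for illustration) and is
not the helper added by this commit.

```c
#include <stdbool.h>

/* Toy stand-ins for the driver types; the real context is far larger. */
struct toy_atom {
   bool dirty;
};

struct toy_context {
   unsigned barrier_flags;
   struct toy_atom barrier_atom;
};

static inline void toy_mark_atom_dirty(struct toy_context *ctx,
                                       struct toy_atom *atom)
{
   (void)ctx;
   atom->dirty = true;
}

/* Hypothetical helper: accumulate the flags and dirty the barrier
 * atom in one step, so callers cannot forget the second call. */
static inline void toy_set_barrier_flags(struct toy_context *ctx,
                                         unsigned flags)
{
   ctx->barrier_flags |= flags;
   toy_mark_atom_dirty(ctx, &ctx->barrier_atom);
}
```

A caller that is already in the emit phase would keep writing to
barrier_flags directly instead of using such a helper.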
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39308>
Now that we have intrinsics which map directly to the hardware opcodes,
we can lower PLS inside the gallium driver instead of the back-end
compiler having to know anything about it. This simplifies the back-end
and is less code, if you ignore the new copyright header.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
a longstanding issue in zink has been the scenario where a dmabuf is
created for e.g., RGBA8888, then the app tries to do SRGB, but the driver
doesn't support mutable formats with the dmabuf modifier. in this scenario, the app
would either crash or break unpredictably
by reusing the existing transient mechanism (previously only for msrtss emulation),
these dmabufs can instead have a shadow image which handles mutable formats and
then syncs back to the main image when necessary
this should greatly improve the situation on e.g., Intel
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39336>