This is a performance tuning value, recommended value is 512 on DG2.
On DG2 this was in the privileged register RT_CTRL.
Minor CFE_STATE defintion fixes from Jose.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29616>
So we can enable each mask individually when programming registers.
Also setting Mask2/mask of the second double word so all registers in
it are also zeored during state init.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29616>
GFX versions older than GFX 20 have 'Thread Preemption disable' while
GFX 20 has 'Thread Preemption' with value flipped in compute walker
instruction.
So here by default enabling thread preemption, only disabling it
when BTD mode is enabled as instructed in Wa_14017794102.
Similar for 3DSTATE_BTD, enabling preemption by default and
only disabling when platform is affected by Wa_14017794102.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29616>
Right now we have a statically allocated vk_graphics_pipeline_state,
that we declare at pipeline_init, and fill at
pipeline_init_dynamic_state. This one can be static as right now it is
only needed during pipeline_init lifetime.
But to fill it, we need a vk_graphics_pipeline_all_state structure,
that right now we declare at pipeline_init_dynamic_state. But that one
become part of that vk_graphics_pipeline_state, so still needed at
pipeline_init.
This was detected when trying to refactor the code to use the
pipeline_state later on, but it raises an "invalid read" error using
valgrind with the current code. It is surprising that didn't cause any
problem.
Fixes: f2236065b7 ("v3dv: port dynamic state tracking to use Mesa Vulkan")
Cc: mesa-stable
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29603>
Vcn4 tile number calculation doesn't consider
some input case, which could result in output
bitstream corruption, this fixed the issue.
Not using the number_of_tiles directly but from
calculation. Also change to re-use some macros
from local vcn_5_0.c to ac_vcn_enc.h header file.
Updated vcn4 spec_misc_av1 ip package.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29556>
The downtime should be small, as this would only mean to stop/start services.
After some test, the reenable (revert this commit) will serve as final
verification.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29647>
Android is almost never compiled at the same time as another platform so
it doesn't change anything, but having that android branch is desirable
by the end of this series anyway, so let's do this :P
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29459>
The GL_HALF_FLOAT_OES-enum is about OES_texture_float, not e.g
ARB_texture_float. EXT_texture_rg does have an interaction that allows
this, but the other specs doesn't.
So let's tighten this. In reality, this shouldn't change any real
behavior, because we only support OES_texture_float in GLES contexts,
and in those we'd support EXT_texture_rg if we support RG textures in
the first place. But it makes the logic a bit clearer.
And just to be clear, the non-GLES version of the half-float enum does
not need the same check, because desktop GL supports all converions
here, and GLES 3 and later also requires RG-texture support in the first
place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29528>
Checking if the format and type enums are valid while checking that the
combinations are valid makes the code hard to modify when there's
complex conditions at play. Let's factor these checks out so we can
stricten up this code in a more maintainable way.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29528>
GLES3 doesn't allow all the format/type combinations that
ARB_texture_rgb10_a2ui does, so let's tighten the error-checking here a
bit.
Fixes: b5a370dc25 ("mesa/main: do not allow ARB_texture_rgb10_a2ui enums before gles3")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29528>
The GL enums from this extension got promoted to GL 3.0, so we need to
test for it before we bump the GL version. If not, bad things will
happen when people try using them.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29528>
Since we now use the common vulkan runtime to handle pipeline state and
this sets a limit for this at MESA_VK_MAX_VERTEX_BINDING_STRIDE we should
do the same, or else we can run into an assert-fail in the runtime code.
Basically the same as !29454, but for Panfrost.
Fixes: 214761bdfe ("panvk: Fully transition to vk_vertex_binding_state")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29461>
In the case of buffer to image stores, we work around the limitation
for linear images by loading D/S data into a the color tile buffer
using a compatible format, however, this only works for formats with
a single aspect, for combined depth/stencil formats, since the copies
are specified to only copy a single aspect, we need to be able to
preserve the contents of the other aspect in the destination image,
and for that we still use the depth/stencil buffer, so we are affected
by the restriction.
Fixes some VK_KHR_maintenance5 CTS tests that hit this scenario,
such as some tests in:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.2d_to_1d.*
In the case of image to image copies, we don't have any workarounds for
linear depth/stencil so we always want to skip the TLB path. I have not
seen any tests hit this scenario.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29597>
We can't render to linear depth/stencil formats but the blit shader
automatically converts D/S blits to compatible color blits where we
don't have this restriction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29597>
Backend only recognizes the MUL a*0.5 pattern if there is an immediate
as one of the sources, however by then the source would we already
coverted to RC_FILE_NONE with constant half swizzles. So teach
peephole_omod to recognize this pattern as well.
RV530:
total instructions in shared programs: 128860 -> 128750 (-0.09%)
instructions in affected programs: 11942 -> 11832 (-0.92%)
helped: 106
HURT: 17
total presub in shared programs: 8739 -> 8736 (-0.03%)
presub in affected programs: 32 -> 29 (-9.38%)
helped: 3
HURT: 0
total omod in shared programs: 427 -> 1212 (183.84%)
omod in affected programs: 38 -> 823 (2065.79%)
helped: 0
HURT: 160
total temps in shared programs: 17544 -> 17554 (0.06%)
temps in affected programs: 70 -> 80 (14.29%)
helped: 0
HURT: 10
total lits in shared programs: 3153 -> 3159 (0.19%)
lits in affected programs: 9 -> 15 (66.67%)
helped: 0
HURT: 6
total cycles in shared programs: 191334 -> 191253 (-0.04%)
cycles in affected programs: 21240 -> 21159 (-0.38%)
helped: 101
HURT: 27
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
Consider the following case:
0: MUL temp[1].y, input[0]._x__, input[1]._y__;
1: MOV temp[1].x, input[0].x___;
2: MOV temp[1].z, const[0].__x_;
3: MUL temp[2].xyz, const[1].xxx_, temp[1].yxz_;
...
We correctly recognize that we can convert mul into omod for all three
instructions, however the mul swizzle was not handled correctly:
0: MUL temp[2].y / 2, input[0]._x__, input[1]._y__;
1: MOV temp[2].x / 2, input[0].x___;
2: MOV temp[2].z / 2, const[0].__x_;
...
Just create the conversion swizzle from the initial mul swizzle when rewriting
the original instruction writemasks.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
We add a cycles penalty when we see a begin tex and than subtract from
it based on when first alu comes that needs the results. However if the
only instruction in the TEX block is just KIL, we don't have to add any
penalty as nothing waits for it.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
There is a hole between SE1 and SE2 occupied by COMPUTE_TMPRING_SIZE.
Fixes: 3c8b48e310 ("ac,radeonsi: add a function to initialize compute preambles")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29622>