The GL enums from this extension got promoted to GL 3.0, so we need to
test for it before we bump the GL version. If not, bad things will
happen when people try using them.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29528>
Since we now use the common vulkan runtime to handle pipeline state and
this sets a limit for this at MESA_VK_MAX_VERTEX_BINDING_STRIDE we should
do the same, or else we can run into an assert-fail in the runtime code.
Basically the same as !29454, but for Panfrost.
Fixes: 214761bdfe ("panvk: Fully transition to vk_vertex_binding_state")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29461>
In the case of buffer to image stores, we work around the limitation
for linear images by loading D/S data into a the color tile buffer
using a compatible format, however, this only works for formats with
a single aspect, for combined depth/stencil formats, since the copies
are specified to only copy a single aspect, we need to be able to
preserve the contents of the other aspect in the destination image,
and for that we still use the depth/stencil buffer, so we are affected
by the restriction.
Fixes some VK_KHR_maintenance5 CTS tests that hit this scenario,
such as some tests in:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.2d_to_1d.*
In the case of image to image copies, we don't have any workarounds for
linear depth/stencil so we always want to skip the TLB path. I have not
seen any tests hit this scenario.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29597>
We can't render to linear depth/stencil formats but the blit shader
automatically converts D/S blits to compatible color blits where we
don't have this restriction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29597>
Backend only recognizes the MUL a*0.5 pattern if there is an immediate
as one of the sources, however by then the source would we already
coverted to RC_FILE_NONE with constant half swizzles. So teach
peephole_omod to recognize this pattern as well.
RV530:
total instructions in shared programs: 128860 -> 128750 (-0.09%)
instructions in affected programs: 11942 -> 11832 (-0.92%)
helped: 106
HURT: 17
total presub in shared programs: 8739 -> 8736 (-0.03%)
presub in affected programs: 32 -> 29 (-9.38%)
helped: 3
HURT: 0
total omod in shared programs: 427 -> 1212 (183.84%)
omod in affected programs: 38 -> 823 (2065.79%)
helped: 0
HURT: 160
total temps in shared programs: 17544 -> 17554 (0.06%)
temps in affected programs: 70 -> 80 (14.29%)
helped: 0
HURT: 10
total lits in shared programs: 3153 -> 3159 (0.19%)
lits in affected programs: 9 -> 15 (66.67%)
helped: 0
HURT: 6
total cycles in shared programs: 191334 -> 191253 (-0.04%)
cycles in affected programs: 21240 -> 21159 (-0.38%)
helped: 101
HURT: 27
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
Consider the following case:
0: MUL temp[1].y, input[0]._x__, input[1]._y__;
1: MOV temp[1].x, input[0].x___;
2: MOV temp[1].z, const[0].__x_;
3: MUL temp[2].xyz, const[1].xxx_, temp[1].yxz_;
...
We correctly recognize that we can convert mul into omod for all three
instructions, however the mul swizzle was not handled correctly:
0: MUL temp[2].y / 2, input[0]._x__, input[1]._y__;
1: MOV temp[2].x / 2, input[0].x___;
2: MOV temp[2].z / 2, const[0].__x_;
...
Just create the conversion swizzle from the initial mul swizzle when rewriting
the original instruction writemasks.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
We add a cycles penalty when we see a begin tex and than subtract from
it based on when first alu comes that needs the results. However if the
only instruction in the TEX block is just KIL, we don't have to add any
penalty as nothing waits for it.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
There is a hole between SE1 and SE2 occupied by COMPUTE_TMPRING_SIZE.
Fixes: 3c8b48e310 ("ac,radeonsi: add a function to initialize compute preambles")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29622>
This will let us reuse the bulk of this code in a new copy propagation
pass without replicating it. We retain a wrapper function for dealing
with ACP entries, which the new pass won't have.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
This was for Sandybridge's IF with embedded comparison, which only
existed for a single generation of hardware. Since the compiler fork,
we no longer support Sandybridge here.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
We were missing the following "newer" fields:
- ex_desc
- predicate_trivial
- sdepth
- rcount
- writes_accumulator
- no_dd_clear
- no_dd_check
- check_tdr
- send_is_volatile
- send_ex_desc_scratch
- send_ex_bso
- last_rt
- keep_payload_trailing_zeroes
- has_packed_lod_ai_src
We can actually just check ex_desc and the new "bits" union to handle
most of them with fewer checks.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
I want to be able to hash an fs_reg, including all the brw_reg fields.
It's easiest to do this if I can use the "bits" union field that
incorporates many of the other ones.
We also move the using declaration for "nr" down because that field was
moved to the second section a while back.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
Pass in the nir_src and check if it's constant, handling it via CPU-side
arithmetic instead of emitting instructions. While we can constant fold
these via our optimization passes, we have to do opt_algebraic to fold
the binary operation with constant sources into a MOV of an immediate,
then opt_copy_propagation to put it in the next expression, and so on,
until the entire expression is folded. This can take several iterations
of the optimization loop, which is inefficient.
For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch
intrinsics with constant sources, and this patch removes a number of
optimization passes according to INTEL_DEBUG=optimizer.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
Geometry shaders write outputs multiple times, with EmitVertex()
between them. The value of output variables becomes undefined after
calling EmitVertex(), so we don't need to preserve those. This lets
us recreate new registers after each EmitVertex(), assuming we aren't
in control flow, allowing them to have separate live ranges. It also
means that those registers are more likely to be written once, rather
than having multiple writes, which can make optimization easier.
This is pretty much a total hack, but it's helpful.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
This is faster for 8 samples because it forms a VMEM clause, unlike
the default shader.
It also uses 16-bit types in the shader when possible and averages fewer
components if the format has less than 4.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>