This fixes the depth/stencil (ds) clear path so that it can clear only
depth or only stencil.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9971>
After the previous change, PASS 1 can be trivially pulled out of the
loop.
With PASS 1 removed, the loop can be unrolled, and a lot of code can be
deleted (from the unrolls). This saves a couple lines of code, and it
makes the function a little easier to follow.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9867>
Things that are not dynamically indexed must be added last. This is
necessary so that values that are both statically indexed (or used
directly) and dynamically indexed will only be added once. With the
above change, if the constant 47 is used as a literal in an instruction
and in an array that is dynamically indexed, it will be added to
`Parameters` twice. On (really old) GPUs that store constants and other
parameters in the same storage, this can cause some valid programs to
exceed the storage limits. I don't know about R300 or NV30, but R200
was limited to something like 256 vec4s. This applies to constants,
state parameters, and local parameters (the assembly shader version of
uniforms).
The problem this causes here is that the final parameter layout created
in `_mesa_layout_parameters` may have more parameters than the input
layout. The fundamental assumption of that routine (and documented as
an assumption of `copy_indirect_accessed_array`) is that the input size
and the output size will be the same.
The affected shader had something like the following. This is a common
pattern for ARB assembly shaders generated by NVIDIA's cgc compiler. As
far as I can tell, the majority of applications that use ARB assembly
shaders either use cgc or use some sort of DX9 cross-compiler that
generates similar patterns.
    PARAM c[141] = { program.local[0..133],
                     { 255, 0.1, 3, 1 },
                     { 0.5, 2, 0.15915491, 0.25 },
                     { 0, 0.5, 1, -1 },
                     { 24.980801, -24.980801, -60.145809, 60.145809 },
                     { 85.453789, -85.453789, -64.939346, 64.939346 },
                     { 19.73921, -19.73921, -9, 0.75 },
                     { -999999 } };
The shader contains instructions like
    MUL R0.x, R0, c[135].y;
and
    DP4 R2.z, c[A0.x + 6], R1;
Starting with b9bff76b63, the constants at the end of `c` would get
added to `Parameters` twice. The first time they are added due to
instructions that directly access the array (e.g., the `c[135].y`
above). The second time is because they are part of an array that is
dynamically indexed. As a result, the final layout of Parameters
(calculated by `_mesa_layout_parameters`) is 7 elements larger than the
input layout.
Since bcc61a01d4 fixed the allocation size of `ParameterValues`,
`copy_indirect_accessed_array` will now write past the end of the array.
This eventually results in a crash in `free`. Thankfully Valgrind was
able to help find the real source of the problem.
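The size mismatch described above can be reproduced with a small model.
This is only a sketch, not the code in `_mesa_layout_parameters`:
`struct param_entry` and both functions are hypothetical, and it assumes
that every one of the 7 literal vectors at the end of `c` is referenced
directly by at least one instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry of the PARAM array from the example. */
struct param_entry {
    bool is_constant;       /* literal vector, not program.local[] */
    bool directly_accessed; /* referenced as c[N] by some instruction */
};

/*
 * Count the entries a layout pass would emit if directly accessed
 * constants are added first and the whole dynamically indexed array is
 * added afterwards without checking for entries that already exist.
 */
static size_t layout_size_without_dedup(const struct param_entry *array,
                                        size_t count)
{
    size_t total = 0;

    assert(array != NULL);
    for (size_t i = 0; i < count; i++) {
        if (array[i].is_constant && array[i].directly_accessed)
            total++;        /* first copy: direct access */
    }
    return total + count;   /* second copy: the indirect array as a whole */
}

/* Build the c[141] array from the example and return the layout size. */
static size_t example_layout_size(void)
{
    struct param_entry c[141] = { { false, false } };

    /* c[0..133] are program.local[], c[134..140] are the literals. */
    for (size_t i = 134; i < 141; i++) {
        c[i].is_constant = true;
        c[i].directly_accessed = true; /* assumption, see above */
    }
    return layout_size_without_dedup(c, 141);
}
```

In this model `example_layout_size()` returns 148 for an input of 141
entries, which is the 7-element overrun described above.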
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: b9bff76b63 ("mesa: put constants before state vars for ARB programs")
Closes: #4505
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9867>
Ran into this while trying to rework fbconfig setup. Due to a bug, I
ended up trying to allocate a PIPE_FORMAT_NONE framebuffer, which failed
like you'd hope, but we weren't converting that failure into an error in
st_api_make_current. Instead we'd treat it like binding no drawable to
the context, which is really not what was asked for, so let's go ahead
and make this an error.
Reviewed-by: Eric Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9956>
Not sure what this was supposed to do, but whatever it did, it doesn't.
Reviewed-by: Eric Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9956>
A650 can use the same SSBO descriptor for both 32-bit and 16-bit access,
which makes it easy to enable this extension.
Passes tests that run under:
dEQP-VK.spirv_assembly.instruction.*.16bit_storage.*
Rebased and modified commit from Jonathan Marek.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
When float16 is enabled, this allows a number of float16 tests to
pass.
When A6XX_SP_FLOAT_CNTL_F16_NO_INF is set, all operations that would
generate +-infinity generate +-MAX_HALF_FLOAT instead.
Fixes some tests from:
dEQP-VK.spirv_assembly.instruction.*.float16.*
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.*
E.g.:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.sinh_vert
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_4.length
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log2_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.inv_sqrt_denorm_flush_to_zero_nostorage
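As a rough illustration of the F16_NO_INF behaviour described above, the
sketch below models a half-float result as a C `float`. The function
name is hypothetical, NaN handling is left out, and 65504 is used for
MAX_HALF_FLOAT because it is the largest finite binary16 value.

```c
#include <assert.h>
#include <math.h>

#define MAX_HALF_FLOAT 65504.0f /* largest finite binary16 value */

/*
 * Model of the result of an fp16 operation with F16_NO_INF set:
 * +-infinity is replaced by +-MAX_HALF_FLOAT, everything else passes
 * through unchanged.
 */
static float f16_no_inf(float result)
{
    assert(!isnan(result)); /* NaN behaviour is outside this sketch */

    if (isinf(result))
        return result > 0.0f ? MAX_HALF_FLOAT : -MAX_HALF_FLOAT;
    return result;
}
```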
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
NIR has shifts defined as:
opcode("*shr", 0, tuint, [0, 0], [tuint, tuint32], False, ...
However, in ir3 we have to ensure that both operands of a shift
instruction have the same bit size.
Hopefully the additional COV for constants will be optimized away in
the future.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
This matches the blob and doesn't require actually implementing controls
since the supported modes are just what the HW does.
Passes tests under:
dEQP-VK.spirv_assembly.*.float_controls.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
cat1 instructions round to zero by default.
When fp16 is enabled this will fix:
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_vert
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage
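To show why the rounding mode matters for these fp32 to fp16
conversions, here is a minimal software sketch of both modes. It is not
how the hardware or ir3 implements the conversion: the function name is
hypothetical, and it only handles inputs whose exponent falls in the
normal binary16 range (no zero, denormals, infinity, or NaN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a float to binary16 bits, rounding toward zero (rte == false)
 * or to nearest even (rte == true). Only normal-range inputs are
 * supported by this sketch.
 */
static uint16_t f32_to_f16_normal(float f, bool rte)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));

    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t exp = (int32_t)((bits >> 23) & 0xffu) - 127;
    uint32_t mant = bits & 0x7fffffu;

    assert(exp >= -14 && exp <= 15);

    /* Keep the top 10 mantissa bits; the low 13 bits are dropped. */
    uint32_t half = ((uint32_t)(exp + 15) << 10) | (mant >> 13);
    uint32_t dropped = mant & 0x1fffu;

    /* RTE: round up above the halfway point, or on a tie if odd. */
    if (rte && (dropped > 0x1000u ||
                (dropped == 0x1000u && (half & 1u))))
        half++; /* a carry into the exponent is intended */

    return (uint16_t)(sign | half);
}
```

For example, `1.0f + 3.0f / 4096.0f` converts to 0x3C00 when rounding
toward zero and to 0x3C01 with round-to-nearest-even.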
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
Broken out of VK_GOOGLE_display_timing patch
Cc: stable
Co-author: Jakob Bornecrantz <jakob@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9939>
Per OpenGL 4.6 spec:
"If no xfb_stride qualifier is specified for a
binding point, the stride is derived by identifying the variable associated with the
binding point having the largest offset, and then adding the offset and the size of
the variable, in basic machine units. If any variable associated with the binding
point contains double-precision floating-point components, the derived stride is
aligned to the next multiple of eight basic machine units. If a binding point has no
xfb_stride qualifier and no associated output variables, its stride is zero."
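The rule quoted above can be sketched as follows. This is an
illustration of the spec text, not the linker code touched by this
change; `struct xfb_var` and `derive_xfb_stride` are hypothetical names,
and offsets and sizes are assumed to be in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one output captured at a binding point. */
struct xfb_var {
    unsigned offset;  /* xfb_offset, in basic machine units (bytes) */
    unsigned size;    /* size of the variable, in bytes */
    bool has_double;  /* contains double-precision components */
};

/* Derive the stride of a binding point without an xfb_stride qualifier. */
static unsigned derive_xfb_stride(const struct xfb_var *vars, size_t count)
{
    const struct xfb_var *last = NULL;
    bool any_double = false;

    assert(vars != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Track the variable with the largest offset. */
        if (last == NULL || vars[i].offset > last->offset)
            last = &vars[i];
        any_double = any_double || vars[i].has_double;
    }

    if (last == NULL)
        return 0; /* no associated output variables */

    unsigned stride = last->offset + last->size;
    if (any_double)
        stride = (stride + 7u) & ~7u; /* next multiple of eight */
    return stride;
}
```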
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2333>
1) Per GL_ARB_enhanced_layouts, if an explicit location is set for a
varying, each struct member, array element, and matrix row takes a
separate location. With GL_ARB_gpu_shader_fp64/GL_ARB_gpu_shader_int64,
each of them may take two locations.
Examples:
| layout(location=0) dvec3[2] a; | layout(location=4) vec2[4] b; |
|                                |                               |
|   32b 32b 32b 32b              |   32b 32b 32b 32b             |
| 0 X   X   Y   Y                | 4 X   Y   0   0               |
| 1 Z   Z   0   0                | 5 X   Y   0   0               |
| 2 X   X   Y   Y                | 6 X   Y   0   0               |
| 3 Z   Z   0   0                | 7 X   Y   0   0               |
Previously it wasn't taken into account.
2) Captured double-precision variables should be aligned to
8 bytes per GL_ARB_gpu_shader_fp64:
"If any variable captured in transform feedback has double-precision
components, the practical requirements for defined behavior are:
...
(c) each double-precision variable captured must be aligned to a
multiple of eight bytes relative to the beginning of a vertex."
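Both points can be sketched in a few lines. The helpers below are
hypothetical and only cover arrays of scalars and vectors (no structs or
matrices); they assume that one location holds up to four 32-bit
components, which matches the examples in 1).

```c
#include <assert.h>

/* Locations taken by one scalar or vector: 128 bits fit in a location. */
static unsigned vector_locations(unsigned components, unsigned bit_size)
{
    assert(components >= 1 && components <= 4);
    assert(bit_size == 32 || bit_size == 64);
    return (components * bit_size > 128) ? 2 : 1;
}

/* Each array element takes its own location(s). */
static unsigned array_locations(unsigned elements, unsigned components,
                                unsigned bit_size)
{
    return elements * vector_locations(components, bit_size);
}

/* Point 2): double-precision variables start at a multiple of 8 bytes. */
static unsigned align_xfb_offset(unsigned offset, unsigned bit_size)
{
    return bit_size == 64 ? (offset + 7u) & ~7u : offset;
}
```

With these, `dvec3[2] a` takes `array_locations(2, 3, 64)` = 4 locations
(0 to 3) and `vec2[4] b` takes `array_locations(4, 2, 32)` = 4 locations
(4 to 7), as in the examples for 1).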
v2: fix `output_size` calculations
( Andrii Simiklit <andrii.simiklit@globallogic.com> )
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1667
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2333>
When packing varyings, if only 32 bits of space are left in a slot,
the packing attempts to split a 64-bit type between the current and
the next slot. However, there is no code for splitting the 64-bit
type or for assembling it back. Instead, we now add 32 bits of
padding.
This happens only in structs because their members aren't sorted by
packing order.
Example of the issue:
   struct S {
      vec3 a;
      double d;
   };
   out flat S s;
Before, the packing went as:
   slot 32b 32b 32b 32b
   0    a.x a.y a.z d
   1    d   0   0   0
After:
   slot 32b 32b 32b 32b
   0    a.x a.y a.z 0
   1    d   d   0   0
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2333>
The merged image contains kernels & rootfs for both arm64 & armhf
baremetal test jobs, and is smaller than either arm{64,hf}_test image
before.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Doing so in an x86 container via qemu was slow, and started failing
recently after updating to a newer qemu version.
This also results in smaller arm*_test* docker images, since we need to
install fewer Debian packages in them.
As a bonus, this turns some piglit tests from fail to pass (Or maybe
they'll turn out to be flakes? They've passed at least 3 times in a
row).
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
We use the CI-built kernel+rootfs these days. I haven't bumped image tags
because the files are definitely unused, and I'm rebuilding it all in the
next commit.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
If the window is destroyed on a thread that has a currently-bound
context, use that context for destroying the framebuffer. This
ensures that the winsys can wait for in-flight work before
destroying any resources.
If the window did have a context bound beforehand but it was unbound,
we should've already done a glFinish. If the window is destroyed from
an unrelated thread... well, we're screwed, but that's the best we can do.
Reviewed-By: Bill Kristiansen <billkris@microsoft.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9959>
this is not only more correct according to vk spec, it avoids having a 0-sized
layer_stride, which totally breaks the transfer map
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9969>
many apps don't request device-lost notification, so just calling the reset
callback isn't enough; once the device has been lost, no more cmdbufs should
be submitted and the queue should not be waited on
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9963>
finishing a timeline wait guarantees that a given fence has completed,
meaning the accompanying batch state is implicitly available for use
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9963>
we can avoid locking to access batch states in these cases by just using a semaphore
to fast-forward the gpu to the batch we need completed
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9963>
now that there's some tracking for the last-finished batch id, this can
be used to detect when an application holds onto a sync object for way too long,
to the point that the sync object has expired so far into the past that we
no longer have any record of it existing
fixes things like unigine superposition
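a minimal sketch of the check, assuming batch ids increase
monotonically and ignoring wraparound; the function and parameter names
are hypothetical and not the driver's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A sync object that refers to a batch at or before the last-finished
 * batch id must already be complete, even if the batch state it pointed
 * to no longer exists.
 */
static bool fence_already_finished(uint32_t fence_batch_id,
                                   uint32_t last_finished_batch_id)
{
    assert(fence_batch_id != 0); /* assumption: 0 means "no batch" */
    return fence_batch_id <= last_finished_batch_id;
}
```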
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9963>
nir_assign_io_var_locations() does not use outputs_written when assigning
driver locations. Use driver_location to avoid incorrectly guessing what
locations it assigned.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8364>