We were diligently setting Wayland buffers as non-busy, but nowhere in
the code did we set them to busy when submitted to the server. This
meant that acquire_next_image would only ever find the same buffer in
a loop, over and over.
Signed-off-by: Daniel Stone <daniels@collabora.com>
In acquire_next_image, we are waiting for a wl_buffer::release to arrive
and release one of the buffers in our swapchain. Most compositors don't
explicitly flush release events, so we may need to perform a roundtrip
instead, to ensure the event arrives.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...
Fixes a bug in Kodi with radeonsi reported by Christian König.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Sanity check of BindVertexBuffer for OpenGL ES in
_mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Print the stream value not the pointer to the expression,
also use the unsigned format specifier.
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
SPIR-V only has one ImageQuerySize opcode that has to work for both
textures and storage images. Therefore, we have to special-case that one a
bit and look at the type of the incoming image handle.
Image intrinsics always take a vec4 coordinate and always return a vec4.
This simplifies the intrinsics a but but also means that they don't
actually match the incomming SPIR-V. In order to compensate for this, we
add swizzling movs for both source and destination to get the right number
of components.
The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.
This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.
While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:
total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs : 945082 -> 943417 (-0.18%)
total local used in shared programs : 30372 -> 30288 (-0.28%)
total bytes used in shared programs : 50089256 -> 50047880 (-0.08%)
local gpr inst bytes
helped 21 822 3332 3332
hurt 0 278 565 565
And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.
On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):
total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs : 728719 -> 725131 (-0.49%)
total local used in shared programs : 3552 -> 3552 (0.00%)
total bytes used in shared programs : 43995688 -> 43932928 (-0.14%)
local gpr inst bytes
helped 0 1380 3252 3774
hurt 0 287 1710 1365
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
An issue could still occur if the base level is set, but fixing that
would require a lot more logic.
This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The Vulkan spec requires all allocations that happen for device creation to
happen with SCOPE_DEVICE. Since meta calls into other things that allocate
memory, the easiest way to do this is with an allocator.
Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize. (The vertex shader already
runs at TWO_QUADS, which is the minimum.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per
generation. Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address. The way the packing functions are currently
set up, we need to provide it with (offset >> 6) because it just shoves the
bits in where the PRM says they go and isn't really aware that it's an
address.
The __gen_fixed helper properly clamps the value and also handles negative
values correctly. Eventually, we need to make the scripts generate this
and use it for more things.
Instead of emitting the continue before the loop body we emit it
afterwards. Then, once we've finished with the entire function, we run
nir_repair_ssa to add whatever phi nodes are needed.
The efficiency should be approximately the same. We do a little more work
per phi node because we have to sort the predecessors. However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis. Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
And only call it from r600_invalidate_resource for buffer resources.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch fixes a bug when building a pack instruction.
For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.
This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale
I've also corrected some coding style issues in the function.
v2: Returned else statements to vmware style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).
this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:
"The command
void UniformSubroutinesuiv(enum shadertype, sizei count,
const uint *indices);
will load all active subroutine uniforms for shader stage
shadertype with subroutine indices from indices, storing
indices[i] into the uniform at location i. The indices for
any locations between zero and the value of
ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
used will be ignored."
V2: simplify NULL check suggested by Jason.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.orghttps://bugs.freedesktop.org/show_bug.cgi?id=93731
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Spotted by Coverity.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>