When EGL_KHR_mutable_render_buffer extension is enabled, advertised
configs unconditionally include EGL_MUTABLE_RENDER_BUFFER_BIT_KHR bit.
However, f61337b5 starts requesting front rendering usage bit when
EGL_MUTABLE_RENDER_BUFFER_BIT_KHR is seen on the SurfaceType, which
essentially forces linear usage on all winsys BOs for gallium dri and
i965 drivers on Android when cros gralloc is in use.
This patch dynamically appends or strips the front rendering usage bit
depends on whether EGL_RENDER_BUFFER is EGL_SINGLE_BUFFER or
EGL_BACK_BUFFER. The next dequeuBuffer call will switch the buffer
sharing mode while re-allocating winsys BOs given the updated gralloc
usage bits if necessary.
v2: handle ANativeWindow_setUsage on error
Fixes: f61337b5 ("egl/android: check front rendering support for cros gralloc")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Rob Clark <robdclark@chromium.org> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11787>
Android.mk files haven't really been supported by Mesa devs for a long
time. Most of us have been willing to update Makefile.sources if we
remember and sometimes we try to blind code some Android.mk for a new
generator. However, the reality is that it breaks regularly and ends up
being maintained by the Android community. To address this problem
another approach was implemented in !10183 utilizing the maintained
meson build system. The old Android.mk files are no longer required.
This commit was created with the following commands:
git rm **/Android.mk
git rm **/Android.*.mk
git rm **/Makefile.sources
git rm CleanSpec.mk
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4487
Acked-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9728>
I think there is no reason to keep this disabled because it improves
viewperf and it might improve other things.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11754>
We should always prevent 1 CU from executing VS and GS waves
to prevent a deadlock.
Fixes: c377f45c18 "radeonsi/gfx10: rewrite late alloc computation"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11754>
Always execute the bbox code regardless of negative W, and then simply
use || to discard the result if any W is negative. This is expected to be
rare. (it only happens when a primitive intersects the near plane)
This allows us to eliminate the else statement, which is no longer
executed for accepted primitives with negative W, which are the only
primitives that needed the else branch.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11754>
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to. In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one. We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.
Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them. You can see in the diffstat of this commit
exactly what passes attempt to preserve names. Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.
These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one). I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful. If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.
iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
The intention is to pick the system memory for the prime blit dst, but
that is not possible when all memory types are advertised to be local.
This fixes venus over vtest (i.e., unix socket) because the driver
provides no PCI bus info and wsi_device_matches_drm_fd returns false. A
driver might also use can_present_on_device to force prime blit.
Fixes: 469875596a ("vulkan/wsi: Fix prime blits to use system memory for the destination")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11774>
This prevents the previous commit from being undone by the jump
optimizations in legalize, and fixes another potential case where
instead of a continue we have an if/else at the end of a loop.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
When loops have continue statements, it's expected that when we execute
a divergent continue (i.e. a continue where not all of the threads
active at the start take it) we keep going with the rest of the loop
body and then reconverge at the start of the next iteration. However the
Adreno ISA seems to always take a branch that jumps backwards, assuming
it's the bottom of a loop, so we get a different, undesired convergence
behavior. There's no way I know of to control this behavior in the
instruction set, so we have to instead insert a "continue block" at the
end of the loop where continue statements reconverge which then jumps
back to the top of the loop. Since this doesn't correspond 1:1 with any
NIR block we have to make control flow handling in NIR->ir3 a bit more
complicated, unfortunately.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
When we go to split e.g. a p0.x producer, the only other instructions
ready to schedule are often only p0.x producers. It could happen that
they all have a lower priority than the split instruction. Then we would
immediately schedule the split instruction again, then again try to
schedule one of the other producers, be blocked, and split it, around
and around again, leading to an infinite loop. The following commit
triggered this with
dEQP-GLES3.functional.shaders.discard.dynamic_loop_always on a3xx.
Fixes: d2f4d33 ("freedreno/ir3: new pre-RA scheduler")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
MOVMSK is a bit of a special case, because it takes multiple cycles (and
therefore reduces the nops needed if it's between some other assigner
and consumer) however weird things happen if you try to start reading
the first component while it isn't finished yet. On balance making it
use repeat seems to result in a fewer special cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
I realized that shared registers were never actually getting folded,
even after adding them to valid_flags, because the move wasn't even
being considered.
I looked at the other uses of is_same_type_mov(), and they should be ok
with this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
This fixes a pre-existing bug in ir3, but it showed up even more due to
other changes in this series and it interacts with the logical/physical
CFG split. When both sides of an if end with a jump, a block may become
unreachable via the logical CFG, which can cause problems because it has
no predecessors to figure out the location of live-in non-shared
values. In this case we assume that nir_opt_if has removed any code in
these blocks and just skip processing live-ins for these blocks,
pretending that they aren't live.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertizes a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
On qualcomm, we have shared registers similar to SGPR's on AMD. However,
there is no readlane or readfirstlane primitive. shared registers can
only be written to when just one lane is active. This means that we have
to lower readInvocation(val, id) to something like:
if (gl_SubgroupInvocation == id) {
scalar_reg = val;
}
return scalar_reg;
However it's a bit difficult to actually get the value of
gl_SubgroupInvocation in the backend, because for compute it requires
some calculations and we don't have any CSE support in the backend. This
intrinsic lets us turn it into
"readInvocationCond(val, id == gl_SubgroupInvocation)" in NIR at which
point the backend code generation is a lot easier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Qualcomm has a mode with a subgroup size of 128, so just emitting larger
integer operations and then lowering them later isn't an option. This
makes the pass able to handle the lowering itself, so that we don't have
to go down to 64-thread wavefronts when ballots are used.
(The GLSL and legacy SPIR-V extensions only support a maximum of 64
threads, but I guess we'll cross that bridge when we come to it...)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Lower it to a vote instead of a ballot. This was only used for AMD, and
in that case they're pretty much the same. However Qualcomm has a vote
builtin, which we want to use instead of ballots.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>