"unknown_2" field is actually a size of instruction that branch
points to. If it's set to a smaller size than actual instruction
branch behavior is not defined (and it usually wedges the GPU).
Fix it by setting this field correctly.
Fixes: af0de6b91c ("lima/ppir: implement discard and discard_if")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This corresponds to what the GC3000 blob does. The USED / UNUSED enums are
wrong, at least for GC2000/GC3000.
Without this the 3rd texture component is not interpolated correctly (flat?)
in the following test (and others):
dEQP-GLES2.functional.texture.mipmap.cube.generate.rgba8888_nicest
Strangely, when the texture is sampled from OpenGL it works correctly,
the problem only shows up for sampling by gallium/blitter. This fixes other
cube map tests which use util_blitter_blit.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The rs alignment doesn't have to be multiplied by # of pixel pipes.
This works on GC2000 which doesn't have the SINGLE_BUFFER feature.
This fixes some cubemaps (NPOT / small mipmap levels) because aligning by 8
breaks the expected alignment of 4 for tiled format. We don't want to mess
with the alignment of tiled formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The MIN filter is never used when not using mipmaps. This fixes that.
Interestingly, only GC3000 needs this (GC2000 works without this fix).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
ROUND_UV rounding breaks nearest filtering.
Enable it only when nearest filtering isn't used.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fix this build error on Ubuntu 18.04.
/usr/bin/ld: src/util/libmesa_util.a(u_cpu_detect.c.o): undefined reference to symbol 'pthread_once@@GLIBC_2.2.5'
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110663
Suggested-by: Eric Engestrom <eric@@engestrom.ch>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
We don't even support replay anymore; this is just wasting characters
and adding clutter.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Mali job dependency graphs, at least for GLES3.0, have the special
property that a given node will only have at most a single dependent.
This allows us to efficiently precompute the dependent array and
replace an inner loop's O(N) search with an O(1) lookup, bringing the
algorithmic complexity of scoreboarding from O(N^2) to O(N).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that it has been totally replaced by the borrow mechanism, it is now
unused code.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The whole purpose of the transient memory model is to make subdivision
stupidly easy, so let's handle that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The batch now temporarily possesses the transient buffer, so it'll need
to remember that to free it later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We use a fixed size slab if we can, otherwise we create a dedicated
("oversized") BO and add that to the job. In the latter case we'll get
reference counting for free so we can forget about this corner case for
the rest of the series.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like transient allocations to occur on the screen (borrowed by
the batch) rather than on the context. Add fields to track this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The latter upload is correct, but the former upload is unassociated with
any particular FBO and therefore becomes orphaned. We do have to upload
at draw-time at the latest, if we haven't by then.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Zero-sized allocations will fail with an unhelpful errno from the
kernel; check size explicitly in userspace before it gets that far.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Don't rely on them being preinitialized to zero; this can cause junk to
appear on the wire.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.
Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.
To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.
v2: change the solution so it is expected to work in more scenarios (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Kind of a funky corner case that does not (as far as I know) apply to
organic shaders from GLES but does pop up in generated shaders from the
fixed-function desktop pipeline.
Fixes: bb483a9166 ("panfrost: Clamp point size")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The screen header includes the common xml, and otherwise we might race
to build before it's done.
Fixes: e03259974e ("freedreno: Generate headers from xml files")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes: 66411521ea ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Only Z24S8 is properly supported right now, so let's be careful. Fixes a
number of issues relating to improper Z/S handling. The most obvious is
depth buffers with incorrect strides, which manifests in truly bizarre
ways and can happen commonly with FBOs.
Fixes WebGL (Aquarium runs, etc).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We were failing to flag the program dirty when it changed. Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits. Only set it if needed.
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not clear the hardware really has a maximum which confuses dEQP;
clamp to whatever we report as our maximum.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The reason for doing this is two-fold:
1. These passes are likely to be shared with the Bifrost compiler
Therefore, we don't want to restrict them to Midgard
2. The coding style is different (NIR-style vs Panfrost-style)
The NIR passes are candidates for moving upstream into
compiler/nir, so don't block that off for stylistic reasons
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Just use the #define instead.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>