Use "util/detect_os.h" instead of "pipe/p_config.h" and "pipe/p_compiler.h"
in src/util/os_mman.h
This prepares for implementing os_mman on Windows.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19645>
While Panfrost allocates linear images with strides that are a multiple of 64
bytes, other dma-buf producers on the system may not satisfy this requirement.
However, at least on v7 and newer, any image with a regular format must have a
stride that is a multiple of 64 bytes.
This fixes a real bug in an application that created a linear R8_UNORM image
with stride 480 bytes, imported it as an EGLImage, and then tried to texture
from it with the GPU. Previously, the driver allowed this situation but it
resulted in an imprecise fault from the GPU. This patch corrects the driver to
reject the import as invalid due to the unaligned stride, ensuring we never
attempt to texture from such a resource.
To implement, we add some new layout queries to centralize knowledge about the
stride alignment requirements, and we sprinkle in asserts to show how the
invariant is upheld throughout the lifecycle of image creation to texturing.
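
A minimal sketch of the check described above, not the driver's actual code.
All sketch_* names are hypothetical. The 64-byte requirement on v7 and newer
comes from this message; the alignment of 1 for older architectures is an
assumption made only so the sketch is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout query: required row-stride alignment in bytes for a
 * linear image with a regular format. Only the v7+ value is taken from the
 * commit message; the fallback of 1 is an assumption for this sketch. */
static uint32_t
sketch_linear_stride_align(unsigned arch)
{
   return (arch >= 7) ? 64 : 1;
}

/* Hypothetical import-time check: reject strides the GPU cannot texture from */
static bool
sketch_stride_is_valid(unsigned arch, uint32_t stride_B)
{
   return (stride_B % sketch_linear_stride_align(arch)) == 0;
}

/* Later stages assert the invariant instead of validating again */
static uint32_t
sketch_pack_stride(unsigned arch, uint32_t stride_B)
{
   assert(sketch_stride_is_valid(arch, stride_B));
   return stride_B;
}
```

With these definitions, the 480-byte R8_UNORM import from the bug report is
rejected on v7, since 480 is not a multiple of 64.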
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19620>
For 2D UI workloads and even most 3D workloads, the indirect dispatch shader
won't actually be needed, but we currently compile it during eglInitialize() on
every v7 application. That hurts app start-up time, especially given that this
shader doesn't hit the disk cache. We can instead defer compiling this shader
until it's actually needed, when glDispatchComputeIndirect() gets called.
The tradeoff is that the first glDispatchComputeIndirect() call will be (much)
slower than successive calls, since we need to build and compile this internal
shader. I'm unconvinced that's a problem in practice.
An app would need to call glDispatchComputeIndirect for the first time in the
middle of a scene. 2D apps never would call that, OpenCL doesn't have that, and
GL compute will have the same costs just moved around. So it's down to a 3D
GLES3.1 app that indirectly dispatches compute for the first time in the
middle of a scene. Which, meh? It's not entirely implausible but we have bigger
fish to fry, and this fixes a real problem (about 5% of eglInitialize time spent
building this shader that won't actually get used).
es2_info starts slightly faster with this change.
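
The deferral is the usual lazy-initialization pattern. A minimal sketch, with
hypothetical sketch_* names standing in for driver state and with the compile
step reduced to a counter. It ignores locking, which a real multi-context
driver would presumably need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; none of these names are Mesa's */
struct sketch_shader {
   int id;
};

struct sketch_screen {
   struct sketch_shader indirect_dispatch; /* storage for the internal shader */
   bool indirect_dispatch_ready;           /* false until first use */
   unsigned compile_count;                 /* instrumentation for the sketch */
};

/* Stands in for the expensive build-and-compile step */
static void
sketch_compile_indirect_dispatch(struct sketch_screen *screen)
{
   screen->indirect_dispatch.id = 1;
   screen->compile_count++;
}

/* Called only from the indirect dispatch path, never at initialization */
static struct sketch_shader *
sketch_get_indirect_dispatch(struct sketch_screen *screen)
{
   assert(screen != NULL);

   if (!screen->indirect_dispatch_ready) {
      sketch_compile_indirect_dispatch(screen);
      screen->indirect_dispatch_ready = true;
   }

   return &screen->indirect_dispatch;
}
```

The first call pays the compile cost; every later call returns the cached
shader.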
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19622>
Without additional signalling of modifiers, CRCs cannot possibly work correctly
across process boundaries. Since we don't do that signalling, we should
not be allocating private CRCs for imported resources, and we should not be
using our own private CRCs for internal resources.
The entire out-of-band CRC infrastructure is a hack to let us do CRCs even for
imported/exported BOs, but that can't possibly work. Remove it, and remove a
pile of special cases across the driver.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>
We have no vertex shader key, and unless legacy GL features are used, the
fragment shader key is known ahead-of-time. That means we can precompile shaders
at CSO create time, hopefully avoiding some draw-time jank.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
NIR deemphasizes nir_variable. We want to transition off it. Instead of walking
the list of variables and playing games with the GLSL types to collect varying
information, walk the list of instructions and use the I/O semantics to collect
similar information.
In addition to avoiding the reliance on nir_variable, this fixes handling of
struct varyings under certain circumstances. Such programs are compiled by the
GLES3.1 CTS but not used, so without this fix, the affected tests would regress
when precompiling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
Bifrost onwards handle this in hardware, and the Midgard lowering isn't
too terrible. Enable the format, otherwise desktop GL apps such as
Hacknet try to render to the format and get an incomplete framebuffer.
Cc stable because apparently we've been advertising this format
unintentionally as a result of some other interaction? Unclear how
Hacknet is hitting this, maybe it's an app bug. Shrug, it's not a big
deal regardless.
Additionally, we need to restrict texturing from 32-bit normalized due
to a restriction added with the v7 pixel format fiasco. That means
restricting rendering to 32-bit normalized on v7 onwards.
Closes: #7251
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Dang Huynh <danct12@disroot.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19358>
gl_PointCoord is implemented via a special attribute descriptor on Midgard. This
descriptor has an orientation bit, so the orientation is driver-controlled. That
means we can map rast->sprite_coord_mode to this bit, rather than lowering in
the shader.
This is a bug fix for point sprites, which are implemented natively on Midgard
for dubious reasons and need to be flipped this way. It is also an optimization
for apps reading gl_PointCoord, removing the extra arithmetic to flip, although
the value of this is somewhat dubious.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19237>
This function is only used if PAN_ARCH >= 5.
Fixes a clang warning about unused static inlined functions.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18800>
Fixes math_bruteforce.atan2 and contractions tests.
For OpenCL, we want to flush fp32 and preserve fp16, applying to both inputs and
outputs so F16_TO_F32 acts as preserve, which implements CL spec text:
> Denormalized numbers for the half data type which may be generated when
> converting a float to a half using vstore_half and converting a half to a float
> using vload_half cannot be flushed to zero
Note that our libclc builds flush denorms and rusticl does not advertise
denorms, so we're expected to flush to zero. rusticl correctly sets the desired
float controls; we just have to match them to the hardware requirements.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
Indirect dispatch does not actually require any dynamic memory allocation, even
with shared memory. We just need to set wls_instances to some (mostly arbitrary)
value, statically allocate memory based on that, and let the hardware throttle
workgroups to fit if needed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18661>
If memory allocation fails, we look for a suitably sized BO in the BO cache and
wait until we can use its memory. That usually works, but there's a case when it
can fail despite sufficient memory in the system: BOs in the BO cache
contributing to memory pressure but none of them being of sufficient size. This
case is not just theoretical: it's seen in the OpenCL
test_non_uniform_work_group, which puts the system under considerable memory
pressure with an unusual allocation pattern.
To handle this case, try evicting *everything* from the BO cache and stalling
in order to allocate, if the above attempts failed. Fixes the following error:
DRM_IOCTL_PANFROST_CREATE_BO failed: No space left on device
on the aforementioned OpenCL test.
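
A toy model of the fallback order, assuming a fixed memory budget and a cache
whose entries still count against that budget. Every sketch_* name is
hypothetical, and the stalling (waiting for the GPU to finish with cached BOs)
is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CACHE_MAX 8

/* Toy device: a memory budget plus a cache of freed buffer sizes that are
 * still charged against the budget until they are evicted. */
struct sketch_dev {
   size_t free_bytes;
   size_t cache[SKETCH_CACHE_MAX];
   unsigned cache_count;
};

/* Stands in for the kernel allocation, which fails under memory pressure */
static bool
sketch_kernel_alloc(struct sketch_dev *dev, size_t size)
{
   if (size > dev->free_bytes)
      return false;

   dev->free_bytes -= size;
   return true;
}

/* Reuse a cached buffer that is large enough, if there is one */
static bool
sketch_cache_fetch(struct sketch_dev *dev, size_t size)
{
   for (unsigned i = 0; i < dev->cache_count; ++i) {
      if (dev->cache[i] >= size) {
         dev->cache[i] = dev->cache[--dev->cache_count];
         return true;
      }
   }

   return false;
}

/* Return every cached buffer to the system */
static void
sketch_cache_evict_all(struct sketch_dev *dev)
{
   for (unsigned i = 0; i < dev->cache_count; ++i)
      dev->free_bytes += dev->cache[i];

   dev->cache_count = 0;
}

static bool
sketch_bo_create(struct sketch_dev *dev, size_t size)
{
   assert(size > 0);

   if (sketch_kernel_alloc(dev, size))
      return true;

   if (sketch_cache_fetch(dev, size))
      return true;

   /* Last resort: small cached buffers may add up to enough memory */
   sketch_cache_evict_all(dev);
   return sketch_kernel_alloc(dev, size);
}
```

In this model, three cached 4096-byte buffers cannot satisfy an 8192-byte
request individually, but evicting all of them frees enough memory for the
allocation to succeed.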
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18579>
We need to look at the job header pointers themselves, not the memory objects
that contain them, because there can be (and usually are) multiple jobs per BO.
Fixes: 3da8c9193c ("panfrost: Handle Job VA cycles when decoding a dump file")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18539>
When a job loop is submitted to the GPU, as in IGT
panfrost_submit@pan-reset, this will trigger a DRM scheduler timeout and
eventually a devcoredump. However, when pandecode traverses the list of
jobs in a submit BO, it will iterate forever.
Fix it by adding already-visited CPU VAs into a mesa pointer set and
checking that the current job's CPU VA hasn't already been handled.
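
A self-contained sketch of the cycle check. A small linear array stands in for
the pointer set so the example has no dependencies, and the sketch_* names and
the job structure are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_JOBS 64

/* Hypothetical job header: only the link to the next job matters here */
struct sketch_job {
   struct sketch_job *next;
};

/* Walk a job chain, stopping at the end of the chain or at the first job
 * that has already been visited. Returns the number of distinct jobs seen. */
static unsigned
sketch_walk_jobs(const struct sketch_job *first)
{
   const struct sketch_job *visited[SKETCH_MAX_JOBS];
   unsigned count = 0;

   for (const struct sketch_job *job = first; job != NULL; job = job->next) {
      bool seen = false;

      for (unsigned i = 0; i < count; ++i) {
         if (visited[i] == job) {
            seen = true;
            break;
         }
      }

      /* A repeated job means the chain loops back on itself */
      if (seen)
         break;

      assert(count < SKETCH_MAX_JOBS);
      visited[count++] = job;

      /* A real decoder would print the job here */
   }

   return count;
}
```

A chain of three jobs whose last entry points back to the first is walked
exactly once instead of forever.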
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14034>
In file included from src/panfrost/lib/genxml/v9_pack.h:15,
from ../../src/panfrost/lib/genxml/gen_macros.h:95,
from ../../src/panfrost/lib/pan_format.c:27:
../../src/util/bitpack_helpers.h:34:10: fatal error: valgrind.h: No such file or directory
Fixes: c52d5acf15 ("util,intel: Pull the bit packing helpers from genxml to a common header")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7169
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18350>
Even if the driver doesn't *use* trivial blend shaders, building and compiling
blend shaders is expensive. We shouldn't be building blend shaders that should
never be used.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Helpful to disambiguate blend shaders with different colour masks used for the
same format/replace operation.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
We don't need blending in the blitter. That means blend shaders are only needed
on Midgard. Simplify accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
On Midgard, we need a "blend" shader even if blending is disabled, if the format
isn't blendable. This is inefficient. Bifrost solves this by decoupling the
format conversion from the blending, allowing opaque (unblended) output to any
format without a blend shader or fragment key.
Unfortunately, our blend code is from the Midgard era -- I wrote an early
version of nir_lower_blend when I was still in high school! -- so we've been
using blend shaders for opaque output even on Bifrost. Whoops!
In SuperTuxKart, reduces blend shader calls by 30%, translating to a 15%
reduction in i-cache misses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
..on Bifrost and later, where the conversion hardware makes this reasonable.
This saves us from inserting a pile of conversions in the compiler to lower away
the 8-bit input/output. This also generates substantially better code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
It's unnecessary since the hardware already does the conversion for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
The type of the output variable will propagate through the store_output
intrinsic's src_type field to the BLEND instruction's register format field. On
Valhall, the register format for a BLEND comes from the instruction -- the
register format specified in the conversion descriptor (used on Bifrost) is
ignored. That means it has to match.
Previously, we always used a blend shader for integer rendering. Since blend
shaders ignore the register format of the BLEND instruction, that masked this
issue. That also means we don't need to backport this.
Will prevent a regression from the following commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
For untyped_color_outputs, we need to ignore the type of the colour output in
the shader and instead use the type from the format. We have all the information
to do this at blend descriptor pack time, but not at shader compile time. This
means we need a (somewhat expensive) fixup in this edge case to ingest
NIR-to-TGSI. This will prevent a regression from the rest of the series.
Although the register_format field is also present on Valhall blend descriptors,
it is ignored so we don't need the fixup there.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
If we don't recognize the model, dev->model will be NULL. In that case, we can't
dereference dev->model to get the tilebuffer size. If we do, we'll segfault,
instead of gracefully refusing to probe and loading the swrast instead.
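
The shape of the fix, as a sketch with hypothetical sketch_* types rather than
the real device structures: check the model pointer before reading anything
through it, and report failure so the caller can fall back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-model data; the field name is illustrative only */
struct sketch_model {
   unsigned tilebuffer_size;
};

struct sketch_gpu {
   const struct sketch_model *model; /* NULL if the model is unrecognized */
   unsigned tilebuffer_size;
};

/* Returns false for an unknown model so the caller can fall back to a
 * software rasterizer instead of crashing. */
static bool
sketch_probe(struct sketch_gpu *gpu)
{
   assert(gpu != NULL);

   if (gpu->model == NULL)
      return false;

   gpu->tilebuffer_size = gpu->model->tilebuffer_size;
   return true;
}
```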
Fixes: 96d65b47c7 ("panfrost: Use implementation-specific tile size")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18115>
It's noisy since Bifrost was introduced, unnecessary since we converted to
per-arch GenXML, and wrong since Valhall was added.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Architecturally, these only work for Midgard, and even on Midgard didn't turn
out to be too useful. While we're removing pandecode cruft, let's remove the
stats that just add noise to Bifrost and Valhall (and largely just noise to
Midgard too).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
It's the same core logic. Unify and let GenXML do its thing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Eliminate some #ifdef by grouping v5 and v6 state separately.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
Remove unused width/height properties, and use cleaner C syntax to build the
return value.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
There are a lot of problems with passing job_index around:
* Almost entirely unused
* Not particularly helpful even when used
* Mostly ignored for Valhall already
* Doesn't extend to CSF
It only really exists due to the early days of pandecode generating valid C code
as the trace format. With GenXML instead, that's not applicable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
It hasn't had a consistent semantic meaning since we switched decoding over to
GenXML.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
The hardware doesn't care what BO a given buffer resides in, only what GPU
address it's at. It's simpler to fetch from a GPU address, rather than the pair
of a GPU address and a backing allocation. This cleans up a lot of cruft in
pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18094>
../src/panfrost/lib/tests/test-earlyzs.cpp: In function 'void test(pan_earlyzs, pan_earlyzs, uint32_t)':
../src/panfrost/lib/tests/test-earlyzs.cpp:59:4: error: 'pan_shader_info::<unnamed union>' has no non-static data member named 'can_discard'
59 | };
| ^
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18024>
PAN_MESA_DEBUG=overflow will place objects as close as possible to a
protected region at the end of the buffer, so that overflows segfault.
Caught the bugs in all four of the preceding commits.
v2: memset the BO to 0xbb to catch code expecting zeroed allocations.
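
A sketch of the placement arithmetic only, assuming the protected region
starts immediately after the last byte of the buffer. The sketch_* helpers are
hypothetical and do not set up the protection itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Highest aligned offset at which an object of obj_size bytes still fits in
 * a buffer of bo_size bytes. Alignment must be a power of two. */
static size_t
sketch_overflow_offset(size_t bo_size, size_t obj_size, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   assert(obj_size <= bo_size);

   return (bo_size - obj_size) & ~(align - 1);
}

/* Poison a fresh buffer so code that expects zeroed memory misbehaves
 * visibly; 0xbb is the fill value named in the commit message. */
static void
sketch_poison(uint8_t *buf, size_t size)
{
   memset(buf, 0xbb, size);
}
```

When the object size is a multiple of the alignment, the object ends exactly
at the end of the buffer, so a write one byte past it lands in the protected
region. Otherwise the alignment leaves a small gap that an overflow can hide
in.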
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
create_vertex_elements_state is sometimes called with a too large
num_elements argument, for example with util_blitter, which causes a
buffer overflow.
There is no documentation to forbid this practice, so don't rely on
so->num_elements being correct and instead use the vertex shader
attribute count, which matches the value used to allocate the
descriptors.
Use attributes_read_count rather than attribute_count because the
latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID.
Fixes: 76de3e691c ("panfrost: Merge attribute packing routines")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Remove the previous compile-time early-ZS implementation and replace it with the
decoupled early-ZS implementation. This uses more efficient settings in some
cases (depth/stencil tests always pass or do not write), and fixes the
settings used in another case (alpha-to-coverage enabled with an otherwise
early-ZS shader).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #6206
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
The new early-ZS helpers are pure functions, leaf nodes of the call graph, and
implemented with a different algorithm from the "oracle" table of correct values
for various combinations of states. Further, incorrect settings often still pass
CTS while causing game bugs or inefficiencies. That combination makes the
helpers an excellent candidate for unit tests. Add some.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>