v2: minor formatting fixes
v3: call glsl_type_singleton_init_or_ref and glsl_type_singleton_decref
v4: capitalize and punctuate comments
fix text_executable -> text_intermediate in TODO
make glsl_type_singleton wrapper static
v5: rewrite how we run the nir passes
v6: fix unhandled case switch warning in st/mesa
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v4)
v2: rework arguments to compiler::compile_program
add assert to device::ir_format
v3: remove PIPE_SHADER_IR_SPIRV
change title
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v2)
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Most drivers have actually no binary format and just store the IR directly
as a single entry point blob.
v2: add a cap to switch between single or multi entry point binaries
v3: remove the entry_point field
v4: remove PIPE_CAP_MULTI_ENTRY_POINT_BINARIES
v5: remove supports_multiple_entry_points
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v2: pass argument by value
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
We want to use it for other formats as well, so give it a more generic name
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
makes it easier to consume a IR_NATIVE binary
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Changes since:
* v12:
- remove autotools (Karol Herbst)
- Remove the callback in format_validation_msg. (Francisco Jerez)
- Removed is_binary_spirv. (Francisco Jerez)
- Pass a string reference to is_valid_spirv instead of the
notification callback. (Francisco Jerez)
* v11: Fix compilation error introduced in v11.
* v10:
- Reuse format_validation_msg in is_valid_spirv.
- Remove LVL2STR macro in format_validation_msg.
* v9: Add `clover_cpp_std` to the overrides of the `libclspirv` target
in Meson.
* v7: Add DEFINES to libclspirv and libclover, in autotools, as they
would otherwise never know whether CLOVER_ALLOW_SPIRV has been
defined (Dave Airlie)
* v6: Update the dependency name (meson) and the libs variable
(Makefile) due to the replacement of llvm-spirv to the new
official SPIRV-LLVM-Translator.
* v5: Changed to match the updated “clover/llvm: Allow translating from
SPIR-V to LLVM IR” in the v6.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Changes since:
* v12 (Karol Herbst):
- rename CLOVER_ALLOW_SPIRV to HAVE_CLOVER_SPIRV
* v11 (Karol Herbst):
- only set new defines for clover to speed up recompilation
- remove autotools
* v10:
- Add a new flag (`--enable-opencl-spirv` for autotools, and
`-Dopencl-spirv=true` for meson) for enabling SPIR-V support in
clover, and never automagically enable it without that flag. (Dylan Baker)
- When enabling the SPIR-V support, the SPIRV-Tools and
SPIRV-LLVM-Translator libraries are now required dependencies.
* v7:
- Properly align LLVMSPIRVLib comment (Dylan Baker)
- Only define CLOVER_ALLOW_SPIRV when **both** dependencies are found:
autotools was only requiring one or the other.
* v6: Replace the llvm-spirv repository by the new official
SPIRV-LLVM-Translator.
* v4: Add a comment saying where to find llvm-spirv (Karol Herbst).
* v3:
- make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
- bump requirement for llvm-spirv to version 0.2
* v2:
- Bump the required version of SPIRV-Tools to the latest release;
- Add a dependency on llvm-spirv.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v10)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Currently there is no way to make no context current w/gallium + osmesa.
The non-gallium version of osmesa does this if the context and buffer
passed to `OSMesaMakeCurrent` are both null. This small change makes it
so that this is also the case with the gallium version.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
BLORP always turns off TCS/TES/GS. If regular drawing also has them
disabled (the overwhelmingly common case), then leaving them disabled
is just fine by us and we can skip dirtying them, as that would just
re-disable them a second time on the next draw.
If they are actually enabled, however, we do need to flag them.
Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.
This improves a couple of things:
1. We now only update anything if the shader actually cares.
Previously, is_indexed_draw was causing us to flag dirty vertex
buffers, elements, and SGVs every time the shader switched between
indexed and non-indexed draws. This is a very common situation,
but we only need that information if the shader uses gl_BaseVertex.
We were also flagging things when switching between indirect/direct
draws as well, and now we only bother if it matters.
2. We upload new draw parameters only when necessary.
When we detect that the draw parameters have changed, we upload a
new copy, and use that. Previously we were uploading it every time
the vertex buffers were dirty (for possibly unrelated reasons) and
the shader needed that info. Tying these together also makes the
code a bit easier to follow.
In Civilization VI's benchmark, this code was flagging dirty state
many times per frame (49 average, 16 median, 614 maximum). Now it
occurs exactly once for the entire run.
When both UTIL_QUEUE_INIT_RESIZE_IF_FULL and
UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY are set, we can get into a
situation where the queue never executes and grows to a huge size
due to all other threads being busy.
This is the case with the shader cache when attempting to compile a
huge number of shaders up front. If all threads are busy compiling
shaders the cache queues memory use can climb into the many GBs
very fast.
The use of these two flags with the shader cache is intended to
allow shaders compiled at runtime to be compiled as fast as possible.
To avoid huge memory use but still allow the queue to perform
optimally in the run time compilation case, we now add the ability
to track memory consumed by the jobs in the queue and limit it to
a hardcoded 256MB which should be more than enough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also, swap vs and fs constructor or so fs comes first.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When using xfb and rasterizing, the fragment shader may have fewer
inputs than the vertex shader outputs. We can't rely on gl_Position to
be placed at fs->total_in, but have to instead remember where we add
it in the link map and use that location.
Fixes 100+ tesselation dEQPs under
dEQP-GLES31.functional.tessellation.primitive_discard.*
dEQP-GLES31.functional.tessellation.user_defined_io.*
Reviewed-by: Eric Anholt <eric@anholt.net>
If we can entirely push uniform data, we don't need a SURFACE_STATE
descriptor for pulling data. Since constant uploads are a very common
operation, and being able to push all data is also very common, we would
like to avoid the overhead in this case.
This patch defers uploading new descriptors. Instead of handling that
at iris_set_constant_buffer, we do it at iris_update_compiled_shaders,
where we can see the currently bound shader variants. If any need pull
descriptors, and descriptors are missing, we update them and flag that
the binding table also needs to be refreshed.
Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by
31.9774% +/- 1.12947% (n=15).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We now track per-stage bind history for constant and shader buffers,
shader images, and sampler views by adding an extra res->bind_stages
field to go with res->bind_history.
This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages
involved, and also skip some CPU overhead in iris_rebind_buffer.
Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace
on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The underlying buffer isn't changing - so we don't need to update any
SURFACE_STATE descriptors - we just might have new constants, meaning
we need to re-emit 3DSTATE_CONSTANT_XS. On Gen9, this means we need
to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled
by the explicit check in the previous patch.
On Gen9, this should cause us to re-emit the binding table /pointer/ on
writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting
a whole new /table/.
On Gen8 and Gen11, this avoids binding table churn altogether.
Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of
Mordor trace on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS,
because we have SURFACE_STATE for constant buffers in case the shaders
access them via pull mode.
But this flagging is overkill in many cases. Gen8 and Gen11 don't need
it at all. Gen9 doesn't need that large of a hammer in all cases.
Just handle it explicitly so the right thing happens.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We upload a new SURFACE_STATE for the UBO/SSBO in question, which
means that we need new binding tables as well.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
* Set missing STENCIL_CONFIG_EXT2 bits
* Swap stencil sides when rendering CCW
Fixes following deqp tests (which were 99% failing):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we want to execute several batches in parallel they need to have
their own tiler and scratchpad BOs. Let move those objects to
panfrost_batch and allocate them on a per-batch basis.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we want the batch dependency tracking to work correctly we must
make sure all BOs are added to the batch->bos set early enough. Adding
FBO BOs when generating the fragment job is clearly to late. Add a
panfrost_batch_add_fbo_bos helper and call it in the clear/draw path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This helper automates the panfrost_bo_create()+panfrost_batch_add_bo()+
panfrost_bo_unreference() sequence that's done for all per-batch BOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't know who else is using the BO in that case, and thus shouldn't
re-use it for something else.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Thanks to that we avoid the recursive call into panfrost_bo_create()
and we can get rid of panfrost_bo_release() by inlining the code in
panfrost_bo_unreference().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
panfrost_bo_unreference() should be used instead.
The only difference caused by this change is that the scratchpad,
tiler_heap and tiler_dummy BOs are now returned to the cache instead
of being freed when a context is destroyed. This is only a problem if
we care about context isolation, which apparently is not the case since
transient BOs are already returned to the per-FD cache (and all contexts
share the same address space anyway, so enforcing context isolation
is almost impossible).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Store a screen pointer in panfrost_bo so we don't have to pass a screen
object to all functions manipulating the BO.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
panfrost_bo_mmap() already takes care of that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
They are not expected to be called directly, users should use
panfrost_bo_{create,release}() instead.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Right now, the BO API is spread over pan_{allocate,resource,screen}.h.
Let's move all BO related definitions to a separate header file.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Change the prefix for BO allocation flags to make it consistent with
the rest of the BO API.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This way we have all BO related functions placed in the same source
file.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
pan_drm.c was only meaningful when we were supporting 2 kernel drivers
(mali_kbase, and the drm one). Now that there's now kernel-driver
abstraction we're better off moving those functions were they belong:
* BO related functions in pan_bo.c
* fence related functions + query_gpu_version() in pan_screen.c
* submit related functions in pan_job.c
While at it, we rename the functions according to the place they're
being moved to.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
has_draws can be inferred directly from the batch->last_job value, no
need to pass it around.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
ctx is allocated with rzalloc() which takes care of zero-ing the memory
region. No need to call memset(0) on top.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>