We want to implement explicit BO dependency tracking and for that
we'll use arrays of dependencies (syncobjs) indexed by screen->id.
This is way more efficient than storing and checking screen pointers
everywhere.
v2: Properly use atomic operations in a non-racy way (Alyssa, Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12363>
This commit does several things:
* Unify code common to several drivers by evaluating INTEL_NO_HW within
intel_get_device_info_from_fd (suggested by Jordan).
* For drivers that keep a copy of the intel_device_info struct, a
separate copy of the no_hw field is now unnecessary. Remove them.
* Minimize kernel queries when INTEL_NO_HW is true. This is done for
code simplification, but we may find reason to undo this later on.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12007>
The prior method of checking the result of getenv() for NULL would cause
the feature to be enabled for INTEL_NO_HW=0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12007>
Opt out of implicit synchronization for the workaround bo: we already
never mark it as writable and we only write to it as part of
PIPE_CONTROL synchronization requirements. Setting it as ASYNC should
be enough for i915.ko to pin it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12151>
Add support for driconf overrides on a per-device level, for cases
where we don't want to override behavior for all devices supported
by a particular driver.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12135>
This will give the driver a chance to set a device name separate from the
driver name, using info probed during screen creation. All drivers
querying driconf in screen creation now have to call parsing on their own,
but other drivers get fallback parsing after screen creation.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12135>
There are two problems with the current architecture.
In OpenGL, the id is supposed to be a unique identifier for a particular
log source. This is done so that applications can (theoretically)
filter particular log messages. The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in. This causes
the id to get set once to a unique value for each message.
By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.
When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread. This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(
I have not observed any crashes related to this particular issue.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
There are two problems with the current architecture.
In OpenGL, the id is supposed to be a unique identifier for a particular
log source. This is done so that applications can (theoretically)
filter particular log messages. The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in. This causes
the id to get set once to a unique value for each message.
By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.
When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread. This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(
This fixes shader-db crashes of various kinds on Iris with threaded
shader compiles enabled.
Fixes: 42c34e1ac8 ("iris: Enable threaded shader compilation")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
There are a couple minor things that can be improved:
1. Eliminate (or reduce) the dynamic allocation of the
threaded_compile_job.
2. For apps like shader-db, improve the case where nr_threads=0. Right
now this adds thread switching and mutex overhead.
3. Other performance improvements? iris_uncompiled_shader::variants has
some special properties that make it ripe for replacement with a
lockless list. Without gathering some data, it's hard to guess what
impact that could have.
v2: Fix whitespace and formatting issues. Noticed by Ken.
s/threaded_compile_job/iris_threaded_compile_job/g. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11229>
This can be useful to simplify debugging compiler issues.
Similar to 9445a4ab43 ("radeonsi: add radeonsi_sync_compile option").
v2: Actually query the driconf to set screen->driconf.sync_compile.
Noticed by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11229>
This is distinct form max_cs_threads because it also encodes
restrictions about the way we use GPGPU/COMPUTE_WALKER. This gets rid
of the MIN2(64, devinfo->max_cs_threads) we have scattered all over the
driver and puts it in a central place.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11861>
Back when SSBOs were first enabled in i965, we tried to work around
issues where the CPU and GPU were incoherently writing to the same
cacheline by forcing an alignment such that different sections of
data would fall in different cachelines. This seems wrong.
On integrated GPUs with LLC, CPU and GPU writes should be coherent.
On integrated GPUs without LLC, we either enable snooping (so they
are again coherent), or we use WC maps (so the CPU cache isn't used).
Discrete GPUs always use WC maps (so the CPU cache isn't used).
This should work. In other words, I think the increased alignment
was just working around coherency problems on atoms that have been
fixed in the intervening 6 year time period.
Untyped surface messages require 4B alignment.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5016
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11727>
This is rarely useful, but after the next patch removes tiling tracking,
this would literally be the only difference between iris_bo_alloc and
iris_bo_alloc_tiled, so we may as well add it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11169>
Based on a patch by Rafael Antognolli.
We already had a flags parameter, but omitted it from the simple alloc
interface because most callers were passing 0. However, we'll want to
use it for selecting between device local memory and system memory, and
possibly mmap cacheability modes, in the future. At that point, many
more callers will want to specify, so I think we should include flags
in iris_bo_alloc() as well.
A few places used the iris_bo_alloc_tiled() function simply to pass
flags, so this patch converts them to use iris_bo_alloc() instead now
it does everything they want.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11169>
This is enabled by enabling gallium's memobj capability.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4337>
Enabling this fast path, while making CPU side simpler
(and thus reducing driver's CPU overhead), forces us to
generate multiple VERTEX_BUFFER_STATE packets pointing to
the same buffer, but with slightly shifted BufferStartingAddress.
This is fine on GFX version >= 11 and < 8, but on version
8 and 9, thanks to internal details of the VF cache, vertex
data from each VERTEX_BUFFER_STATE is cached separately
and this results in a lot of cache misses.
Disabling this fast path restores previous performance levels.
On my SKL GT2 machine this improves performance in:
- GLB27 Egypt offscreen by 37%
- GLB27 TRex offscreen by 22%
- gfxbench5 Manhattan offscreen by 1.75%
- gfxbench5 Manhattan31 offscreen by 1.9%
- gfxbench5 Aztec Ruins high by 2.3%
In Intel performance CI on GFX version 9 it improves performance in:
- gfxbench5 Manhattan offscreen by 1.65%
- gfxbench5 Aztec Ruins normal offscreen by 1.33%
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3277
Fixes: 42842306d3 ("mesa,st/mesa: add a fast path for non-static VAOs")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9857>
gen_device_info will try to use the most recent uAPI to get this
number and will fallback to this get_param. So just use what was
queries by gen_device_info_from_fd().
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9052>
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
We should be exposing it in every driver, since it's required eventually
to reduce jank. Make drivers have to explicitly opt out instead of opt
in.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9088>
These hooks were written in the initial IRIS_MEASURE implementation.
Minor changes by Mark Janes <markjanes@swizzler.org> to adapt to the
INTEL_MEASURE reimplementation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7354>
Now that we store shader variants in the objects themselves rather
than a per-context hash table, they are actually global across
contexts. We can enable this feature.
This makes shaders shared across contexts, so apps can compile in
one and use it in another. This has always been allowed by GL,
but in the past we've simply recompiled the shaders in every context,
which is slow and painful.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7668>
It's hooked up in all the pipe wrapper drivers, and all the
frontends except a couple places in glx/xlib.
This enables a more efficient path for drivers which use
swrast's Present, but hardware rendering (e.g. d3d12, zink).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Marek Olák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8045>
Screen is shared among contexts, other context might be already using
vtbl while another initializes it again.
==45872== Possible data race during write of size 8 at 0x5DDAE78 by thread #549
==45872== Locks held: 1, at address 0x5D1B6F8
==45872== at 0x6D66D91: gen9_init_state (iris_state.c:7816)
==45872== by 0x6BA0A31: iris_create_context (iris_context.c:342)
==45872== by 0x621F390: st_api_create_context (st_manager.c:917)
==45872== by 0x620E6F9: dri_create_context (dri_context.c:163)
==45872== by 0x6A40DB1: driCreateContextAttribs (dri_util.c:480)
==45872== by 0x540B963: dri2_create_context (egl_dri2.c:1583)
==45872== by 0x53FB84E: eglCreateContext (eglapi.c:821)
==45872==
==45872== This conflicts with a previous read of size 8 by thread #544
==45872== Locks held: 1, at address 0x5F6E0E0
==45872== at 0x6CB779E: blorp_alloc_binding_table (iris_blorp.c:167)
==45872== by 0x6CAEF70: blorp_emit_surface_states (blorp_genX_exec.h:1540)
==45872== by 0x6CB67F9: blorp_exec (blorp_genX_exec.h:2016)
==45872== by 0x6CB7AFE: iris_blorp_exec (iris_blorp.c:307)
==45872== by 0x70F5916: try_blorp_blit (blorp_blit.c:2145)
==45872== by 0x70F5FCA: do_blorp_blit (blorp_blit.c:2273)
==45872== by 0x70F778F: blorp_copy (blorp_blit.c:2803)
==45872== by 0x6BB9EB6: iris_copy_region (iris_blit.c:725)
v2: move as genX(init_screen_state) (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7544>
This enables GL_EXT_semaphore feature.
v2:
* reversed previous commit that was conditionally setting the signal
fence capability if the syncobj was present
* reversed previous commit that was introducing a bool has_syncobj that
is not necessary anymore
v3:
* changed the signal function to use fence->seqno due to recent changes
to master
v4:
* changed the signal callback to use the new structs of the fences
backend (iris_fine_fence)
v5:
* removed check for ctx == NULL in iris_fence_signal and await functions
as at the time they are called we always have a context
* splitted a line to not exceed width
v6:
* put back the if(ctx) check in iris_fence_await, if this is an error
the fix should be in a different MR
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7042>
Now that we have the HDC, using the data cache for UBO pulls seems to
help things quite a bit:
GTA V DXVK 104.0%
Talos Principle GL 102.8%
Rise of Tomb Raider VK 102.8%
Dark Souls 3 DXVK 101.4%
Witcher3 DXVK 101.3%
Bioshock Infinite GL 100.5%
Doom 2016 VK 97.7%
Doom is a bit of a loss but it helps enough other stuff, it's probably
worth the hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7230>
This commit enables clover support for iris. It is intended as a
compiler developer tool and not as a new OpenCL implementation from
Intel. If you want competent OpenCL, we have a different open-source
driver for that built on our LLVM-based IGC compiler stack. However,
using clover with iris is becoming increasingly useful as a compiler
development tool and I'm getting tired of carrying the patches in a
private branch.
By default, clover will not initialize on iris. To enable clover, set
the IRIS_ENABLE_CLOVER environment variable to "1" or "true". As we've
done with the semi-sketchy platform support in ANV, it dumps a very loud
WARNING to stderr when enabled. Use at your own risk.
NOTE: To anyone intending to benchmark this, the performance is going to
be terrible and that is expected. This is in no way representative of
the Intel/NIR compiler stack. As it currently stands, clover passes
-O0 to clang when compiling OpenCL C to make SPIRV-LLVM-Transator work.
When compiling the SPIR-V, clover currently doesn't run any NIR
optimizations before it lowers memory access so any NIR optimizations
iris attempts to do are severely hampered. One day, clover will get a
NIR optimization loop or the ability to hand things off to the driver
per-lowering but today is not that day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
Use the same generators as used in anv driver so both Vulkan and OpenGL
drivers can share the same external memory objects.
v2: removed extra parameter from function gen_uuid_compute_device_id
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7025>
It's included in declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
Both of these are clover-only caps. We don't really support clover and,
even if we did, the number of address bits is wrong and we definitely
don't support the CL path for images.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6405>
All drivers that support mediump lowering should support 16BIT_TEMPS,
but some do not also want 16b consts to be lowered. Replace the pipe
cap in preperation to remove LowerPrecisionTemporaries.
Note: also updates reference checksums for the arm64_a630_traces job,
due to lowering more to 16b
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6189>
Adds PIPE_CAP_PRIMITIVE_RESTART_FIXED_INDEX which is a subset of the
primitive restart cap for when the hardware can only support the fixed
indices specified in GLES.
The switch statements were automatically modified with this command:
find \( \( -name \*.cpp -o -name \*.c \) \! -type l \) \
-exec sed -i -r \
's/^(\s*case\s+PIPE_CAP_PRIMITIVE_RESTART)\s*:.*$/\0\n\1_FIXED_INDEX:/' \
{} \;
v2: Add a note in screen.rst
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5559>