Not copying all the scissors caused
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.2_viewports
to fail but thah test pointlessly relies on KHR_multiview (cts issue
filed).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
Prior to SKL, the mipmaps for 3D surfaces are laid out in a way
that make it impossible to represent in the way that
VkSubresourceLayout expects. Since we can't tell users how to make
sense of them, don't report them as available.
"Fixes" dEQP-VK.image.subresource_layout.3d.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9419>
This implements most of the remaining u_threaded_context support. Most
of the heavy lifting was done in the previous patches which fixed things
up for the new thread safety requirements. Only a few things remain.
u_threaded_context support can be disabled via an environment variable:
GALLIUM_THREAD=0
On Felix's Tigerlake with the GPU at fixed frequency, enabling
u_threaded_context improves performance of several games:
- Civilization VI: +17%
- Shadow of Mordor: +6%
- Bioshock Infinite +6%
- Xonotic: +6%
Various microbenchmarks improve substantially as well:
- GfxBench5 gl_driver2: +58%
- SynMark2 OglBatch6: +54%
- Piglit drawoverhead: +25%
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
pipe->transfer_map can be called from u_threaded_context's thread
rather than the driver thread. We need to use two different slab
allocators, one for each thread. transfer_unmap, on the other hand,
is only ever called from the driver thread.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo. This
patch does most of the mechanical changes required for that.
It also initializes the new threaded_resource fields.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
When we enable u_threaded_context, the pipe->create_*_state hooks
(precompile variants) are going to be called from one thread, while
iris_update_compiled_shaders (on-the-fly variants) are going to be
called from a driver thread. BLORP shaders also happen from
clear, blit, and so on in the driver thread.
u_upload_mgr isn't thread-safe, so use an uploader for each purpose.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
This enables us to replace the backing storage of resources that have
been used as stream output targets, in case we're invalidating their
entire contents. This can avoid stalls. We simply hadn't supported it
because it was going to be tricky to re-emit 3DSTATE_SO_BUFFER without
screwing up "reset offset to zero" vs. "keep appending". But that
should be working fine with the previous patch's refactor.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
The previous mechanism was a bit fragile. We stored the zero offset
in the pre-baked packet, and used an flag to override 0xFFFFFFFF
(append) offsets until our first emit - then prohibited anyone from
trying to re-emit the packet by flagging IRIS_DIRTY_SO_BUFFERS,
because that would re-emit the version with the zeroing of the offset.
Now, we always store 0xFFFFFFFF in the pre-baked packet, and use a
flag to override it to zero on the first emit. That way, we can
re-emit that packet at any time, and it'll just keep appending.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
In the future, Marek is planning to make u_threaded_context call
create_stream_output_target() from a different thread than the main
driver thread, which means that we can't safely use uploaders there.
To prepare for this eventual future, just defer the allocation of
the offset BO 'til later. It's a very small amount of overhead.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
With u_threaded_context, create_surface and create_sampler_view will
be called from a different thread than the driver thread. They aren't
allowed to access the context, which means that they can't use the
uploaders there to upload our SURFACE_STATE entries.
Thanks to backing-storage replacement and iris_rebind_buffer, we already
reworked things to maintain CPU-side copies of the SURFACE_STATE entries
and added the ability to upload or re-upload them later. So we can skip
the upload at object creation time, and add a simple resource-is-NULL
check at binding table upload time to ensure that they get uploaded by
the time we need them. (They might get uploaded earlier due to rebinds
or clear color updates, but this is the last moment to do so.)
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
It shouldn't affect bound program state, and the current context state
shouldn't be relevant for shader creation precompiles anyway (level load
isn't going to have the eventual set of sampler views bound when you go to
draw with that shader).
Closes: #4306
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
You always have to populate the key with the right texture swizzles, even
if textures haven't changed since binding a new shader.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
We can compose the swizzles at sampler view creation time, saving
recompiles on texture format changes.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
This allows for virgl guests to expose GL_NVX_gpu_memory_info and
GL_ATI_meminfo when the extensions are supported on the host.
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9337>
mi_ is already a unique prefix in Mesa so the gen_ isn't really gaining
us anything except extra characters. It's possible that MI_ may
conflict a tiny bit with GenXML but it doesn't seem to be a problem
today and we can deal with that in the future if it's ever an issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
Writes to global memory should not be moved over discard,
otherwise we could have unintended side-effects or lack of
side-effects where they should be observed.
Fixes tests:
dEQP-VK.rasterization.frag_side_effects.color_at_beginning.kill
dEQP-VK.rasterization.frag_side_effects.color_at_end.kill
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9365>
Disabling VM_ALWAYS_VALID actually hurts more than it helps
after doing more testing. Managing the global BO list in userspace
is really costly and make a bunch of games CPU bound.
I think re-enabling VM_ALWAYS_VALID is a step in the right direction.
This reverts commit 6ac6e2fbfb.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9341>
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.
So if we have code like this:
unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1
In practice we end up with QPU like this:
unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1
And with this patch we get:
unifa addr + 0
nop
nop
nop
ldunifa.r0 <--- reads offset 0
ldunifa.- <--- reads offset 4
ldunifa.- <--- reads offset 8
ldunifa.r1 <--- reads offset 12
Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.
Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:
total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.
total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
With threaded-context we won't have a chance to apply the workaround in
the backend driver. But the previous commit moves it to a driconf
configured workaround in mesa/st, so we can drop this now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9316>
All the ir3 gens had the same thing, time to move it out into a shared
helper.
The keeping the storage in fdN_context is to avoid namespace clashes
between ir3 and ir2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
I had forgotton on which gens these where used on (which is important if
you need to know which shader stages use these).. expand the comments a
bit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
Found this on top of Karol's patches but it seems like it can just be
applied to master.
Helps with some cases of
kernel_image_methods/test_kernel_image_methods 2Darray
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9381>
TGSI has these nice labels for us for where to jump in this case, let's
use them. Improves piglit arb_shader_image_load_store-shader-mem-barrier
runtime massively, though not enough to make the test really reasonable to
run.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9347>