Commit e99e7aa4 began passing start > 0 to indexed draw calls rather
than keeping start at 0 and manually advancing ib->ptr. This should
work fine, however, there have been instances of software fallbacks
not handling things right.
vbo_sw_primitive_restart had a bug where it was ignoring "start" and
always calling find_sub_primitives with start = 0 and end = ib->count.
This meant that when start > 0, it was analyzing the wrong part of the
index buffer when finding subprimitives.
In theory, each _mesa_prim can have a different "start" value. But
the code only calls find_sub_primitives once, because it wants to
map, analyze, and unmap the index buffer before calling ctx->Draw,
as some drivers don't support drawing with the index buffer mapped.
To handle this, we break vbo_sw_primitive_restart calls into sections
where "start" matches across all the primitives, similar to how I
handled the issue in tnl in commit bd6120f562.
In the common case, start matches and we handle it in one pass anyway.
Fixes Piglit's primitive-restart VBO_COMBINED_VERTEX_AND_INDEX test
and KHR-GL33.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
on Intel Ivybridge and older (which don't do arbitrary cut indices).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4052
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9417>
Use debug_printf more consistently, normalize formatting a bit, and
trace a few more places you're likely to care about.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9436>
Applications rarely require them, but this improves fossil-db replay time.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9411>
This register is only 32bits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1952fd8d2c ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9428>
Instead of lowering it in NIR, use the lookup tables as inputs to a
second-order Taylor expansion. shader-db results aren't amazing but keep
in mind this is without backend CSE yet.
total instructions in shared programs: 115913 -> 115707 (-0.18%)
instructions in affected programs: 3151 -> 2945 (-6.54%)
helped: 12
HURT: 0
Instructions are helped.
total nops in shared programs: 84045 -> 84041 (<.01%)
nops in affected programs: 1571 -> 1567 (-0.25%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total clauses in shared programs: 20498 -> 20489 (-0.04%)
clauses in affected programs: 188 -> 179 (-4.79%)
helped: 6
HURT: 0
Clauses are helped.
total quadwords in shared programs: 90395 -> 90291 (-0.12%)
quadwords in affected programs: 2287 -> 2183 (-4.55%)
helped: 12
HURT: 0
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
Useful for representing -0 in transcendental sequences matching the
blob.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
With these changes the shader code is visible in RGP.
Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
For radeonsi the shaders don't live in the same BOs, so they're
unlikely to be less that 0x1000 bytes apart.
So this commit bumps the threshold to 0x10000 and warns once
when hitting it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.
Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.
This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
succesfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.
total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.
total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.
total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
Use atomic operations to avoid competition. In addition,
since sub_ctx_id 0 has been used by default, sub_ctx_id
should start from 1.
Signed-off-by: Xin He <hexin.op@bytedance.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9406>
The clear code is 0xCC which means CMASK isn't fast-cleared.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9392>
It makes more sense to try to enable TC-compat if the image has HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9405>
Not copying all the scissors caused
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.2_viewports
to fail but thah test pointlessly relies on KHR_multiview (cts issue
filed).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
Prior to SKL, the mipmaps for 3D surfaces are laid out in a way
that make it impossible to represent in the way that
VkSubresourceLayout expects. Since we can't tell users how to make
sense of them, don't report them as available.
"Fixes" dEQP-VK.image.subresource_layout.3d.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9419>
This implements most of the remaining u_threaded_context support. Most
of the heavy lifting was done in the previous patches which fixed things
up for the new thread safety requirements. Only a few things remain.
u_threaded_context support can be disabled via an environment variable:
GALLIUM_THREAD=0
On Felix's Tigerlake with the GPU at fixed frequency, enabling
u_threaded_context improves performance of several games:
- Civilization VI: +17%
- Shadow of Mordor: +6%
- Bioshock Infinite +6%
- Xonotic: +6%
Various microbenchmarks improve substantially as well:
- GfxBench5 gl_driver2: +58%
- SynMark2 OglBatch6: +54%
- Piglit drawoverhead: +25%
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
pipe->transfer_map can be called from u_threaded_context's thread
rather than the driver thread. We need to use two different slab
allocators, one for each thread. transfer_unmap, on the other hand,
is only ever called from the driver thread.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo. This
patch does most of the mechanical changes required for that.
It also initializes the new threaded_resource fields.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>