Enabling this fast path, while making CPU side simpler
(and thus reducing driver's CPU overhead), forces us to
generate multiple VERTEX_BUFFER_STATE packets pointing to
the same buffer, but with slightly shifted BufferStartingAddress.
This is fine on GFX version >= 11 and < 8, but on version
8 and 9, thanks to internal details of the VF cache, vertex
data from each VERTEX_BUFFER_STATE is cached separately
and this results in a lot of cache misses.
Disabling this fast path restores previous performance levels.
On my SKL GT2 machine this improves performance in:
- GLB27 Egypt offscreen by 37%
- GLB27 TRex offscreen by 22%
- gfxbench5 Manhattan offscreen by 1.75%
- gfxbench5 Manhattan31 offscreen by 1.9%
- gfxbench5 Aztec Ruins high by 2.3%
In Intel performance CI on GFX version 9 it improves performance in:
- gfxbench5 Manhattan offscreen by 1.65%
- gfxbench5 Aztec Ruins normal offscreen by 1.33%
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3277
Fixes: 42842306d3 ("mesa,st/mesa: add a fast path for non-static VAOs")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9857>
A little low-stakes RE effort as I unwind from fighting CI all day. Comes
from diffing dEQP-VK.clipping.user_defined.clip_distance.vert.* on the
blob and comparing to a6xx behavior. (My blob doesn't do tess, so if
there are equivalent tess fields for some of these, I didn't find them)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9870>
It seems that when the linker optimized the IO it might leave gaps
in the driver output locations, therefore set the size of the output
array based on the highest driver location occupied by a store_output.
Fixes#4520
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9879>
They are not supported by the hardware.
Fixes:
dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_32_short3_vec4_dynamic_draw_quads_1
dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_32_short3_vec4_dynamic_draw_quads_256
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
The indirect draw call already encodes the index bias so that no
additional encoding in the hardware is needed in this case.
This fixes a regression with a number of tests from
dEQP-GLES31.functional.draw_indirect.random.*
Fixes: c6c532faa8
"gallium/u_vbuf: use updated pipe_draw_start_count while using draw_vbo"
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
It seems that this has to be 4 (radeonsi also sets this value to 4)
Fixes deqp-gles31:
functional.texture.texture_buffer.modify.bufferdata.offset_1_alignments
functional.texture.texture_buffer.modify.bufferdata.offset_7_alignments
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
This is the final step of this little journey, which allows us to no
longer deal with the shader-slot mapping inside the NIR to SPIR-V
translation.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9836>
These locations should match what we get out of the pre-existing pass,
and are currently unused. They will be used in a later commit in this
series, but is left like this because this code is fairly fragile.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9836>
pan_blend_shaders.{c,h} files are removed from Makefile.sources
in order to avoid the following building error:
FAILED: out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_panfrost_intermediates/pan_blend_shaders.o
...
clang: error: no such file or directory: 'external/mesa/src/gallium/drivers/panfrost/pan_blend_shaders.c'
clang: error: no input files
Fixes: 7994929e84 ("panfrost: Use the blend shader cache attached to the device")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9894>
now that the results are automatically being copied to internal buffers,
mapping the buffers will handle any fencing that's required
the only caveat is that the query still needs to be flushed for a result
to be returned
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9811>
50 is very small and triggers constant stalls; this should probably be
tuned more at some point in the future since even this value is small in
some scenarios
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9811>
this adds internal qbos to each query which record the results of queries incrementally,
meaning that any time we want the result of a query we just have to read the
buffer back
in particular this opens up some possibilities for further optimization, specifically with
actual qbos where we can now pass the internal qbo off to a compute shader to
mimic the check_query_results() handling and avoid having to actually sync or
do a cpu read
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9811>