Add logical shift left and right operations support to mi_builder.
v1:
- Add GEN_GEN > 12 check (Jordan Justen)
- Add gen_mi_has_shift function (Jordan Justen)
- Fix commit title (Jordan Justen)
v2 (Jason Ekstrand):
- Add _imm versions of all of them
- Better handle corner-cases in _imm helpers
- Handle the power-of-two limitation for _imm versions
- Add tests
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9445>
The container is moved from before and hence returns size 0. To get the
correct value, the new instruction container must be used instead.
This was flagged by clang-tidy. The fixed call still triggers the
corresponding diagnostic, hence this change silences it by adding a
redundant clear() after move.
Fixes: 7f1b537304 ("aco: add new NOP insertion pass for GFX6-9")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9432>
This pass needs to run on the last shader in a pipeline writing
gl_Position. In GLES2, that's always the vertex shader, but in ES3.2, it
can be a geometry or tessellation shader. The shared code works the same
in this case, just make the assert more generous.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9444>
Now the texture virtual memory usage is less of a problem,
we can use this workaround permanently.
In the spirit of the API it's certainly not the proper way
of implementing DYNAMIC textures (it seems they are ok
to have hidden copies in driver managed memory, but not have
virtual addressing space reduced), but it makes sense for us,
both performance wise, and to avoid bugs.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
On 32 bits, virtual memory is sometimes too short for apps.
Textures can hold virtual memory 3 ways:
1) MANAGED textures have a RAM copy of any texture
2) SYSTEMMEM is used to have RAM copy of DEFAULT textures
(to upload them for example)
3) Textures being mapped.
Nine cannot do much for 3). It's up to driver to really unmap textures
when possible on 32 bits to reduce virtual memory usage.
It's not clear whether on Windows anything special is done for
1) and 2). However there is clear indication some efforts have
been done on 3) to really unmap when it makes sense.
My understanding is that other implementations reduce the usage
of 1) by deleting the RAM copy once the texture is uploaded
(Dxvk's behaviour is controlled by evictManagedOnUnlock).
The obvious issue with that approach is whether the texture is
read by the application after some time. In that case,
we have to recreate the RAM backing from the GPU buffer.
And apps DO that. Indeed I found that for example Mass Effect 2
with High Texture mods (one of the crash case fixed by this patch serie),
When the character gets close to an object, a high res texture and replaces
the low res one. The high res one simply has more levels, and the game seems
to optimize reading the high res texture by retrieving the small-resolution
levels from the original low res texture.
In other words during gameplay, the game will randomly read MANAGED textures.
This is expected to be fast as the data is supposed to be in RAM...
Instead of taking that RAM copy eviction approach, this patchset
proposes a different approach: storing in memfd and release the
virtual memory until needed.
Basically instead of using malloc(), we create a memfd file
and map it. When the data doesn't seem to be accessed anymore,
we can unmap the memfd file.
If the data is needed, the memfd file is mapped again.
This trick enables to allocate more than 4GB on 32 bits apps.
The advantage of this approach over the RAM eviction one,
is that the load is much faster and doesn't block the GPU.
Of course we have problems if there's not enough memory to map the
memfd file. But the problem is the same for the RAM eviction approach.
Naturally on 64 bits, we do not use memfd.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
We should only pass in a new indirect_info object if we actually set valid
values in it.
Fixes: abe8ef862f "gallium: make pipe_draw_indirect_info * a draw_vbo parameter"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9425>
In order for shader viewport index to be calculated correctly,
the cliptest code needs proper primitive lengths to work out
the provoking vertex. I half fixed this before for GL4 but looks
like I didn't make it all the way.
This fixes:
dEQP-VK.draw.shader_viewport*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9401>
Commit e99e7aa4 began passing start > 0 to indexed draw calls rather
than keeping start at 0 and manually advancing ib->ptr. This should
work fine, however, there have been instances of software fallbacks
not handling things right.
vbo_sw_primitive_restart had a bug where it was ignoring "start" and
always calling find_sub_primitives with start = 0 and end = ib->count.
This meant that when start > 0, it was analyzing the wrong part of the
index buffer when finding subprimitives.
In theory, each _mesa_prim can have a different "start" value. But
the code only calls find_sub_primitives once, because it wants to
map, analyze, and unmap the index buffer before calling ctx->Draw,
as some drivers don't support drawing with the index buffer mapped.
To handle this, we break vbo_sw_primitive_restart calls into sections
where "start" matches across all the primitives, similar to how I
handled the issue in tnl in commit bd6120f562.
In the common case, start matches and we handle it in one pass anyway.
Fixes Piglit's primitive-restart VBO_COMBINED_VERTEX_AND_INDEX test
and KHR-GL33.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
on Intel Ivybridge and older (which don't do arbitrary cut indices).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4052
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9417>
Use debug_printf more consistently, normalize formatting a bit, and
trace a few more places you're likely to care about.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9436>
Applications rarely require them, but this improves fossil-db replay time.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9411>