Unlike on an immidiate-mode renderer, Turnip only renders tiles on
vkCmdEndRenderPass. As such, we need to track all queries that were
active in a given render pass and defer setting the available bit
on those queries until after all tiles have rendered.
This commit adds a draw_epilogue_cs to tu_cmd_buffer that is
executed as an IB at the end of tu_CmdEndRenderPass. We then emit
packets to this command stream that update the availability bit of a
given query in tu_CmdEndQuery.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Mostly a translation of freedreno's implementation of glEndQuery for
GL_SAMPLES_PASSED query objects with a slight modification to set the
availability bit of the query bo (slot->available) if the query was
not ended inside a render pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
General structure is inspired by anv's implementation in genX_query.c.
We define a packed struct that tracks sample count at the beginning of
the query and at the end; the result of the occlusion query is then
slot->end - slot->begin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Most places we actually know the usage and can provide it. There are
two exceptions to this:
1. We pass 0 into get_blorp_surf_for_anv_image when we use
ANV_IMAGE_LAYOUT_EXPLICIT_AUX because anv_layout_to_aux_usage is
never actually called so it doesn't matter.
2. We pass 0 into anv_layout_to_aux_usage in transition_color_buffer.
However, the coming commits which will begin using the usage
parameter only care about depth.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
Rather than looking at the aux usage, we look at the isl_aux_state which
provides us with more detailed information. This commit adds a couple
helpers to isl which let us quickly determine if we have valid depth/hiz
on the initial layout and if we need valid depth/hiz for the final
layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
This new helper maps VkImageLayout enums to isl_aux_state enums which
are the hardware's concept of image layouts. We can then use the aux
state to get the fast clear type and the aux usage. This should yield
no functional change in driver behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
As of 52ad1712ed, TRANSFER_SRC_OPTIMAL and SHADER_READ_ONLY_OPTIMAL
are now identical for depth buffers so there's no reason why we need to
use the "wrong" layout. Technically, according to Vulkan, blits and
MSAA resolves are transfer ops so we should use the transfer layout now
that we can.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
When copying to an RGB surface, we treat it as an R only one of three
times the width, which may end up being larger than the maximum size
supported by the hardware and so it hits the shrink path. This forced
both source and destination surfaces to be shrunk, even though it's not
necessary for the former, and may even hit some assertions in some
cases, such as the surface being compressed.
Fixes several tests under dEQP-VK.api.copy_and_blit.core.image_to_image.dimensions.*
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
This allows removing superfluous s_cselect instructions
that come from turning booleans into 64-bit vector condition.
v2 by Daniel Schürmann:
- Make the code massively simpler
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode
- Eliminate extra moves by not always using the SCC definition
- Use s_absdiff_i32 for uniform XOR
- Skip the transformation for uncommon or invalid instructions
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
v5: rebase on float_controls changes
v7: rebase after shader args MR and load/store vectorizer MR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
For GS copy shaders, whether we want to do exports is conditional. By
explicitly marking the end blocks, we can mark an IF's then branch as an
export block and ensure that's where the assembler inserts null exports.
v6: only fixup exports in the end block, like before
v8: simplify some code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
GS is the same on GFX6, but GFX6 isn't fully supported yet.
v4: fix regclass
v7: rebase after shader args MR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
ACO lowers output derefs which breaks the shader_info pass used by gs copy
shader creation.
v3: rebase
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
Because ac_build_optimization_barrier() overwrites the original
src_type, we have to keep track of it before emitting that barrier.
Otherwise, wrong conversions are expected for pointers or small
bitsizes.
By doing this, we no longer need to do the cast dance in
ac_build_readlane_no_opt_barrier(), it was just necessary for
ac_build_optimization_barrier().
This fixes a bunch of crashes with subgroups related tests when
RADV_DEBUG=checkir is enabled, and it also fixes a compiler crash
with The Surge 2.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2395
Fixes: 0f45d4dc2b ("ac: add ac_build_readlane without optimization barrier")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
This is not always ->rgbBits, because there are cases where that could
be 32 but we're (legally) bound to a depth-24 pixmap. The important
thing to have match here is the actual server-side notion of depth. You
can look this up (at modest expense) from the xlib visual info if the
fbconfig has a visual. But it might not, so if not, fetch it (at
slightly greater expense) from XGetGeometry. Do this at GLX drawable
creation so you don't have to do it on the SwapBuffers path.
Apparently this fixes glx/glx-swap-singlebuffer, which is unintentional
but quite pleasant.
Fixes: mesa/mesa#2291
Fixes: 90d58286 ("drisw: Fix and simplify drawable setup")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
This gets a lot of the hard code converted over to the new macros,
resulting in (I feel) much more readable code with
LESS_SHOUTING_ABOUT_THE_REG(). I decided to consistently put the reg on
its own line, so that all the register names line up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
This introduces some minor unpacking of the temporary fd_reg_pair structs
to code that previously was packing a whole register field.
In the pack wrapper in tu_cs.h, I added some explanatory docs, dropped the
relocs handling since we don't need it, and removed the extra regs[] in
the __ONE_REG() macro (which was causing gcc's optimizer to fall on its
face in my release build).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
Sometimes you want to zero out an address by supplying a NULL BO, but
without this we would end up only emitting one dword. Increases size of
fd6_gmem.o by .8%, though it's not clear to me why (no obvious terrible
codegen happening)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
The file name is taken from the environment variable
PANDECODE_DUMP_FILE, defaulting to pandecode.dump if it is not set.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3525>