On asahi, we can still specialize based on the shader key and get
everything folded. But this gives drivers the option to make it
dynamic if they wish.
Co-authored-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
There are a few sysvals which exist just so we can specialize them based
on shader keys or linking. In the case where we can't specialize them,
this provides a pass which loads them from the appropriate poly_*_param
struct.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The important bit here is that we move agx_update_vs() to before we
build up the poly_*_params so we have access to the final linked vertex
shader.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
It makes more sense here along with the output buffer. I think this
should be squashed with the previous commit (and not sure it works
without).
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
We have access to the poly_vertex_state from the GS so we might as well
use it. Asahi uses a single poly_vertex_state for VS and TCS and just
assumes the tessellator stalls before we update it for TCS. If a driver
wants to use two separate poly_vertex_state buffers, it will be the
driver's responsibility to make the system values return the right one.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
Instead of having the vertex output buffer be a system value and
something the driver needs to manage, put it in poly_vertex_param. We
already need to have it somewhere GPU-writable so we can write it from
indirect setup kernels. Instead of manually allocating 8B all over the
place just to hold this one pointer, stick it in poly_vertex_param.
This also lets us get rid of a NIR intrinsic.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
For vertex shaders, it comes from the preamble. For geometry and
tessellation shaders it comes straight from the root table.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
We're about to put more than just input assembly data in there so the
name will make a lot more sense. Also, add a comment to make it more
clear that this buffer applys to both VS and TES.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
Otherwise we get a lot of individual x/y/z stores to tesslevels when
we should really just be storing the whole thing at once.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
The newly rewritten remap_tess_levels_legacy will have already lowered
anything it cares about to URB intrinsics. So the generic remapping
pass won't see them, as it operates on generic input/output intrinsics.
This also drops some of the callback boilerplate we needed temporarily.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
This unifies the dynamic (SSO) and fixed (linked together) versions.
We emit piles of NIR as if we were doing the dynamic version, but
replace the tess config field access with constant values. It all
should optimize away back to something reasonable. We lower these
directly to URB read/write intrinsics.
It also rewrites the dynamic version to directly read/write the URB
rather than going through temporaries. The old version was broken
in that tessellation control shader invocations can technically use
the shared output area for cross-invocation data sharing with barriers,
although doing so using the built-in tesslevel patch outputs is very
unlikely.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We were using this for indirect loads of the shader input thread
payload, but there's no reason we can't use it for constant access
too. In this case we can just MOV from the ATTR file directly
without a special opcode that turns into MOV_INDIRECT later.
We also allow it to load multiple components now. This is useful
for say, returning vec4 pushed inputs. And, we allow it in more
stages than just the fragment stage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
These are a ton more convenient. When the TCS and TES were linked
together, the legacy layouts were a hassle, but didn't impose any
significant cost. With unlinked TCS and TES, the legacy layouts
involve significant runtime code for scrambling the data, whereas
the reversed layouts are substantially less overhead.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
st/nir lowers this for iris, and brw_link_shaders lowers this for anv,
but for unlinked tessellation control / evaluation shaders, the lowering
was not happening for TCS.
Just do it unconditionally when lowering TCS outputs and TES inputs.
This lets the remapping code just assume vectors all the time, rather
than getting single component stores with nir_intrinsic_component set
(which came from nir_lower_io lowering compact arrays).
This also requires changes to the dynamic unlinked TCS/TES lowering to
temporaries, which needs to use vectors rather than arrays with this
change. That code is going away in future patches anyway, but this
keeps it going for now to avoid interim breakage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We now call remap_tess_levels before remap_non_header_patch_urb_offsets.
The latter already excludes tess levels anyway, so the order shouldn't
matter.
This paves the way for remap_tess_levels to skip handling some header
values in certain cases, because with reversed layouts, many of them no
longer need any special handling and we can just let the generic pass
handle them.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Just have a single remap_tess_levels that does either the
statically-known-primitive or the dynamic (unlinked) mode.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Our current legacy patch header layout handling doesn't actually care
which is which slot, and remaps everything to its correct spot anyway.
For using the newer "reversed" patch header layouts, it will be more
convenient to have outer as slot 0, and inner as slot 1, as that just
works with no special remapping needed for both quads and triangles
(but unfortunately isolines are still a pain).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
These work the same regardless of stage.
v2 (Ken): Rebase, move from mesh to all stages, add reorderable load
variant, allow channel masks to be non-constant even on Xe2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Be consistent with lowering that happens after, so that it gets a full
vector register and can stride into it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We add several new intrinsics for accessing URB handles:
- load_urb_output_handle_intel
- load_urb_input_handle_intel
- load_urb_input_handle_intel_indexed
The latter is used by stages like TCS and GS where each input control
point has a unique handle. The index is which ICP to read from. The
others are for most stages, where all inputs or outputs are accessed
via a single handle.
Then we have URB load and store operations, split for Xe2+ (URB via LSC)
and earlier (HDC OWord messages):
- load_urb_vec4_intel
- load_urb_lsc_intel
- store_urb_vec4_intel
- store_urb_lsc_intel
The legacy vec4 variants take a handle and a 128-bit OWord offset as
sources. Additionally, stores take a set of channel enables to mask
off and avoid writing vec4 components. We don't use the WRITE_MASK
const-index as our channel enables are not required to be constant.
The Xe2+ LSC variants are simpler. Handles are byte offsets into the
URB memory region, and offsets are expressed in bytes. So we simply
add them into a single "address" source. We don't support writemasks
here, as they aren't really necessary with the better addressability.
(Plus, the store_cmask operations work significantly differently than
the previous HDC OWord messages). We will lower disjoint writemasks
to multiple stores.
Based on earlier code by Lionel Landwerlin.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Current extent can be 0xFFFFFFFF (-1) which indicates there is no current extent, and the swapchain will inherit that of which the application provides.
Check this before applying the hack.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31134>
only these case were not reporting anything
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38602>
Noticed renderdoc complaining about our size :
RDOC 692028: [18:08:18] vk_core.cpp(2272) - Warning -
VkPhysicalDeviceDescriptorBufferPropertiesEXT.imageCaptureReplayDescriptorDataSizeis too large at 32
(must be <= 16), can't support capture of VK_EXT_descriptor_buffer
Since we only need 2 pointers (main + private), we can shrink this to
16bytes. The 1/2 planes have a relative offset from the base.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38625>
If the threshold of the linear mipmap fallback for compressed
format is reached at a different mipmap level than the
size-compatible non-compressed formats the image can be viewed as,
then we have to disable the fallback. Otherwise, for some levels,
texels would be read from the wrong locations due to the tiling
mismatch.
NOTE: Prop driver falls back to LINEAR in this case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38655>