Gather outputs in advance to save both output data and type. Output data
is used for streamout and gfx11 param export. Output type is used for
streamout latter.
The output info will also be used for nir vertex export in the future.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
When slot has no param offset, we should not emit store output for
them on gfx11.
Fixes: abe2e99e9e ("ac/nir/ngg: gs support 16bit outputs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Follow the spec, all outputs even not this stream need to be
reset after emit vertex.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Because we called nir_lower_io_to_temporaries which ensure the
store output and emit vertex in the same block.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Driver workarounds for game bugs can be easily broken. This one
shouldn't be applied to meta shaders and this restores previous logic.
Fixes: da32cbb5c6 ("aco: fix missing uses of MRT output flags")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20637>
Right now, the WSI core supports copying WSI images to a linear buffer
for implementations that want the result in this form. This being said,
most of the blit logic can be re-used for image to image copies, and that's
exactly what we'll need if we want to hook-up DXGI swapchains in the
win32 WSI implementation. So let's rename a few fields so we no longer
imply that images are copied to a buffer, and the use_buffer_blit boolean
an enum so we can extend the implementation to support image -> image
copies.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
It looks like the LLVM assembler promotes true 16-bit instructions to VOP3
in this case.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
With PS epilogs compiled on-demand (for some dynamic states), they need
to be emitted outside of the graphics pipeline path. Also keep track
of the last emitted PS epilog to avoid redundant emission.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20201>
With PS epilogs on-demand, the non-compacted color format field won't
come from the pipeline and it seems easier to introduce a new dirty
flag for re-emitting this state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18552>
The scratch allocation alignment on GFX11 is small enough that this should
help. Would be nice to someday remove this hack completely though.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
This lets us use less scratch if both VGPR spilling and scratch intrinsics
are used.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
fossil-db (gfx1100):
Totals from 112 (0.08% of 134574) affected shaders:
Scratch: 1513472 -> 1455360 (-3.84%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that
preserves the high bits.
I tested the others on GFX1036.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>
This makes these instructions free when considering pipeline statistics
and s_delay_alu insertion.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
Use gpr_map to determine how many cycles each dependency of the
s_delay_alu needs. This information helps the pass avoid further
s_delay_alu instructions.
fossil-db (gfx1100):
Totals from 13097 (9.73% of 134574) affected shaders:
Instrs: 30711894 -> 30702692 (-0.03%)
CodeSize: 153462500 -> 153425692 (-0.02%)
Latency: 372758612 -> 372741922 (-0.00%)
InvThroughput: 50164111 -> 50160717 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
The driver re-emits the tessellation domain origin state when a new
pipeline with tessellation is bound, so this can be moved there.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20486>
The winding order can be different between pipelines.
Fixes new dEQP-VK.pipeline.pipeline_library.dynamic_control_points.change_*_winding.
Fixes: f22290949d ("radv: add support for dynamic tessellation domain origin")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20486>
With dynamic color blend equations, dual-src blending will be
determined from the dynamic state, better to move it there now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20517>