RGP expects a pipeline's shaders to be all stored sequentially, eg:
[vs][ps][gs]
As such, it assumes a single bo is dumped to the .rgp file, with
the following info:
* va of the bo
* offset to each shader inside the bo
For radeonsi, the shaders are stored individually, so we may have
a big gap between the shaders forming a pipeline => we can produce
very large file because the layout in the file must match the one
in memory (see the warning in ac_rgp_file_write_elf_text).
This commit implements a workaround: gfx shaders are re-exported as a
pipeline.
To update the shader address, a new state is created (sqtt_pipeline),
which will overwrite the needed _PGM_LO_* registers.
This reduces DeuxEX rgp captures from 150GB+ to less than 100MB.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
As it does not access any symbols in util/u_cpu_detect.h but accessed symbols in "c11/threads.h"
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19266>
UTIL_MAX_CPUS is not used by u_thread.* anymore after commit
"util: replace UTIL_MAX_CPUS by util_cpu_caps.num_cpu_mask_bits"
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19266>
`pvr_get_hw_clear_color()` now packs formats based on the accum
format which is how the hw deals with the formats internally and
might not line up exactly with the vk format representation.
E.g. R5G6B5_UNORM_PACK16 uses the U8 accum format so the USC will
internally use 3 bytes (1 per component) to deal with it instead
of 2 bytes which the vk format specifies. On USC EMITPIX, the PBE
will pack the results to 2 bytes using PACKMODE_R5G6B5 resulting
in the final value being in the vk format representation.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19218>
This will be used later on to implement
vkCmdClearAttachments() where we'll need to pick the
appropriate clear color shader based on the dwords used.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19218>
The previous list of passes contained several errors: "constprop" does not
exist anymore and a trailing ',' is not allowed. This made LLVMRunPasses
fail, but the error is silently ignored. Thus none of the listed passes
ran at all.
https://reviews.llvm.org/D85159 suggests "InstSimplify really should be
used anywhere ConstProp is being used" so replace constprop with
instsimplify and remove the trailing comma.
By enabling pass logging with
LLVMPassBuilderOptionsSetDebugLogging(opts, true),
the difference is visible. Before:
Running pass: AlwaysInlinerPass on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: ProfileSummaryAnalysis on [module]
Running pass: CoroConditionalWrapper on [module]
Running pass: AnnotationRemarksPass on fs_variant_partial (1162 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: AnnotationRemarksPass on fs_variant_whole (1110 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
After:
Running pass: AlwaysInlinerPass on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: ProfileSummaryAnalysis on [module]
Running pass: CoroConditionalWrapper on [module]
Running pass: AnnotationRemarksPass on fs_variant_partial (1162 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: AnnotationRemarksPass on fs_variant_whole (1110 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running pass: SROAPass on fs_variant_partial (1162 instructions)
Running analysis: DominatorTreeAnalysis on fs_variant_partial
Running analysis: AssumptionAnalysis on fs_variant_partial
Running analysis: TargetIRAnalysis on fs_variant_partial
Running pass: EarlyCSEPass on fs_variant_partial (1111 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: SimplifyCFGPass on fs_variant_partial (961 instructions)
Running pass: ReassociatePass on fs_variant_partial (961 instructions)
Running pass: PromotePass on fs_variant_partial (897 instructions)
Running pass: InstCombinePass on fs_variant_partial (897 instructions)
Running analysis: OptimizationRemarkEmitterAnalysis on fs_variant_partial
Running analysis: AAManager on fs_variant_partial
Running analysis: BasicAA on fs_variant_partial
Running analysis: ScopedNoAliasAA on fs_variant_partial
Running analysis: TypeBasedAA on fs_variant_partial
Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on fs_variant_partial
Running pass: SROAPass on fs_variant_whole (1110 instructions)
Running analysis: DominatorTreeAnalysis on fs_variant_whole
Running analysis: AssumptionAnalysis on fs_variant_whole
Running analysis: TargetIRAnalysis on fs_variant_whole
Running pass: EarlyCSEPass on fs_variant_whole (1059 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
Running pass: SimplifyCFGPass on fs_variant_whole (912 instructions)
Running pass: ReassociatePass on fs_variant_whole (912 instructions)
Running pass: PromotePass on fs_variant_whole (844 instructions)
Running pass: InstCombinePass on fs_variant_whole (844 instructions)
Running analysis: OptimizationRemarkEmitterAnalysis on fs_variant_whole
Running analysis: AAManager on fs_variant_whole
Running analysis: BasicAA on fs_variant_whole
Running analysis: ScopedNoAliasAA on fs_variant_whole
Running analysis: TypeBasedAA on fs_variant_whole
Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on fs_variant_whole
Fixes: 2037c34f24 ("gallivm: move to new pass manager to handle coroutines change.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19217>
Apparently the TLS constructor doesn't work well if RADV
is instantiated multiple times and/or used by a program with
already existing threads.
Fixes: a128d444cb ('aco: use monotonic_buffer_resource for instructions')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19219>
For simple clients using the swap chain contention back pressure to regulate
their drawing and that don't query buffer age introduce a new DRI option with
which set to true (the default is false) we block the client until a new buffer
is available. This way we stall the client's execution until a new buffer is
available and the redrawing of the client starts only at this point and not
before.
The motivation for that is to reduce latency for clients that regulate their
drawing by swapchain contention back pressure. These clients draw whenever
possible and their drawing is implicitly stopping whenever we block. When we
block at the end of the swap and return only when a new buffer is available
the client can draw and we directly present. Otherwise the client would draw,
we block on the buffer becoming available, and only then show what the client
had drawn, usually one frame later.
Co-authored-by: Michel Dänzer <michel@daenzer.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14684>
This is known to produce bogus results for certain combinations of
operands, so don't use it. See this issue for details:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6555
None of the shaders in shaderdb are affected by this change, as all of
the shaders containing an integer division require a higher GLSL version
than vc4 supports and are therefore all skipped.
This is port of the v3d change 73e8fc3efb
to vc4; we expect the impact to be the same as for v3d, based on testing
a custom shader.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18019>
We can produce slightly better code for these in the backend, so
do that.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18019>
For used by different counter.
Vulkan:
1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT,
sum generated primitives of all 4 streams when GS.
2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated
primitives for all 4 streams when VS/TES/GS.
3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated
and streamout primitives for all 4 streams when VS/TES/GS.
OpenGL:
1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated
primitives for all 4 streams when GS.
2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4
streams when VS/TES/GS.
3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout
primitives for all 4 streams when VS/TES/GS.
pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1.
xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
User can enable/disable multi VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT
queries with same or different index.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
some allocations require a different memory heap even when using the
same memory bits, so allow iterating over heaps of the same memory type
to find the one that works
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19281>
it's possible for drivers to have multiple heaps with identical flags,
so this will enable passing the heap that should actually be used for
allocation
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19281>
this is supposed to create the non-optimized pipeline first and then
compile the optimized version in the background, not the other way around
facepalm.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19201>
this is supposed to return the address at the start of the buffer,
not the address at the start of the memory allocation
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19274>
Mis-matched usage is a large percentage of buffer cache misses when
searching for applicable buffers. Separating these into separate
managers puts them into separate heaps and eliminates a significant
amount of CPU time spent searching.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19283>
This was similarly done for a3xx & a4xx with 4bdd226ab6
("freedreno/ir3: Switch to NIR for a3xx/a4xx's vertex id lowering.")
already, yet a5xx was missed which I noticed as Minecraft window
titlebar text corruption on OnePlus 5T with Adreno 540.
Cc: mesa-stable
Signed-off-by: Jami Kettunen <jami.kettunen@protonmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19263>
Wire up support for the two intrisics to get default tess levels (which
adds new HS driver params) and add a helper to generate a passthrough
TCS for a given VS. The passthrough TCS is cached in the VS, indexed by
patch_vertices, as the generated TCS is a function of the VS outputs and
the patch_vertices count.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
If using a passthrough TCS shader, we can't rely on having a new TCS
stage bound, so we need to invalidate the TCS state when the # of
patch_vertices changes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
Switch over to using the common nir helper.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
Based on si_create_passthrough_tcs() as that seemed the most generic of
the various different backend driver implementations. Uses the
load_tess_level_outer_default and load_tess_level_inner_default
intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values,
so driver will somehow need to implement those to load the values set
by pipe_context::set_tess_state() or similar.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
The job fits into 15 minutes of runtime, so deparalelize.
Stress-tested.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19211>