The LLVM backend does this when lowering ordered_xfb_counter_add_amd. I
guess there is some missing dependency checking or something.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19345>
NGG streamout lowering creates some idiv instructions that need to be
lowered.
No fossil-db results because it's currently broken.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19364>
This probably doesn't fix anything in practice because GDS is only
used for the number of generated primitives by GS and meta operations
don't use GS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19348>
In RadeonSI, they are packed but not in RADV, so don't rely on driver
locations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19365>
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: the results when
setting this flag "imprecise", they're completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer divison... but in practice, non-constant highp integer
division is pretty rare.
After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.
Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
TES rel patch id is <256, so we can use an existing unused LDS
byte instead of extra dword.
To ease the programing, change the index of repacked_arg_vars
for these variables.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
vertex primtive id and passthrough are not exclusive, just need
to get correct vertex index when passthrough.
radeonsi won't disable passthrough when vs primitive id output,
this is also for fixing the crash of the assertion.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
Use lds base load intrinsics in nir ngg lowering to get layout, left
its calulation to driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
radeonsi need to use packed driver location for all outputs,
while radv need to use VARYING_SLOT_*. To meet both drivers'
needs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
According to LLVM, s_sendmsg_rtn(GET_REALTIME) should be used instead
of s_memrealtime.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19267>
radeonsi may disable it. gfx_level will also be used by latter
vertex param export when gfx11.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>
According to PAL, there is more restrictions that RADV doesn't have.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19278>
RGP expects a pipeline's shaders to be all stored sequentially, eg:
[vs][ps][gs]
As such, it assumes a single bo is dumped to the .rgp file, with
the following info:
* va of the bo
* offset to each shader inside the bo
For radeonsi, the shaders are stored individually, so we may have
a big gap between the shaders forming a pipeline => we can produce
very large file because the layout in the file must match the one
in memory (see the warning in ac_rgp_file_write_elf_text).
This commit implements a workaround: gfx shaders are re-exported as a
pipeline.
To update the shader address, a new state is created (sqtt_pipeline),
which will overwrite the needed _PGM_LO_* registers.
This reduces DeuxEX rgp captures from 150GB+ to less than 100MB.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18865>
Apparently the TLS constructor doesn't work well if RADV
is instantiated multiple times and/or used by a program with
already existing threads.
Fixes: a128d444cb ('aco: use monotonic_buffer_resource for instructions')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19219>
For used by different counter.
Vulkan:
1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT,
sum generated primitives of all 4 streams when GS.
2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated
primitives for all 4 streams when VS/TES/GS.
3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated
and streamout primitives for all 4 streams when VS/TES/GS.
OpenGL:
1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated
primitives for all 4 streams when GS.
2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4
streams when VS/TES/GS.
3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout
primitives for all 4 streams when VS/TES/GS.
pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1.
xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
User can enable/disable multi VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT
queries with same or different index.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
Copy the information from the first predecessor then check whether
it matches other predecessors and modify the data accordingly.
Marked for backporting to stable to make it possible to also
backport fixes based on this.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18103>
According to perf, this roughly halves the impact of the post-RA
optimizer in ACO's compile times.
Measurement was taken using a debug optimized build using
NIR_DEBUG=novalidate RADV_DEBUG=nocache and replaying the Fossil DB
from the Doom Eternal shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18103>