They use same instruction. Just because when the time
nir_load_smem_buffer_amd was introduced, radeonsi didn't support
pass buffer descriptor to nir_load_ubo directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22523>
With independent sets, we're not able to compute immediate values for
the index at which to read anv_push_constants::dynamic_offsets to get
the offset of a dynamic buffer. This is because the pipeline layout
may not have all the descriptor set layouts when we compile the
shader.
To solve that issue, we insert a layer of indirection.
This reworks the dynamic buffer offset storage with a 2D array in
anv_cmd_pipeline_state :
dynamic_offsets[MAX_SETS][MAX_DYN_BUFFERS]
When the pipeline or the dynamic buffer offsets are updated, we
flatten that array into the
anv_push_constants::dynamic_offsets[MAX_DYN_BUFFERS] array.
For shaders compiled with independent sets, the bottom 6 bits of
element X in anv_push_constants::desc_sets[] is used to specify the
base offsets into the anv_push_constants::dynamic_offsets[] for the
set X.
The computation in the shader is now something like :
base_dyn_buffer_set_idx = anv_push_constants::desc_sets[set_idx] & 0x3f
dyn_buffer_offset = anv_push_constants::dynamic_offsets[base_dyn_buffer_set_idx + dynamic_buffer_idx]
It was suggested by Faith to use a different push constant buffer with
dynamic_offsets prepared for each stage when using independent sets
instead, but it feels easier to understand this way. And there is some
room for optimization if you are set X and that you know all the sets in
the range [0, X], then you can still avoid the indirection. Separate
push constant allocations per stage do have a CPU cost.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
RADV used these to emulate firstTask for NV_mesh_shader.
They are no longer needed.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22139>
This new intrinsic maps to the MTBUF instruction format on AMD GPUs
and represents a typed buffer load in NIR.
Also add an unsigned upper bound for the new intrinsic.
Code for that ported from aco_instruction_selection_setup.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
This will emulate VGT_ESGS_RING_ITEMSIZE, which does the multiplication
for us. It's beneficial to stop setting VGT_ESGS_RING_ITEMSIZE to reduce
context rolls, and also the register will be removed in the future.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
It's unused and if we ever want to use it again we should make it an alu
opcode instead.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21445>
Hoping that I didn't miss any, this *should* add assertions
to all functions and passes which explicitly handle 'nir_loop'.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
Like nir_texop_fragment_mask_fetch_amd, this is used to load multi
sample image fmask data for AMD GPU.
We will lower multi sample image load and samples_identical intrinsics
to use it latter for radeonsi. RADV does not need this because it
always expand fmask images before dispatch compute shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
Used by radeonsi for lower nir_atomic_add_gen/xfb_prim_count_amd.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18010>
These two values are not known when compile for radeonsi.
They are relocated when link/upload time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
For a follow up optimization, we would like to track scratch loads.
This isn't possible with global load/store intrinsics. So use a couple
of special intrinsic in the pass and only lower it to global
intrinsics at the end.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
For used by different counter.
Vulkan:
1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT,
sum generated primitives of all 4 streams when GS.
2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated
primitives for all 4 streams when VS/TES/GS.
3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated
and streamout primitives for all 4 streams when VS/TES/GS.
OpenGL:
1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated
primitives for all 4 streams when GS.
2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4
streams when VS/TES/GS.
3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout
primitives for all 4 streams when VS/TES/GS.
pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1.
xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
These intrinsics will be used to lower NGG attributes to memory on
GFX11.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19173>
For radeonsi which load this from arg.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19166>
This will be used to load the number of rasterization samples when a
fragment shader is compiled inside a library without the MSAA state.
RADV needs to know the number of samples for loading sample positions
with interpolateAtSample().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677>
Used by RADV/Radeonsi NGG culling. Pack them into a single vec4
load for radeonsi to reduce const buffer load.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Used by AMD GPU NGG line culling. We could use nir load
line width and viewport scale to calculate this in shader,
but this way needs expensive divide ops.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
Also add radv and radeonsi implementation. Will be used in tess lowering.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
This changes the trace rays logic to always use
VkTraceRaysIndirectCommand2KHR and implements
vkCmdTraceRaysIndirect2KHR. I renamed the
load_sbt_amd to sbt_base_amd and moved the SBT
load lowering from ACO to NIR.
Note that we can not just upload one pointer to
all the trace parameters because that would
be incompatible with traceRaysIndirect.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
The mesh shader task ring is a buffer in VRAM which we will use to
store some mesh shader outputs that don't fit into LDS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16737>