For doing performance investigation, I often find it useful to have a "are
we tripping over any of our performance TODOs?" flag, so add it and use it
in a few of the TODOs.
This also greatly cleans up the deqp-vk logs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16316>
We set the number of task shader ring entries in radv_device
based on the generous assumption that each CU can run task/mesh
shaders with maximum occupancy.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Mostly the same as for compute shaders, but with a few extras:
task_ring_offsets:
Same as what ring_offsets is to graphics shaders.
Contains an address that points to a buffer that contains
the ring buffer descriptors.
task_ring_entry:
Index that can be used to address the draw and payload rings.
draw_id:
Same meaning as in graphics shaders.
task_ib_addr/task_ib_stride:
Indirect buffer address and stride from the draw calls.
These are used to emulate the firstTask feature of NV_mesh_shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
This is going to be used by both task and mesh shaders for
accessing the draw and payload ring buffers.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Task shaders store their output payload to VRAM where mesh
shaders read from. There are two ring buffers:
1. Draw ring: this is where mesh dispatch sizes and
the ready bit are stored.
2. Payload ring: this is where the optional payload
is stored (up to 16K per task workgroup).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Otherwise, it would mutate `fneg(fadd(-0, 0))` into `fadd(0, -0)` which
isn't correct since -0 + (+0) = +0 + (-0) = +0.
This fixes the OpenCL contraction tests on Iris.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041>
This may slightly increase perf somewhere because the hardware can now
pre-cache binding tables. The real feature is that INTEL_DEBUG=bat now
dumps out surface states for compute.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15759>
SM5 requires swizzles for 64 bits alu source to be either .xyzw,
.xyxy, .zwxy, or .zwzw. If the swizzles are not in the valid pattern,
move the source according to the specified swizzle to a temporary register
first.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16464>
Atomics on Valhall work basically the same as on Bifrost, however the
instruction selection is simplified as there are no clauses. Support the
simplified set of atomic instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
As opposed to LEA_ATTR_TEX. In principle we could do this for Bifrost too, but
let's keep the Midgard compatible path for now.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
This should be kept to only things aco uses, and expanded when
radeonsi support is added. Things should be removed if lowered in NIR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
Avoid unnessecary work on FOREIGN queue release barriers. If we can't modify
the image there can't be a situation where we need to update the presentable
dcc data.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16371>
DrmDevice::create_all correctly opened the node with O_RDWR, while
DrmDevice::create was not, causing failure to create writable buffer.
Fixes pps-config on Freedreno.
Fixes: 1cc72b2aef
("pps: Gfx-pps v0.3.0")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16406>
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time. Let's add a nir_trim_vector() which
matches nir_pad_vector().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>