We set the number of task shader ring entries in radv_device
based on the generous assumption that each CU can run task/mesh
shaders with maximum occupancy.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Mostly the same as for compute shaders, but with a few extras:
task_ring_offsets:
Same as what ring_offsets is to graphics shaders.
Contains an address that points to a buffer that contains
the ring buffer descriptors.
task_ring_entry:
Index that can be used to address the draw and payload rings.
draw_id:
Same meaning as in graphics shaders.
task_ib_addr/task_ib_stride:
Indirect buffer address and stride from the draw calls.
These are used to emulate the firstTask feature of NV_mesh_shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
This is going to be used by both task and mesh shaders for
accessing the draw and payload ring buffers.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Task shaders store their output payload to VRAM where mesh
shaders read from. There are two ring buffers:
1. Draw ring: this is where mesh dispatch sizes and
the ready bit are stored.
2. Payload ring: this is where the optional payload
is stored (up to 16K per task workgroup).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Otherwise, it would mutate `fneg(fadd(-0, 0))` into `fadd(0, -0)` which
isn't correct since -0 + (+0) = +0 + (-0) = +0.
This fixes the OpenCL contraction tests on Iris.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041>
This may slightly increase perf somewhere because the hardware can now
pre-cache binding tables. The real feature is that INTEL_DEBUG=bat now
dumps out surface states for compute.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15759>
SM5 requires swizzles for 64 bits alu source to be either .xyzw,
.xyxy, .zwxy, or .zwzw. If the swizzles are not in the valid pattern,
move the source according to the specified swizzle to a temporary register
first.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16464>
Atomics on Valhall work basically the same as on Bifrost, however the
instruction selection is simplified as there are no clauses. Support the
simplified set of atomic instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
As opposed to LEA_ATTR_TEX. In principle we could do this for Bifrost too, but
let's keep the Midgard compatible path for now.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
This should be kept to only things aco uses, and expanded when
radeonsi support is added. Things should be removed if lowered in NIR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
Avoid unnessecary work on FOREIGN queue release barriers. If we can't modify
the image there can't be a situation where we need to update the presentable
dcc data.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16371>
DrmDevice::create_all correctly opened the node with O_RDWR, while
DrmDevice::create was not, causing failure to create writable buffer.
Fixes pps-config on Freedreno.
Fixes: 1cc72b2aef
("pps: Gfx-pps v0.3.0")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16406>
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time. Let's add a nir_trim_vector() which
matches nir_pad_vector().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
Because we pull the RT from the variable location and use that to look
up formats, we need a constant RT index. To deal with arrays (possibly
of arrays), we would either need to handle array derefs (we don't today)
or we need to require the variables to be split into one variable per
RT. Given that we have to lower indirect derefs anyway (to get constant
indices), we may as well require the client to split output variables by
calling nir_lower_io_arrays_to_elements_no_indirect().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
This was set as an attempt to detect when the GPU got hung, and let
our hangcheck detect this issue and kill the run. This however has
not been working so well, so let's try without it.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16423>
nir_texop_samples_identical is lowered to
nir_texop_fragment_mask_fetch_amd earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16426>