First, make all key_size functions take nr_samplers and nr_sampler_views
separately so we ensure both get passed in. Second, rework the offset
helpers to take MAX(nr_samplers, nr_sampler_views) so we get the image
param offset correct if nr_samplers < nr_sampler_views. While we're
here, also re-order the size calculations to be in the same order as the
things land in memory.
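
The offset problem can be sketched in a few lines of C. This is only an
illustration: the entry sizes, the layout, and the helper names below are
hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical sizes; the real key layout is not shown in this message. */
#define HEADER_SIZE        16u
#define SAMPLER_ENTRY_SIZE 8u
#define IMAGE_PARAM_SIZE   4u

/* The image params follow one sampler entry per slot. The slot count is
 * the larger of the two counts, so the offset stays correct when
 * nr_samplers < nr_sampler_views. */
static size_t
image_param_offset(unsigned nr_samplers, unsigned nr_sampler_views)
{
   unsigned nr_slots = MAX2(nr_samplers, nr_sampler_views);
   return HEADER_SIZE + (size_t)nr_slots * SAMPLER_ENTRY_SIZE;
}

/* Sizes are accumulated in the same order the pieces land in memory. */
static size_t
key_size(unsigned nr_samplers, unsigned nr_sampler_views, unsigned nr_images)
{
   size_t size = image_param_offset(nr_samplers, nr_sampler_views);
   size += (size_t)nr_images * IMAGE_PARAM_SIZE;
   assert(size >= HEADER_SIZE);
   return size;
}
```

Taking both counts as separate arguments means a caller cannot pass only one
of them by accident.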
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16435>
For performance investigation, I often find it useful to have an "are
we tripping over any of our performance TODOs?" flag, so add one and use
it in a few of the TODOs.
This also greatly cleans up the deqp-vk logs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16316>
We set the number of task shader ring entries in radv_device
based on the generous assumption that each CU can run task/mesh
shaders with maximum occupancy.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Mostly the same as for compute shaders, but with a few extras:
task_ring_offsets:
Same as what ring_offsets is to graphics shaders.
Contains an address that points to a buffer that contains
the ring buffer descriptors.
task_ring_entry:
Index that can be used to address the draw and payload rings.
draw_id:
Same meaning as in graphics shaders.
task_ib_addr/task_ib_stride:
Indirect buffer address and stride from the draw calls.
These are used to emulate the firstTask feature of NV_mesh_shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
This is going to be used by both task and mesh shaders for
accessing the draw and payload ring buffers.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Task shaders store their output payload to VRAM, where mesh
shaders read it. There are two ring buffers:
1. Draw ring: this is where mesh dispatch sizes and
the ready bit are stored.
2. Payload ring: this is where the optional payload
is stored (up to 16K per task workgroup).
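
As a rough sketch of how a ring entry index could map to offsets in the two
rings: only the 16K payload limit comes from the text above. The draw entry
size, the power-of-two entry count, and all names are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Up to 16K of payload per task workgroup (from the description above). */
#define PAYLOAD_ENTRY_BYTES (16u * 1024u)
/* Hypothetical: room for the mesh dispatch sizes plus the ready bit. */
#define DRAW_ENTRY_BYTES 16u

struct task_rings {
   uint32_t num_entries; /* assumed to be a power of two */
};

/* Wrap the running entry index into the ring. */
static uint32_t
ring_slot(const struct task_rings *r, uint32_t task_ring_entry)
{
   assert(r->num_entries != 0 &&
          (r->num_entries & (r->num_entries - 1)) == 0);
   return task_ring_entry & (r->num_entries - 1);
}

static uint64_t
draw_ring_offset(const struct task_rings *r, uint32_t task_ring_entry)
{
   return (uint64_t)ring_slot(r, task_ring_entry) * DRAW_ENTRY_BYTES;
}

static uint64_t
payload_ring_offset(const struct task_rings *r, uint32_t task_ring_entry)
{
   return (uint64_t)ring_slot(r, task_ring_entry) * PAYLOAD_ENTRY_BYTES;
}
```

The same slot index addresses both rings, so a mesh shader that knows its
entry can find both the dispatch sizes and the payload.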
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Otherwise, it would mutate `fneg(fadd(-0, 0))` into `fadd(0, -0)` which
isn't correct since -0 + (+0) = +0 + (-0) = +0.
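
The signed-zero case can be reproduced in plain C under the default IEEE 754
round-to-nearest mode. The helper names are made up for this sketch, and the
volatile temporaries are only there to keep the compiler from folding the
arithmetic.

```c
#include <assert.h>
#include <math.h>

/* fneg(fadd(a, b)) */
static float
neg_of_sum(float a, float b)
{
   volatile float sum = a + b;
   return -sum;
}

/* fadd(fneg(a), fneg(b)): what distributing the negation would produce. */
static float
sum_of_negs(float a, float b)
{
   volatile float na = -a;
   volatile float nb = -b;
   return na + nb;
}

static void
check_signed_zero_case(void)
{
   /* -((-0) + (+0)) = -(+0) = -0, so the sign bit is set. */
   assert(signbit(neg_of_sum(-0.0f, 0.0f)));
   /* (+0) + (-0) = +0, so the sign bit is clear. */
   assert(!signbit(sum_of_negs(-0.0f, 0.0f)));
}
```

Both results compare equal to zero, so the difference only shows up through
signbit() or operations such as division that observe the sign.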
This fixes the OpenCL contraction tests on Iris.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041>
This may slightly increase perf somewhere because the hardware can now
pre-cache binding tables. The real feature is that INTEL_DEBUG=bat now
dumps out surface states for compute.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15759>
SM5 requires the swizzle on a 64-bit ALU source to be one of .xyzw,
.xyxy, .zwxy, or .zwzw. If a swizzle does not match one of these patterns,
first move the source, with the specified swizzle applied, to a temporary
register.
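
A minimal check for the four allowed patterns might look like the following.
The enum and function names are hypothetical and do not come from the
driver. The check relies on the observation that each half of a valid
swizzle is either .xy or .zw.

```c
#include <assert.h>
#include <stdbool.h>

enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W };

/* A 64-bit value occupies a register pair: either .xy or .zw. */
static bool
is_64bit_pair(unsigned lo, unsigned hi)
{
   return (lo == SWZ_X && hi == SWZ_Y) || (lo == SWZ_Z && hi == SWZ_W);
}

/* True for .xyzw, .xyxy, .zwxy and .zwzw, false for everything else. */
static bool
is_valid_64bit_swizzle(const unsigned swz[4])
{
   for (unsigned i = 0; i < 4; i++)
      assert(swz[i] <= SWZ_W);

   return is_64bit_pair(swz[0], swz[1]) && is_64bit_pair(swz[2], swz[3]);
}
```

When the check fails, the source would be copied through a temporary
register as described above, and the instruction would then read the
temporary with a valid swizzle.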
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16464>
Atomics on Valhall work basically the same as on Bifrost, but
instruction selection is simpler because there are no clauses. Support the
simplified set of atomic instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
As opposed to LEA_ATTR_TEX. In principle we could do this for Bifrost too, but
let's keep the Midgard compatible path for now.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
This should be kept to only things aco uses, and expanded when
radeonsi support is added. Things should be removed if lowered in NIR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
Avoid unnecessary work on FOREIGN queue release barriers. If we can't modify
the image, there can't be a situation where we need to update the presentable
DCC data.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16371>
DrmDevice::create_all correctly opened the node with O_RDWR, but
DrmDevice::create did not, so creating a writable buffer failed.
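
The access-mode difference can be shown with plain POSIX calls. This is a C
sketch, not the pps code itself, and both helper names are made up for
illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open a device node for reading and writing. A node opened with
 * O_RDONLY cannot be used for operations that need write access. */
static int
open_node_rdwr(const char *path)
{
   assert(path != NULL);
   return open(path, O_RDWR);
}

/* Report whether an already-open descriptor has read/write access. */
static int
fd_is_rdwr(int fd)
{
   int flags = fcntl(fd, F_GETFL);
   return flags != -1 && (flags & O_ACCMODE) == O_RDWR;
}
```

A caller would pass the path of the DRM node it wants to open and can use
fd_is_rdwr() to confirm the access mode before trying to create buffers.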
Fixes pps-config on Freedreno.
Fixes: 1cc72b2aef ("pps: Gfx-pps v0.3.0")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16406>