split_store_data() splits a vector and p_as_uniforms it if needed.
scan_write_mask()/advance_write_mask() are similar to
u_bit_scan_consecutive_range(), but makes it easier to only clear part of
the range and will also give ranges for zero'd bits.
split_buffer_store() is a helper for splitting VMEM/SMEM stores.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
This helper is used for recombining split loads, passing the result to
p_as_uniform, aligning the offset down and shifting it right if needed and
handling large constant offsets.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
Eventually, we'll probably want to replace the current
RegClass(type, size) constructor with this.
This has a functional change in that get_reg_class() now creates v1/v2
instead of v4b/v8b.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
Previously the setprio was inside the branch, so it would only reset
the priority on the first wave, but not the others.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4536>
The TCS inputs and outputs must always fit into the LDS,
which implies that their addresses also always fit 24 bits.
On AMD GPUs, 24-bit multiplication is much faster than 32-bit
multiplication, so we can take the opportunity to use that
for TCS I/O instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4536>
Previously, it was just unreachable, which means it will generate
invalid shaders when it encounters a situation when it can't allocate
registers for eg. a large load.
This commit makes it slightly easier to notice such problems without
triggering a GPU hang.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4536>
Fixes random dEQP-VK.transform_feedback.fuzz.* crashes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2dc550202e
('aco: copy-propagate p_create_vector copies of vectors')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4730>
Instead of copying the operands of the other p_create_vector and labelling
the definition with label_vec, copy the operands and label it with
label_temp so that it can be copy-propagated.
This was found while removing a redundant copy in load_input_from_temps()
which removed duplicate p_create_vector instructions.
shader-db (Navi):
Totals from 139 (0.11% of 127638) affected shaders:
VGPRs: 8472 -> 7948 (-6.19%)
CodeSize: 514592 -> 512368 (-0.43%)
MaxWaves: 1089 -> 1195 (+9.73%)
Instrs: 100214 -> 99658 (-0.55%)
Cycles: 400856 -> 398632 (-0.55%)
VMEM: 15545 -> 15338 (-1.33%)
Copies: 5140 -> 4584 (-10.82%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4667>
For copies like v[7:8] = v[8:9], what currently happens is:
- do_copy() will skip the second dword
- the uses of the second dword will be reduced to 0
- the copy operation will be removed from the map
and v8 will never be set to v9.
So just decrease the uses of other operations after splitting or removing
the current operation, so: "v8 = v9" will be split off, it's uses reduced
and then the new copy will be done in the next iteration.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4686>
This simplifies code and helps some shaders
Totals from affected shaders:
Code Size: 51227172 -> 51202216 (-0.05 %) bytes
Max Waves: 19955 -> 19948 (-0.04 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
This simplifies definition handling and
helps a few shaders
Totals from affected shaders:
Code Size: 659540 -> 659376 (-0.02 %) bytes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
The optimizer isn't yet updated to handle this, since lower_to_hw_instr
will be the only user for now.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
Is simpler and helps a couple of shaders.
Totals from affected shaders: (Vega)
Code Size: 16341296 -> 16335460 (-0.04 %) bytes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4642>
It's like the layer, it has to be exported via the pos and also
as a varying if the fragment shader reads it.
Fixes dEQP-VK.draw.shader_viewport_index.fragment_shader_*
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4564>