[why]
Fix in 0 input stream, it would crash in accessing null param->streams.
[how]
- Only use stream_ctx->stream instead of param->stream in color
handling.
- Revise the geometric downscaling support.
Reviewed-by: Tomson Chang <tomson.chang@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Roy Chan <roy.chan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31274>
[why]
The fix point conversion takes quite a bit of CPU time.
And if there are no changes, we don't need to convert all sw points
into hw points and generate the corresponding configs.
[how]
Introduce a config cache header so that all config caching handlings
can be done in the same way.
Luts won't be cached if it is in bypass mode, only cache when it is
non bypass and some dirty flag is set by the upper layer.
The config cache handling will re-apply the cached config if not
dirty and not in bypass mode.
Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Roy Chan <roy.chan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31274>
As we don't have capacity for more parallelism atm, increase the
fraction of the CTS for those jobs.
job name average min
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
--- ---
radv-stoney-angle 16
radv-stoney-vkcts 15
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31414>
It is taking 19 min on average to run this job.
As we don't have more capacity, introduce a fraction of 2 and create the
`full` version to be run on nightly pipelines.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31414>
This is what callers actually want, and it simplifies nir_opt_remove_phis
because we can assume dominance meta data is valid.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
s_bfe uses 7 bits for the size, so when we extract from -1,
we can get all possible lane masks in one instruction.
Foz-DB Navi31:
Totals from 38601 (48.62% of 79395) affected shaders:
Instrs: 13670163 -> 13509738 (-1.17%)
CodeSize: 68011644 -> 67368308 (-0.95%)
Latency: 61203404 -> 61065419 (-0.23%); split: -0.23%, +0.00%
InvThroughput: 6897028 -> 6894634 (-0.03%); split: -0.05%, +0.01%
SALU: 1491291 -> 1342553 (-9.97%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
If the shader has multiple payload variables, split passes might not
preserve the order and this can cause the offsets used for the stores to
not match the payload offsets for nir_intrinsic_trace_ray.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31204>
There's never any need for anything higher. If this were too high (such
as NIR_ALIGN_MUL_MAX), it would have caused issues.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31204>
Instead of re-emitting some dynamic states when a new shader is bound,
only re-emit the user SGPR states. This is slightly more optimal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31160>
This doesn't change anything in practice because if we have a task
shader, we also have a mesh shader and the state was already re-emitted.
Though, this will prevent a regression from the upcoming patches because
the user SGPR layout will change.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31160>
Use the AMD_IMAGE_OPCODES=0 environment variable to test the lower image
opcode code path used by AMD CDNA/compute devices without graphics.
Runs a very limited subset of GL tests.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31180>
In wave64 for hw with native wave32, ds_append seems to be split in a load for
the low half and an atomic for the high half, and other LDS instructions can
be scheduled between the two.
Which means the result of the low half is unusable because it might be out of date.
I was only able to reproduce this issue in WGP mode, but be conservative and
apply the workaround in CU mode too.
Foz-DB Navi31:
Totals from 13 (0.02% of 79395) affected shaders:
Instrs: 7599 -> 7656 (+0.75%)
CodeSize: 39708 -> 39972 (+0.66%)
Latency: 83174 -> 83572 (+0.48%)
InvThroughput: 8271 -> 8357 (+1.04%)
Copies: 718 -> 717 (-0.14%)
VALU: 3689 -> 3703 (+0.38%)
SALU: 935 -> 965 (+3.21%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11921
Fixes: 45e935800a ("aco: implement nir_shared_append/consume_amd")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31301>