This moves some ops down to when they're needed, generally reducing the
number of temps in use. It's not always a win -- sometimes you can end up
moving a generator of a component used by a nir_op_vec down, which means
that op's sources stay live while the vec (whose register likely gets
coalesced with the ops creating it) is also live. But it's generally
good.
softpipe results:
temps in affected programs: 18115 -> 18026 (-0.49%)
imm in affected programs: 19 -> 22 (15.79%)
r300 results:
instructions in affected programs: 174 -> 178 (2.30%)
vinst in affected programs: 156 -> 160 (2.56%)
sinst in affected programs: 54 -> 50 (-7.41%)
temps in affected programs: 2634 -> 2169 (-17.65%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14096>
This brings us into parity on state tracker paths with most other
supported drivers, and a lot of additional optimization on our shaders.
Results on a subset of shader-db that doesn't crash:
instructions in affected programs: 59502 -> 47991 (-19.35%)
vinst in affected programs: 17633 -> 15197 (-13.82%)
sinst in affected programs: 9296 -> 7319 (-21.27%)
flowcontrol in affected programs: 627 -> 310 (-50.56%)
presub in affected programs: 4220 -> 1554 (-63.18%)
temps in affected programs: 5775 -> 8570 (48.40%)
lits in affected programs: 215 -> 37 (-82.79%)
The temps (register usage) increase is unfortunate, but it seems that
instruction counts tend to be our limit before reg counts are.
Fixes: #3354
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14096>
1. We can create resource with size of "1" on drm, because size
is not passed to the renderer.
2. We can't create resource with size of "1" on vtest, because shmem
is created based on that.
3. If renderer supports copy_transfer_from_host, then use staging
buffer for transfer in both ways to and from host.
This will allow to reduce memory consumption in the guest.
v2:
- add inline function for checking if we can use this optimization
- add check in readback path. If renderer doesn't support
copy transfer from host, then we need to go with previous
path in readback (through transfer_get ioctl)
v3:
- fix logic for readback
v4:
- refactor the implementation to integrate it more to
existing code base
v5:
- reuse COPY_TRANSFER3D in both directions
v6:
- encode direction in COPY_TRANSFER3D if host supports it
v7:
- renamed cap bit
- introduced COPY_TRANSFER3D_SIZE_LEGACY define
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13689>
Previously, the number of previously encoded frames the encoder handled
was 1 - the encoder now supports many more encoded pictures, so the
encoder now has to keep track of multiple reconstructed pictures.
v2: Add a check to make sure an array index is not negative (Boyuan)
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Sets the not_referenced parameter to be the same as the previously
hardcoded frontends/va value (false) to ensure UVD/VCE encoding
functionality remains unaffected by the change in frontends/va code.
This commit will eventually be reverted once more testing is completed.
Fixes: a90802ef644 ("frontends/va/enc: allow for frames to be marked as (not) referenced")
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Base the number of reconstructed pictures the encoder allocates based on
the number of reference pictures to be used for encoding. Also move the
calculation and allocation of reconstructed pictures to VCN 1, from VCN
2.
v2: Add back the accidentally deleted
'two_pass_search_center_map_offset' (Boyuan)
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Packed headers for the H.264 encoder has not been implemented yet, so
don't report packed header support for the H.264 encoder
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13915>
Since we typically use an ALU op to set the condition modifier for the
IF-BRK-ENDIF, we were particularly likely to remove the increment of the
loop counter!
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14117>
We only have one bit of negate, so we have to make sure that immediate
usage has matching negates on all used channels (or rewrite to do so).
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14117>
If we're not making progress (which the function was already giving us!),
then there's no need to recompute the list. Reduces
pixmark-piano/7.shader_test compile time from 50 seconds to 10.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14117>
include it explicitly in the correct places
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14104>
This adds a bunch of other headers in, and adds mtypes.h to iris
for perf query object.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14104>
Inferring from blob's cmdstream the size of shader instruction
cache for:
- a630 is 64
- a650 is 128
- a660 is 128
On a650 and a660 gpu could hang if we exceed the limit. Though
it is not reproducible with computerator or a single amber
test. Also while blob limits the size to 128 - Turnip still
hangs with it but does not hang with the limit of 127.
On a630 there seem to be no hang when limit is exceeded.
Fixes the hang of compute shader in Alien Isolation on a650/a660.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14044>
This allows us to force all shaders to offer shader features only
provided to compatibility shaders.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14040>
When targeting the blitter or compute engines, the destination is not
really a render target. But it's still useful to know whether we're
talking about the source or destination.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14094>
This may not be a complete set, as I haven't been able to run dEQP-GLES2
to completion (GPU hangs at some point, no particular test seems to be
guilty). But this will help me assess NIR-to-TGSI for the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14092>