Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists. That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Previously we used PC to set query data to 0 during
CmdResetQueryPool. This was slow when clearing large query pools.
Switching to blorp to clear pools is faster for large query pools.
Red Dead Redemption 2: +1.5% speedup
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22178>
All the flushes should already have happened, we just need CS to wait
for the operations to complete. Just use a MI_SEMAPHORE_WAIT to check
the availability bit is set.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22178>
This pass rarely makes any changes, so work a little harder to preserve
more meta data.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.2% ± 0.1% (n = 5, pooled s = 0.431885).
v2: Add some parenthesis. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Two linked list management changes:
- Use the list head sentinel as the initial cursor. It is, after all, a
proper node in the list.
- Iterate the list of blocks starting with the second block instead of
skipping the first block in the loop.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-0.24% ± 0.09% (n = 5, pooled s = 0.324106).
v2: Use nir_cursor instead of direct list manipultion. Suggested by
Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-1.09% ± 0.084% (n = 5, pooled s = 0.354471)
Reduces the size of a release build by 26k.
text data bss dec hex filename
23163641 400720 231360 23795721 16b1809 before/lib64/dri/iris_dri.so
23137264 400720 231360 23769344 16ab100 after/lib64/dri/iris_dri.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Since one of the register must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).
Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
This function only exists in builds with assertions, so it only matters
there.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-5.2% ± 0.16% (n = 5, pooled s = 0.657887).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Reduce debug log spam by only logging the shader if a pass made some
changes. This can also elide some nir_validate calls in debug builds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
The version that takes a list of instructions is not used. I did not do
any archaeology to find out when the last user was removed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
MTL support fp64, but not int64. The fsign(double(x))*FOO optimization
would try to use a 64-bit int xor operation to conditionally toggle
the sign bit off the result.
Since this only affects high bit of the result, we can do a 32-bit
move of the low dword, and a 32-bit xor on the high dword.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero
on MTL.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>
Using divergence analysis, figure out when SSBO & shared memory loads
are uniform and carry the data only once in register space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Having references to inst->src[X] when you're also modifying
inst->src[X] is a receipe for disaster. Making changes to the lowering
code I've been bitten quite a few times by this take copies of all
sources to do the lowering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Ensure that the register's liveness is not expanded to loops.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Those SENDs are still doing a full register write. We just inserted
some predication for a workaround.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
There are a number of instances of the dead code elimination pass that
could reduce the count. For some reason this also seems to affect
register allocation itself.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
These helper functions will only get invoked for GFX < 11 and the
L3BypassDisable field is present starting from GFX12+.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22275>
Commit ca4ec49b0e took primitive ID override fields in to use, this
has to be checked as part of Wa_14015297576.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22282>
Anything Gfx12.5+ has a different format.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 90c86fe63e ("intel: add MTL performance metrics")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22257>
src/intel/perf/intel_perf_query_layout.c needs a valid kmd type to
look at the metrics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 757e2dd692 ("intel/perf: Disable it for Xe KMD")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22257>
This is a bit more wordy, but it will greatly simplify some future
changes.
v2: Rebase on ADD3 changes.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
It's a lot more useful to have it in the same stream with the
INTEL_DEBUG=fs output.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
Android CTS 13_r4 tests dEQP-VK.memory.allocation.random* fail
with VK_ERROR_OUT_OF_DEVICE_MEMORY on ADL boards with 32GB memory
as memory allocation requests from DEQP are much larger(~2.9GB+)
based on device heap size/8.
Increase the limit to unsigned 32bit max(~4GB) which helps to
fix the dEQP-VK.memory.allocation.random* tests.
v1: Bound allocation by the largest memory heap size (Lionel Landwerlin)
v2: Clean up comments to reflect the code change (Ivan Briano)
Update the value of MAX_MEMORY_ALLOCATION_SIZE (Lionel Landwerlin)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22066>
Not fixing anything, but required for another fix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22066>