This pass rarely makes any changes, so work a little harder to preserve
more meta data.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.2% ± 0.1% (n = 5, pooled s = 0.431885).
v2: Add some parenthesis. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Two linked list management changes:
- Use the list head sentinel as the initial cursor. It is, after all, a
proper node in the list.
- Iterate the list of blocks starting with the second block instead of
skipping the first block in the loop.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-0.24% ± 0.09% (n = 5, pooled s = 0.324106).
v2: Use nir_cursor instead of direct list manipultion. Suggested by
Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-1.09% ± 0.084% (n = 5, pooled s = 0.354471)
Reduces the size of a release build by 26k.
text data bss dec hex filename
23163641 400720 231360 23795721 16b1809 before/lib64/dri/iris_dri.so
23137264 400720 231360 23769344 16ab100 after/lib64/dri/iris_dri.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Since one of the register must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).
Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
This function only exists in builds with assertions, so it only matters
there.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-5.2% ± 0.16% (n = 5, pooled s = 0.657887).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
Reduce debug log spam by only logging the shader if a pass made some
changes. This can also elide some nir_validate calls in debug builds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
The version that takes a list of instructions is not used. I did not do
any archaeology to find out when the last user was removed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
MTL support fp64, but not int64. The fsign(double(x))*FOO optimization
would try to use a 64-bit int xor operation to conditionally toggle
the sign bit off the result.
Since this only affects high bit of the result, we can do a 32-bit
move of the low dword, and a 32-bit xor on the high dword.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero
on MTL.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>
Using divergence analysis, figure out when SSBO & shared memory loads
are uniform and carry the data only once in register space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Having references to inst->src[X] when you're also modifying
inst->src[X] is a receipe for disaster. Making changes to the lowering
code I've been bitten quite a few times by this take copies of all
sources to do the lowering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Ensure that the register's liveness is not expanded to loops.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Those SENDs are still doing a full register write. We just inserted
some predication for a workaround.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
There are a number of instances of the dead code elimination pass that
could reduce the count. For some reason this also seems to affect
register allocation itself.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
These helper functions will only get invoked for GFX < 11 and the
L3BypassDisable field is present starting from GFX12+.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22275>
Commit ca4ec49b0e took primitive ID override fields in to use, this
has to be checked as part of Wa_14015297576.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22282>
Anything Gfx12.5+ has a different format.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 90c86fe63e ("intel: add MTL performance metrics")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22257>
src/intel/perf/intel_perf_query_layout.c needs a valid kmd type to
look at the metrics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 757e2dd692 ("intel/perf: Disable it for Xe KMD")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22257>
This is a bit more wordy, but it will greatly simplify some future
changes.
v2: Rebase on ADD3 changes.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
It's a lot more useful to have it in the same stream with the
INTEL_DEBUG=fs output.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
Android CTS 13_r4 tests dEQP-VK.memory.allocation.random* fail
with VK_ERROR_OUT_OF_DEVICE_MEMORY on ADL boards with 32GB memory
as memory allocation requests from DEQP are much larger(~2.9GB+)
based on device heap size/8.
Increase the limit to unsigned 32bit max(~4GB) which helps to
fix the dEQP-VK.memory.allocation.random* tests.
v1: Bound allocation by the largest memory heap size (Lionel Landwerlin)
v2: Clean up comments to reflect the code change (Ivan Briano)
Update the value of MAX_MEMORY_ALLOCATION_SIZE (Lionel Landwerlin)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22066>
Not fixing anything, but required for another fix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22066>
BSpec specifies default value for CFE_STATE.OverDispatchControl
is 2, or 50% overdispatch. No observed performance impact.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22179>
On Gen12.5+, we only need to emit CFE_STATE when scratch space
has changed, not on every pipeline binding. Also, only grow the
scratch space, never shrink it. Need to reset after secondary buf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22179>
Also fix missing trace point stuff in command buffer begin/end
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22179>
The CS Stall in cmd_buffer_flush_compute_state is only required
on HW that uses MEDIA_VFE_STATE, gen12 and previous.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22179>
Let's bump the binding table value guess to 32 instead of 8 and also stop
decoding the binding table entries as soon as we hit the zero
initialized value in the map.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22221>
Indeed, the parameter "mem_ctx" was not processed.
For instance, this issue is triggered with the crocus driver and
"piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo":
SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s).
Fixes: 96ba0344db ("intel: Use common helpers for TCS passthrough shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>