This should enable compression on more intermediate framebuffer attachments.
It also means that VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT can now be set
on images where ZINK_BIND_MUTABLE is not set, so non-resource APIs need
to check ZINK_BIND_MUTABLE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23514>
Calculate the hash outside the critical region, then use that both
for search and insertion.
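
A minimal sketch of the idea, written in Python rather than the C of the
actual change. The class and method names are hypothetical and only
illustrate "hash once outside the lock, reuse the value for both search
and insertion":

```python
import threading

_MISSING = object()  # sentinel so that None can be stored as a value


class PreHashedTable:
    """Tiny chained hash table whose internals take a precomputed hash."""

    def __init__(self, num_buckets=64):
        self._buckets = [[] for _ in range(num_buckets)]
        self._lock = threading.Lock()

    def _search_pre_hashed(self, key_hash, key):
        for stored_key, value in self._buckets[key_hash % len(self._buckets)]:
            if stored_key == key:
                return value
        return _MISSING

    def _insert_pre_hashed(self, key_hash, key, value):
        self._buckets[key_hash % len(self._buckets)].append((key, value))

    def get_or_create(self, key, make_value):
        # The hash is computed once, before entering the critical region.
        key_hash = hash(key)
        with self._lock:
            value = self._search_pre_hashed(key_hash, key)
            if value is _MISSING:
                value = make_value(key)
                # The same hash is reused for the insertion.
                self._insert_pre_hashed(key_hash, key, value)
            return value
```

The lock is then held only for the bucket walk and the append, not for
the hash computation.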
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23280>
Use the local variable in the assertions and move them out of the critical region.
No behavior change.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
The element type passed differs from the array type and it is not
a "base type" in the glsl_type sense, so pick a name that reflects that.
Also stick to a single name for the array_size.
Just renames, no behavior change.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
The hardware gets given a session context from userspace in each
submission, but if the session context changes the hardware wants
a FENCE to be emitted to know it can give up the current session.
If a test submits interleaved session context accesses in a single
Vulkan submit, the hardware crashes unless each IB is submitted
in a separate submission so the fence can be sent.
In theory it could be possible to construct a single command buffer
to trigger this so I do think the hardware should be smarter here.
Should this be fixed in the kernel to always emit a fence between
IBs?
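
A toy model of the behavior described above, in Python. It is not driver
or kernel code: the function and exception names are hypothetical, and it
assumes a fence is emitted only at the end of each submission.

```python
class SessionChangeWithoutFence(Exception):
    """Raised where the modeled hardware would crash."""


def run_submissions(submissions):
    """Each submission is a list of session ids, one per IB.

    Returns the number of fences emitted (one per submission).
    """
    current = None   # session the modeled hardware currently holds
    fenced = True    # whether a fence was seen since the last IB
    fences = 0
    for submission in submissions:
        for session in submission:
            if current is not None and session != current and not fenced:
                raise SessionChangeWithoutFence(
                    f"session changed from {current} to {session}")
            current = session
            fenced = False
        # Assumption: the fence comes at the end of a submission.
        fenced = True
        fences += 1
    return fences
```

In this model `run_submissions([["A"], ["B"], ["A"]])` succeeds, while
`run_submissions([["A", "B", "A"]])` raises, which mirrors the difference
between one submission per IB and a single submit for all IBs.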
Fixes: dEQP-VK.video.decode.h264_interleaved
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23641>
There's no reason to differentiate between primitive types and structs here. `cl_prop_for_struct`
can handle primitive types just fine.
Drop `cl_prop_for_type` and rename the existing `cl_prop_for_struct` to `cl_prop_for_type`.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
It's a layering violation and really the wrong tool for the job. Add a new function to view a given slice
as a &[u8] instead of going through the clprop machinery, which creates a new Vec.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
These are mostly just obvious patterns that somebody will eventually
want to add.
DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0
total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)
Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See
piglit!819.
DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown)
total instructions in shared programs: 20570063 -> 20570033 (<.01%)
instructions in affected programs: 452 -> 422 (-6.64%)
helped: 30 / HURT: 0
total cycles in shared programs: 902118723 -> 902118781 (<.01%)
cycles in affected programs: 1762 -> 1820 (3.29%)
helped: 0 / HURT: 29
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819969 -> 152819500 (-0.00%)
Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00%
Totals from 469 (0.07% of 662497) affected shaders:
Instrs: 7644 -> 7175 (-6.14%)
Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
The needs of this pass are ever so slightly more than what
nir_opt_algebraic can do. :( Specifically, it needs to be able to look
at the relationship of constant values used in an expression tree.
v2: Add nir_mov_alu to handle swizzles on the original sources.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
Only retry when there's some kind of non-job failure, such as
runner-internal issues, or API/network issues, etc. If the job itself
fails or times out, then given the length of these jobs, there's no
point trying again and just tying up the job slots for even more hours.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
Rather than always retrying, only retry jobs on a limited set of causes.
This notably excludes retries when a job is stuck due to lack of runners
to schedule it; if we can't get a slot on a runner in time, there's no
reason to try again, since our window of opportunity has gone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler. Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.
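
A rough Python sketch of the difference, not the scheduler code itself.
Instructions are modeled as hypothetical `(name, dst_regs)` tuples, and
the table is a plain list standing in for last_grf_write:

```python
def record_writes(last_grf_write, instructions):
    """Remember, per register, the last instruction that wrote it."""
    for inst in instructions:
        _name, dst_regs = inst
        for reg in dst_regs:
            last_grf_write[reg] = inst


def reset_full(last_grf_write):
    """Old approach: clear the whole table, whatever was touched."""
    for reg in range(len(last_grf_write)):
        last_grf_write[reg] = None


def reset_processed(last_grf_write, instructions):
    """New approach: clear only entries the processed instructions wrote."""
    for _name, dst_regs in instructions:
        for reg in dst_regs:
            last_grf_write[reg] = None
```

The cost of `reset_processed` scales with the number of instructions in
the block instead of with the size of the table.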
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
No need to re-calloc it per block when we're going to use it again. Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?". Saves some memsetting, which was showing up high in the
profile (but not from this source).
No change on iris SKL shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
The compiler will use the unsigned bit pattern of the check and combine this
with the 1 bit, which will always result in use_sb being zero.
Fix this by making use_sb a bool.
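
The truncation can be modeled in Python as below. This is only an
illustration: it assumes the check is a mask test against a flag that is
not bit 0, and `FLAG` and both helper names are hypothetical.

```python
FLAG = 1 << 3  # hypothetical flag; any bit other than bit 0 shows the issue


def store_in_1bit_field(value):
    """Model of assigning an unsigned value to a C `unsigned x : 1` field.

    Only the lowest bit of the value survives.
    """
    return value & 1


def store_in_bool(value):
    """Model of assigning the same value to a C `bool`.

    Any non-zero value converts to 1.
    """
    return int(value != 0)


def check(flags):
    """Mask test whose result is the raw bit pattern, not 0 or 1."""
    return flags & FLAG
```

With `flags = FLAG`, the 1-bit field model stores 0 while the bool model
stores 1.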
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23647>
Only found 3 shaders affected in Red Dead Redemption:
Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%
This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
This new approach handles things as follows:
1. Fences won't be attached to events anymore, applications only wait on
the cv attached to the event.
2. Only the queue is allowed to update event status for non user events.
This will eliminate all remaining status updating races between the
queue and applications waiting on events.
3. The queue minimizes flushing by bundling events.
4. Increase cv wait timeout as there is really no point in waking up too
often.
Reduces the number of emitted fences on radeonsi in luxmark 3.1 luxball by 90%.
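
The scheme in the list above can be sketched in Python as follows. This
is not the rusticl implementation: the `Event` class, `process_bundle`
and the timeout value are hypothetical, and only the status constants
match the OpenCL execution status values.

```python
import threading

CL_COMPLETE = 0
CL_QUEUED = 3


class Event:
    def __init__(self):
        self._status = CL_QUEUED
        self._cv = threading.Condition()

    def set_status(self, status):
        # For non-user events, only the queue thread calls this.
        with self._cv:
            self._status = status
            self._cv.notify_all()

    def wait(self, timeout=1.0):
        # Applications wait on the condition variable, not on a fence.
        # A long timeout is fine because notify_all() wakes waiters.
        with self._cv:
            while self._status > CL_COMPLETE:
                self._cv.wait(timeout)
            return self._status


def process_bundle(events, flush):
    """Queue side: one flush for the whole bundle, then signal each event."""
    flush()
    for event in events:
        event.set_status(CL_COMPLETE)
```

Because waiters never change the status themselves, the queue is the
single writer and there is nothing left to race on.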
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23612>
Since commit 'c98ddc778a3 broadcom/compiler: force a last thrsw for spilling'
we always ensure we signal the last thread section explicitly with a
last thread switch.
Relying on VPM stores to detect the last thread section is particularly bad,
because we can have VPM stores occurring quite early in a shader program,
which would disable TMU spilling almost entirely.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22461>