This requires adding a nop in the relates v-slot, and the readport
valiation seems to be broken for this case, so drop this for now.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21357>
Propagating the indirect load to more instructions would result
in more address load instructions. This would (a) remove the advantage
of eliminating one move, and (b) introduce more latency, because between
address load and use two cycles must pass.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21357>
The instruction that is split may still be referenced as extra
dependency in other instructions, so add a handle to the instruction
that it can be set to be scheduled.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21357>
This shader originates from Granite 3D engine and has been adapted
to be used with Open GL and some GLSL ES specifics.
GLSL ES adaptation:
- remove Vulkan specifics: EXT_samplerless_texture_functions usage,
specialization constants, push constant usage
- inline bitextract.h
- always DECODE_8BIT and hardcode error color (for now)
- port to GLSL ES, required some type changes, explicit type
conversions and setting up precisions for types
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19886>
Generates required resources for ASTC texture decoding pass.
Partition table resources will be cached in to hash during runtime
as one is required for each block size.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19886>
Commit introduces ASTC decoding lookup tables from Granite 3D engine.
These lookup tables will be used during transcoding by a compute
shader in later commits when decoding ASTC textures.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19886>
Using 8191.875 seems to big for the hardware to correctly render wide
rectangular lines. This can also be reproduced with AMDVLK by forcing
rectangularLines = True, and fixed by reducing the maximum size as well.
Other drivers seem to expose that value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21287>
In the upcoming header update, vk_layer.h starts including vulkan_core.h
instead of vulkan.h. This will break this layer as it needs a couple of
window-system extension #defines.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
VK_LAYER_EXPORT is going away in the next Vulkan header update. We
already have a PUBLIC macro in util/macros.h which does the same thing.
Unlike VK_LAYER_EXPORT, it should work in Windows too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This switches us to using get_all_required() for figuring out which
enum types we care about and then carefully filtering every value as
needed. We also add a number field to Extension so we keep all the
extension XML parsing in one place.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
We now use get_all_required() to get all required commands and use that
to filter instead of doing it manually. Also, we can pull entrypoint
extension etc. information from the requirements struct. Finally, we
also have to filter the actual commands themselves as well as arguments
per-API because there may be multiple versions or variants depending on
the API being used.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This searches for the names of everything of a particular type: command,
enum, etc. and returns a Requirements struct with any core version and
extensions that require it.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This adds an Extension.from_xml() helper for doing the parsing so we can
re-use it in other code. We also improve filtering of extensions. The
Vulkan XML schema is changing to make the supported attribute a comma-
separated list. This is to allow for vulkansc to also exist in the XML
schema.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
These are a left-over from when these classes were used by ANV to define
extension enables in python. They haven't been used since we added
extension table structs and move extension enables to C.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21225>
This generally gives us better results and doing it here in nir will
also allow us to remove more glsl optimisation calls that do a similiar
thing for us.
(Updated shader-db results by idr.)
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20246333 -> 20240715 (-0.03%)
instructions in affected programs: 235253 -> 229635 (-2.39%)
helped: 425 / HURT: 114
total cycles in shared programs: 891730115 -> 891631113 (-0.01%)
cycles in affected programs: 37347925 -> 37248923 (-0.27%)
helped: 952 / HURT: 692
total spills in shared programs: 7072 -> 6716 (-5.03%)
spills in affected programs: 505 -> 149 (-70.50%)
helped: 7 / HURT: 0
total fills in shared programs: 9897 -> 8511 (-14.00%)
fills in affected programs: 1674 -> 288 (-82.80%)
helped: 7 / HURT: 0
total sends in shared programs: 1053685 -> 1053411 (-0.03%)
sends in affected programs: 2821 -> 2547 (-9.71%)
helped: 30
HURT: 2
LOST: 13
GAINED: 13
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18149157 -> 18147271 (-0.01%)
instructions in affected programs: 204630 -> 202744 (-0.92%)
helped: 294 / HURT: 121
total cycles in shared programs: 939488196 -> 939508444 (<.01%)
cycles in affected programs: 36394777 -> 36415025 (0.06%)
helped: 718 / HURT: 620
total sends in shared programs: 1005426 -> 1005152 (-0.03%)
sends in affected programs: 2821 -> 2547 (-9.71%)
helped: 30 / HURT: 2
LOST: 2
GAINED: 2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19715>
This is based on brw_nir_lower_mem_access_bit_sizes() but ended up being
substantially different. While the core concepts are all the same, the
brw_* version made a lot of Intel-specific assumptions. The new version
takes a callback which takes a number of bytes of data and an alignment
pair and returns a bit size and number of components to load/store.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>
This makes the pass significantly faster cutting execution time
by around 30% in the cts test
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
This 30% improvement is in addition to all the improvements from
the proceeding patches.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
As per the previous commit if we don't reuse these dynamic arrays
we end up needlessly thrashing the memory handling functions.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
Due to how this pass works we can end up thrashing memory if we
do not reuse these hash tables rather than reusing them.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
Previously the pass was comparing every deref to every load/store
causing the pass to slow down more the larger the shader is.
Here we use a hash table so we can simple store everything needed
for comparision of a var separately.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
The fix in 947f7b452a introduced an extra loop over the copies
array to find the correct entry in the case it had been moved.
The problem is these loops can be iterated over millions of times
so lets simply update the entry pointer in the case we change its
location in the array.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>
We loop, effectively, over two stacks: ready and to_do and finish only
when both are empty. In the case where ready is empty, we pull one off
of to_do, add a copy to a temporary, and push it onto the ready stack.
Previously, we assumed that we would never get to the temporary copy
case if to_do has exactly one entry because that would imply that there
was only one copy left which means there can't possibly be a cycle to
break. This was true until c7fc44f9eb ("nir/from_ssa: Respect and
populate divergence information") which changed things such that
temporary copies sometimes get added in the case where a convergent
value is copied both to convergent and divergent destinations.
This patch adjusts our loop iteration to always attempt to clear the
ready stack before checking if there's anything left on the to_do stack.
I also added an assert to make the exit condition more clear.
Fixes: c7fc44f9eb ("nir/from_ssa: Respect and populate divergence information")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8037
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21315>
There is an optimization in the parallel copy algorithm where, after a
copy has been performed, we can treat the destination as the new source
for future copies of the same source. In particular, consider the
following parallel copy: A -> B, C -> A, A -> C. In this case, after we
have done the A -> B copy, we can make note that the value in A is now
in B and emit the sequence: A -> B, C -> A, B -> C. This allows us to
resolve the swap cycle between A anc C without allocating a temporary
register because we know B is also a copy of A.
When one of the registers involved is convergent and the other is
divergent, this optimization is problematic because, while convergent to
divergent copies are fine, we can't re-use the divergent copy in later
copies if any of those copies are to a convergent variable. We could,
but it would require a read_first_invocation which would get messy. In
In c7fc44f9eb ("nir/from_ssa: Respect and populate divergence
information"), we attempted to deal with this by limiting the rename
optimization to the case where the divergence matched.
The problem is that we did the re-name part whenever the divergence
matched but only marked it as ready if the thing being copied was a
destination. (We actually left two instances of loc[a] = b, one which
always happened and one which only happened if we also wanted to flag
the source as being ready to use as a destination.) While this
technically doesn't cause any problems, it may result in more inter-mov
dependencies which hurts instruction scheduling. For example, if we had
the parallel copy A -> B, A -> C, A -> D, we now end up emitting the
sequence A -> B, B -> C, C -> D which has many more data hazards between
instructions caused by the constant shuffling.
This commit restores the original logic in which we only perform the
rename optimization if the rename would free up a register we will later
use as a destination. This isn't entirely optimal as it still doesn't
prove that there is a cycle involved first, but it should lead to a
reduction in unnecessary dependencies.
No shader-db changes on SKL or DG2
Fixes: c7fc44f9eb ("nir/from_ssa: Respect and populate divergence information")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21315>
For long-lived stateobjs, it is common to re-emit to the same submit
multiple times. By giving each submit a unique sequence # we can detect
this case and skip the extra append_bo().
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
It is a pretty common pattern to allocate a non-zero sequence # for
lightweight checking if an object is the same, changed, for use in cache
keys, etc. (And also pretty common to forget to handle the rollover
zero case.) Add a helper for this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
Now that we are not tracking cross-context batch dependencies, there is
no scenario where one context could trigger flushing another context's
batch. So we can drop the batch lock intended to protect against this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
Avoid taking screen unlock for batch unref. Instead just split the
destroy fxn into locked and unlocked variants. That way we only end
up taking the screen lock on final unref but avoid it in the common
case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>
PROG state mostly just disables various LRZ related flags, which can
be handled as a simple mask. The exception is ztest mode, which is
either overriden by PROG state, or we use the all 1's value (which
isn't valid from hw standpoint) to signal that it needs to be computed
at draw time, which fortunately fits in with the bitmask approach.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21274>