The evaluation of loading the AR register was off by one, so
splitting an ALU group could actually happen after AR was loaded,
resulting in a failure to lower to assembly.
Fixes: d617052db6 ("r600/sfn: take address loads into account when scheduling")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36742>
Without this update a very long ALU block may not be split as
required, and lowering to assembly may fail because the maximum
supported length of an ALU CF is overrun.
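
The splitting requirement can be illustrated with a minimal sketch. This is not the r600/sfn scheduler code: the function name, the per-group slot counts, and the greedy policy are assumptions made for illustration, and only the 128-slot limit comes from the referenced commit title.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Limit taken from the "128 slots" wording of the referenced commit.
constexpr std::size_t kMaxSlotsPerBlock = 128;

// Hypothetical helper: each entry of `group_slots` is the number of slots one
// ALU group occupies (assumed to be between 1 and kMaxSlotsPerBlock).
// Returns the slot total of each resulting block. A new block is started
// whenever adding the next group would exceed the limit.
std::vector<std::size_t> split_alu_blocks(const std::vector<std::size_t>& group_slots)
{
   std::vector<std::size_t> blocks;
   std::size_t current = 0;
   for (std::size_t slots : group_slots) {
      assert(slots >= 1 && slots <= kMaxSlotsPerBlock);
      if (current + slots > kMaxSlotsPerBlock) {
         blocks.push_back(current);
         current = 0;
      }
      current += slots;
   }
   if (current > 0)
      blocks.push_back(current);
   return blocks;
}
```

The check has to run before every group is appended; if the running slot count is not kept up to date, a block can silently grow past the limit.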
Fixes: 6aafa2bb49 ("r600/sfn: Split ALU blocks in scheduler to fit into 128 slots")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36742>
Calling vkCmdBindDescriptorBuffersEXT() does not invalidate previously
set descriptor sets. Move the state dirtying to
vkCmdSetDescriptorBufferOffsetsEXT().
Fixes: ab7641b8dc ("anv: implement descriptor buffer binding")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36768>
If an application switches back and forth between descriptor sets and
descriptor buffers before executing a draw/dispatch, we could end up in
a wrong state due to pending_db_mode not getting updated.
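
A simplified model of the problem, not the anv data structures: all type and function names below are hypothetical, and the sketch only assumes that a requested mode is recorded at bind time and applied at draw/dispatch time.

```cpp
#include <cassert>

// Hypothetical, simplified model of descriptor mode tracking.
enum class DescriptorMode { Sets, Buffers };

struct CmdState {
   DescriptorMode current_mode = DescriptorMode::Sets; // mode last programmed
   DescriptorMode pending_mode = DescriptorMode::Sets; // mode requested by the latest bind
   int mode_switches = 0;                              // number of reprogramming events
};

// Every bind records the requested mode, including a switch back to the mode
// that is already programmed, so a stale request never survives.
void bind_descriptor_sets(CmdState& s) { s.pending_mode = DescriptorMode::Sets; }
void set_descriptor_buffer_offsets(CmdState& s) { s.pending_mode = DescriptorMode::Buffers; }

// At draw/dispatch time, reprogram only if the requested mode differs.
void flush_before_draw(CmdState& s)
{
   if (s.pending_mode != s.current_mode) {
      s.current_mode = s.pending_mode;
      ++s.mode_switches;
   }
}
```

If one of the bind paths skipped the update of the pending mode, a buffers-then-sets sequence before a draw would leave the stale "buffers" request in place.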
Fixes: ab7641b8dc ("anv: implement descriptor buffer binding")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36768>
This information needs to account for pipelines/shaders in the same
IES struct differing slightly. It will be used to determine the
preprocess buffer size in a better way.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36753>
I completely missed that the application is required to pass the
IES struct to vkGetGeneratedCommandsMemoryRequirementsEXT. Also, any
change to the IES struct requires calling it again.
This will allow us to do more optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36753>
layout->push_constant_mask is only the DGC push constant mask (i.e. the
tokens that are specified), but with IES all push constants are emitted
from the DGC shader. So it should be the total range of push constants.
This used to work by luck due to the preprocess buffer alignment.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36753>
../../src/amd/compiler/aco_util.h:300:9: note: ambiguity is between a regular call to this operator and a call with the argument order reversed
300 | bool operator==(const monotonic_buffer_resource& other) { return buffer == other.buffer; }
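
For context, a minimal sketch of one common way to resolve this kind of C++20 ambiguity, using a hypothetical stand-in type rather than the aco class. With a non-const member operator==, the regular candidate and the reversed candidate each win on a different argument. Const-qualifying the operator gives both candidates identical parameter types, and the regular one is then preferred.

```cpp
#include <cassert>

// Hypothetical stand-in for a resource type compared by its buffer pointer.
struct Resource {
   void* buffer = nullptr;

   // const-qualified: the implicit object parameter is `const Resource&`,
   // so `a == b` and the reversed form `b == a` have the same parameter
   // types and overload resolution picks the regular call without a warning.
   bool operator==(const Resource& other) const { return buffer == other.buffer; }
};
```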
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36722>
DRM syncobjs always let you wait repeatedly on them, so we can set the
flag in the core instead of having each driver override it once they try
to enable the emulated timeline semaphores.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36563>
Previously some KeplerA chips failed various dEQP tests when instruction
scheduling was enabled.
In particular, `memory_model.message_passing` had issues where a
`membar` instruction canceled some in-flight predicate writes, and
`barrier.write_image_tess_control_read_image_compute.image_128_r32_uint`
had issues around the `Cont` instruction.
This patch refines instruction scheduling to better match the output of
nvcc, fixing the various failing dEQP tests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13528
Fixes: c35990c4bc ("nak: Add real instruction dependencies for Kepler")
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36393>
Optimizations cutting down on GPRs often lead to the awkward situation
where RA is more restricted and has to insert more mov instructions,
pumping up the instruction counts.
In order to give developers more reliable stats we just set max_gprs
to the next multiple of 8, taking hw reserved registers into
account.
This does not impact occupancy in any way despite the increase in gprs.
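
A rough sketch of the rounding, under stated assumptions: that registers are handed out in blocks of 8 and that the hardware-reserved registers count toward that block. The function name and the reserved counts used in the examples are hypothetical; this is not the NAK code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed allocation granularity, from the "next multiple of 8" wording.
constexpr std::uint32_t kGprGranularity = 8;

// Hypothetical helper: returns the largest GPR budget that still fits in the
// same allocation block as `used_gprs`, after accounting for `reserved_gprs`.
constexpr std::uint32_t round_up_max_gprs(std::uint32_t used_gprs, std::uint32_t reserved_gprs)
{
   const std::uint32_t total = used_gprs + reserved_gprs;
   const std::uint32_t aligned =
      (total + kGprGranularity - 1) / kGprGranularity * kGprGranularity;
   return aligned - reserved_gprs;
}
```

Under these assumptions a shader using 10 GPRs with 2 reserved already occupies a 16-register block, so raising its budget to 14 costs nothing extra.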
Totals:
CodeSize: 920980864 -> 914748784 (-0.68%); split: -0.69%, +0.02%
Number of GPRs: 3544248 -> 3879749 (+9.47%)
Static cycle count: 217345431 -> 216414194 (-0.43%); split: -0.50%, +0.07%
Totals from 78493 (89.58% of 87622) affected shaders:
CodeSize: 795883088 -> 789651008 (-0.78%); split: -0.80%, +0.02%
Number of GPRs: 3108571 -> 3444072 (+10.79%)
Static cycle count: 187450578 -> 186519341 (-0.50%); split: -0.58%, +0.08%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36514>
The warning is:
../../src/util/dbghelp.h:900:10: warning: the current #pragma pack alignment value is modified in the included file [-Wpragma-pack]
900 | #include <pshpack4.h>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36708>