try_evict_regs might end up calling check_dst_overlap, which only works
for dst regs. Make sure this doesn't happen for src regs.
Fixes: 34803d15ab ("ir3/ra: Add proper support for multiple destinations")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
The first assert happened before setting the current instruction, which
caused the error message to refer to the previous instruction.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
I might be wrong about this one, but I don't see what it could be for.
We don't want to delete too much either, in case this script gets called
in the same job as another test suite.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29749>
Copy propagation often eliminates all uses of an instruction. If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.
This saves some CPU time as other passes don't see dead code.
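
As a rough illustration of the idea (not the backend's actual code), the
sketch below runs copy propagation over a toy straight-line SSA list and
removes a copy as soon as its use count reaches zero. The toy_inst type,
its fields, and toy_copy_propagate are all hypothetical names invented
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IR: every instruction writes one value and
 * reads up to two. None of these names come from the real backend. */
struct toy_inst {
   int dst;       /* value number written, or -1 for none */
   int src[2];    /* value numbers read, or -1 for unused */
   bool is_copy;  /* true for a plain "dst = src[0]" move */
   bool removed;  /* set once the instruction has been eliminated */
};

/* Assumes SSA form (each value is written once). use_count[v] must hold
 * the number of reads of value v on entry and is kept up to date.
 * Returns the number of copies removed because their last use vanished. */
static int
toy_copy_propagate(struct toy_inst *insts, size_t n, int *use_count)
{
   int removed = 0;

   for (size_t i = 0; i < n; i++) {
      if (!insts[i].is_copy)
         continue;

      int copy_dst = insts[i].dst;
      int copy_src = insts[i].src[0];
      assert(copy_dst >= 0 && copy_src >= 0);

      /* Rewrite later reads of the copy's destination to its source. */
      for (size_t j = i + 1; j < n; j++) {
         for (int s = 0; s < 2; s++) {
            if (insts[j].src[s] == copy_dst) {
               insts[j].src[s] = copy_src;
               use_count[copy_dst]--;
               use_count[copy_src]++;
            }
         }
      }

      /* Every use is gone: drop the copy now rather than wait for DCE.
       * The copy's own read of its source disappears with it. */
      if (use_count[copy_dst] == 0) {
         use_count[copy_src]--;
         insts[i].removed = true;
         removed++;
      }
   }

   return removed;
}
```

Later passes in such a scheme would skip entries with removed set, which
is where the CPU time saving would come from.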
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
The new def-based pass works better in many cases, and should be less
resource intensive. However, the limited visibility of the def-based
pass due to many values not being SSA yet makes it unable to fully
replace the old pass. Try the new one, and if it can't make progress,
then try the old one. That way, things will mostly be handled by the
new pass, but everything that was being cleaned up still will be.
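
The fallback ordering can be pictured with the toy model below. It is
only a sketch: toy_shader and the three functions are hypothetical
stand-ins, and the counters merely model "redundancy the def-based pass
can see" versus "redundancy only the old pass can see".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shader: counts of redundant expressions that
 * are visible through SSA defs and of those that are not. */
struct toy_shader {
   int ssa_redundancies;
   int non_ssa_redundancies;
};

/* Stand-in for the new pass: only sees SSA values. */
static bool
toy_cse_defs(struct toy_shader *s)
{
   if (s->ssa_redundancies == 0)
      return false;
   s->ssa_redundancies = 0;
   return true;
}

/* Stand-in for the old pass: sees everything. */
static bool
toy_cse_old(struct toy_shader *s)
{
   bool progress = s->ssa_redundancies > 0 || s->non_ssa_redundancies > 0;
   s->ssa_redundancies = 0;
   s->non_ssa_redundancies = 0;
   return progress;
}

/* Try the new pass first; run the old one only if the new one made no
 * progress. Returns true if either pass changed the shader. */
static bool
toy_optimize_cse(struct toy_shader *s)
{
   assert(s != NULL);
   return toy_cse_defs(s) || toy_cse_old(s);
}
```

Called repeatedly from an optimization loop, the first iteration is
handled by the new pass, and the old pass only runs once the new one has
nothing left to do.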
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
While the limited visibility due to partial SSA is a downside to the new
pass, it has a huge number of advantages that make it worth switching
over even now. It's much more efficient, can eliminate redundant memory
loads across blocks, and doesn't generate loads of unnecessary copies
that other passes have to clean up. This means we also eliminate the
infighting between the old CSE, coalescing, and copy propagation passes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This has a number of advantages compared to the pass I wrote years ago:
- It can easily perform either Global CSE or block-local CSE, without
  needing to roll any dataflow analysis, thanks to SSA def analysis.
  This global CSE is able to detect and coalesce memory loads across
  blocks. Although it may increase spilling a little, the reduction
  in memory loads seems to more than compensate.
- Because SSA guarantees that values are never written more than once,
  the new CSE pass can directly reuse an existing value. The old pass
  emitted copies at the point where it discovered a value because it
  had no idea whether it'd be mutated later. This led it to generate
  a ton of trash for copy propagation to clean up later, and also a
  nasty fragility where CSE, register coalescing, and copy propagation
  could all fight one another by generating and cleaning up copies,
  leading to infinite optimization loops unless we were really careful.
  Generating less trash improves our CPU efficiency.
- It uses hash tables like nir_instr_set and nir_opt_cse, instead of
  linearly walking lists and comparing each element. This is much more
  CPU efficient.
- It doesn't use liveness analysis, which is one of the most expensive
  analysis passes that we have. Def analysis is cheaper.
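
To make the hash-table and value-reuse points above concrete, here is a
minimal sketch of hash-based CSE over a toy straight-line SSA list. It
is an illustration under stated assumptions rather than the real pass:
toy_expr, toy_cse, the FNV-style hash, and the fixed-size open-addressing
table are all invented for this example, every expression is binary, and
cross-block handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 64 /* power of two, larger than the list length */

/* Hypothetical binary expression: dst = opcode(src[0], src[1]). */
struct toy_expr {
   int opcode;
   int src[2];
   int dst;
};

static uint32_t
toy_hash(const struct toy_expr *e)
{
   const int key[3] = { e->opcode, e->src[0], e->src[1] };
   uint32_t h = 2166136261u;
   for (int i = 0; i < 3; i++) {
      h ^= (uint32_t)key[i];
      h *= 16777619u;
   }
   return h;
}

static bool
toy_same_expr(const struct toy_expr *a, const struct toy_expr *b)
{
   return a->opcode == b->opcode &&
          a->src[0] == b->src[0] &&
          a->src[1] == b->src[1];
}

/* On return, remap[v] names the value that can stand in for v. Because
 * values are written once, an earlier identical expression can be reused
 * directly and no copy is emitted. Returns the number of redundant
 * expressions found. */
static int
toy_cse(struct toy_expr *exprs, size_t n, int *remap, size_t num_values)
{
   const struct toy_expr *table[TOY_TABLE_SIZE] = { NULL };
   int found = 0;

   assert(n < TOY_TABLE_SIZE); /* sketch only: the table never grows */

   for (size_t v = 0; v < num_values; v++)
      remap[v] = (int)v;

   for (size_t i = 0; i < n; i++) {
      /* Apply earlier replacements first so chained expressions match. */
      for (int s = 0; s < 2; s++)
         exprs[i].src[s] = remap[exprs[i].src[s]];

      /* Hash lookup with linear probing, not a walk over all entries. */
      uint32_t slot = toy_hash(&exprs[i]) & (TOY_TABLE_SIZE - 1);
      while (table[slot] && !toy_same_expr(table[slot], &exprs[i]))
         slot = (slot + 1) & (TOY_TABLE_SIZE - 1);

      if (table[slot]) {
         remap[exprs[i].dst] = table[slot]->dst;
         found++;
      } else {
         table[slot] = &exprs[i];
      }
   }

   return found;
}
```

A follow-up step would delete the expressions whose destination was
remapped; the sketch stops at recording the replacement.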
In addition to CSE'ing SSA values, we continue to handle flag writes,
as this is a huge source of CSE'able values. These remain block local.
However, we can simply track the last flag write, rather than creating
entire sets of instruction entries like the old pass. Much simpler.
The only real downside to this pass is that, because the backend is
currently only partially SSA, it has limited visibility and isn't able
to see all values. However, the results appear to be good enough that
the new pass can effectively replace the old pass in almost all cases.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Like NIR, we print SSA defs as %1, %2, and so on. The number here is
the VGRF number. VGRFs that don't correspond to an SSA def remain
printed as vgrf1, vgrf2, and so on.
This makes it much easier to see which values are SSA and which aren't.
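
A minimal sketch of the naming rule, assuming a hypothetical helper
toy_format_reg and an is_ssa_def flag supplied by the caller (neither is
taken from the actual printer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "%N" for an SSA def and "vgrfN" otherwise, where N is the VGRF
 * number. Returns what snprintf returns. */
static int
toy_format_reg(char *buf, size_t size, unsigned nr, bool is_ssa_def)
{
   assert(buf != NULL && size > 0);

   if (is_ssa_def)
      return snprintf(buf, size, "%%%u", nr);

   return snprintf(buf, size, "vgrf%u", nr);
}
```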
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Even without a full use list, simply tracking the number of uses will
let us tell "this is the only use of the def" or "we've just replaced
all uses of a def". It's inexpensive to calculate and will be useful.
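
For illustration only, use counts of this kind could be gathered in a
single walk over the instructions. The toy_inst layout and both function
names below are hypothetical, not the analysis added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction: one destination, up to two sources.
 * A source of -1 means "unused". */
struct toy_inst {
   int dst;
   int src[2];
};

/* Fills use_count[v] with the number of reads of value v. */
static void
toy_count_uses(const struct toy_inst *insts, size_t n,
               unsigned *use_count, size_t num_values)
{
   for (size_t v = 0; v < num_values; v++)
      use_count[v] = 0;

   for (size_t i = 0; i < n; i++) {
      for (int s = 0; s < 2; s++) {
         int v = insts[i].src[s];
         if (v < 0)
            continue;
         assert((size_t)v < num_values);
         use_count[v]++;
      }
   }
}

/* "This is the only use of the def": true when the value is read
 * exactly once anywhere in the list. */
static bool
toy_is_only_use(const unsigned *use_count, int value)
{
   return use_count[value] == 1;
}
```

A pass that rewrites sources would decrement the count as it goes, and a
count of zero then signals that all uses of the def have been replaced.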
(rebased by Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.
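
The sketch below shows one simplified reading of "happens to satisfy the
SSA properties" for a straight-line list: a register qualifies if it is
written exactly once and never read before that write. This is an
assumption made for illustration; the commit message does not spell out
the exact conditions the real pass checks, and all names here are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VGRFS 64

/* Hypothetical instruction: dst and src hold register numbers, with -1
 * meaning "none". */
struct toy_inst {
   int dst;
   int src[2];
};

/* Sets is_def[v] when register v is written exactly once and is not
 * read before that write (straight-line code only). */
static void
toy_find_defs(const struct toy_inst *insts, size_t n,
              bool *is_def, size_t num_vgrfs)
{
   unsigned writes[TOY_MAX_VGRFS] = { 0 };
   bool read_before_write[TOY_MAX_VGRFS] = { false };

   assert(num_vgrfs <= TOY_MAX_VGRFS);

   for (size_t i = 0; i < n; i++) {
      for (int s = 0; s < 2; s++) {
         int v = insts[i].src[s];
         if (v >= 0 && writes[v] == 0)
            read_before_write[v] = true;
      }
      if (insts[i].dst >= 0)
         writes[insts[i].dst]++;
   }

   for (size_t v = 0; v < num_vgrfs; v++)
      is_def[v] = writes[v] == 1 && !read_before_write[v];
}
```

With real control flow, the "not read before the write" check would have
to become a dominance check, which this sketch does not attempt.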
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Our code to initialize gl_SubgroupInvocation uses multiple instructions,
some of which are partial writes. This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.
To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation. (We also expand
it to UD for SIMD8 so there are no partial-write issues.) We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This gathers a number of sources into a contiguous vector register,
typically using LOAD_PAYLOAD. However, it uses MOV for a single source.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
The support is incomplete and largely untested, but more importantly
GLSL IR is deprecated at this point. This feature was added to support
building additional passes but that shouldn't ever be needed from here
on.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29469>
for drivers that don't support PIPE_CAP_SHAREABLE_SHADERS,
the zombie shader mechanism is used, storing shaders to delete after
the next flush
the zombie mechanism also calls bind_*_state(pipe, NULL) during deletion,
however, which breaks drivers in the following scenario:
* create_all_shaders(pipe_A)
* bind_vs(pipe_A, vs_A)
* bind_fs(pipe_A, fs_A)
* draw(pipe_A)
* makeCurrent(pipe_B)
* delete_vs(pipe_B, vs_B)
  * vs_B must only be deleted on pipe_A
  * zombie_shader_add(pipe_A, vs_B)
* makeCurrent(pipe_A)
  * free_zombie_shaders(pipe_A)
    * bind_vs(pipe_A, NULL)
    * delete_vs(pipe_A, vs_B)
* draw(pipe_A)
  * boom
the problem being that bind_vs(pipe_A, NULL) was called when deleting
vs_B, but it was actually vs_A which was bound
to solve this, just flag the shader state for updating and let st figure it out
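
the difference can be modelled with the toy context below. this is only
a sketch of the scenario: toy_ctx, its fields, and every function here
are hypothetical and do not mirror the real st or pipe_context code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context: what the driver currently has bound, what the
 * state tracker wants bound, and whether a rebind is pending. */
struct toy_ctx {
   const void *bound_vs;
   const void *wanted_vs;
   bool vs_dirty;
};

/* Old behaviour: unbind unconditionally while freeing a zombie, even if
 * the zombie is not the shader that is currently bound. */
static void
toy_free_zombie_bind_null(struct toy_ctx *ctx, const void *zombie)
{
   assert(ctx != NULL);
   (void)zombie;          /* the zombie itself would be deleted here */
   ctx->bound_vs = NULL;
}

/* New behaviour: leave the binding alone and only flag the state so it
 * is revalidated before the next draw. */
static void
toy_free_zombie_flag_dirty(struct toy_ctx *ctx, const void *zombie)
{
   assert(ctx != NULL);
   (void)zombie;          /* the zombie itself would be deleted here */
   ctx->vs_dirty = true;
}

/* Rebinds the wanted shader only if the state was flagged. Returns
 * false if the draw would run without a vertex shader ("boom"). */
static bool
toy_draw(struct toy_ctx *ctx)
{
   assert(ctx != NULL);
   if (ctx->vs_dirty) {
      ctx->bound_vs = ctx->wanted_vs;
      ctx->vs_dirty = false;
   }
   return ctx->bound_vs != NULL;
}
```

in this model the old path leaves pipe_A with nothing bound and nothing
flagged, so the next draw fails, while the flagged path rebinds vs_A.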
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11122
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29680>
Since nir_opt_varyings requires scalar IO and thus all drivers have to
scalarize it, this gives the option to re-vectorize IO after that.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29406>
These allow avoiding deadlocks in non-compliant applications that
execute barriers under non-uniform control flow. They're not expected
to have any major disadvantage so let's enable them unconditionally.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>
See also HSDES#14015504893 regarding the region-based tessellation
redistribution feature which allows fine-tuning the number of regions
per patch. This sets it to the recommended value, since region-based
redistribution is enabled by default.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>
This was caused by enabling the VK_KHR_maintenance5 extension, but the
problem is fixed by using a newer Vulkan Loader.
Fixes: a589901328 ("v3dv: expose VK_KHR_maintenance5")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29756>
This came up while reviewing
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398 ... Possibly
this intrinsic should be renamed to load_smem_constant_amd for consistency with
load_global_constant. But if we're not going to convey constantness in the
intrinsic name, let's at least document the restriction, because NIR's optimizer
relies on it.
(I didn't inspect every call site, but it looks like load_smem_amd is just used
for descriptor loads so there's no bug to fix.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29743>