This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated. There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:
- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
  with natural source type, when the backend supports inverse_ballot
  natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
  when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
  the backend doesn't support inverse_ballot.
Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
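The relationship between the three lowerings can be modelled in plain Python. This is only a sketch of the arithmetic, not the NIR code: the function names are hypothetical, the natural ballot is modelled as a Python integer, the uvec4 as a list of four 32-bit words, and truncating to ballot_bit_size is an assumption of the model.

```python
def inverse_ballot_natural(ballot: int, invocation: int) -> bool:
    """Third lowering (model): test the invocation's bit in a natural-width ballot."""
    return ((ballot >> invocation) & 1) != 0


def uvec4_to_natural(ballot_vec4, ballot_bit_size: int) -> int:
    """First lowering (model): pack a uvec4 into a ballot_bit_size-wide integer.

    Assumption: bits above ballot_bit_size are simply discarded.
    """
    value = 0
    for i, word in enumerate(ballot_vec4):
        value |= (word & 0xFFFFFFFF) << (32 * i)
    return value & ((1 << ballot_bit_size) - 1)


def inverse_ballot_uvec4(ballot_vec4, invocation: int, ballot_bit_size: int = 64) -> bool:
    """Second lowering (model): the first lowering followed by the third."""
    natural = uvec4_to_natural(ballot_vec4, ballot_bit_size)
    return inverse_ballot_natural(natural, invocation)
```

Writing the second lowering as a composition of the other two is the point of the sketch: only the first and third need separate implementations.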
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
While earlier changes to pipe control emission allowed a debug dump of
each pipe control, they also changed the debug output to almost always
print the same reason/function for each pc. These changes fix the output
so that we print the original function name where the pc is emitted.
As an example:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_batch_emit_pipe_control_write
pc: emit PC=( ) reason: gfx11_batch_emit_pipe_control_write
changes back to:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_emit_apply_pipe_flushes
pc: emit PC=( ) reason: cmd_buffer_emit_depth_stencil
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25282>
This is kept as a separate commit because the change looks like a lot
more than it is. The order of the two loops is swapped, then the two
loops are merged.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
Using a single data structure seems better. There's no appreciable
performance change. On batman_arkham_city_goty.foz, the difference
reported was 0.48%±0.36% (n=20). Several commits in the MR, including
some that should have no effect at all, reported similar changes. I
attribute this primarily to changes in loop alignment and similar effects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
On batman_arkham_city_goty.foz, this improves fossil-db time by
-3.83%±0.24% (n=20). This fossil takes the longest time of any in my
database.
v2: Add some comments for cmp_entry_src_entry_src and
cmp_entry_src_nr. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me during development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much larger rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't be BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straightforward.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.
The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order in which sources are visited changes
whether a SEL with two immediate sources is
(+f0.0) sel g12 IMM_A IMM_B
or
(-f0.0) sel g12 IMM_B IMM_A
The ordering of the sources affects the order in which constant combining
encounters the values, and that determines which value is "combined"
and which value remains an immediate.
This affects the results by luck. If there are two instructions:
(+f0.0) sel g12 IMM_A IMM_B
(+f0.0) sel g13 IMM_A IMM_C
Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.
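The order dependence can be shown with a toy model. This is not the constant combining pass: greedy_promote is a hypothetical helper, and the rule it applies (an instruction may keep only one immediate, so if none of its immediates is already promoted, promote the first one visited) is an assumption made for illustration.

```python
def greedy_promote(instructions):
    """Return the list of immediates promoted to registers, in promotion order.

    Toy rule: if none of an instruction's immediates has been promoted yet,
    greedily promote the first one visited.
    """
    promoted = []
    for sources in instructions:
        if not any(imm in promoted for imm in sources):
            promoted.append(sources[0])
    return promoted


# IMM_A is visited first in both instructions, so one promotion covers both.
lucky = greedy_promote([("IMM_A", "IMM_B"), ("IMM_A", "IMM_C")])

# With the sources swapped, IMM_B and IMM_C are each promoted separately.
unlucky = greedy_promote([("IMM_B", "IMM_A"), ("IMM_C", "IMM_A")])
```

In this model the first ordering promotes one value and the second promotes two, although both describe the same pair of selections.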
v2: Rebase on many, many changes. Move instruction source fixup
reordering out of try_constant_propagate.
v3: Rebase on !7698.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me during development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much larger rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
If the linked list structure used depended on the list head to know when
to terminate, this would be a pretty serious bug. If try_constant_propagate
or try_copy_propagate make progress, inst->src[i].nr will change. This
results in the foreach_in_list using a different list header on later
iterations of the loop.
This causes two shaders in shader-db and 9 shaders in fossil-db to
change. Looking at the code changes, these are cases where there was a
copy of a copy that gets propagated. The part that confuses me is the
VGRF numbers involved should **not** hash to the same bucket, so it
should be impossible to find the original source from the intermediate
VGRF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
Unless the change in liveout also causes livein to change, updates to
liveout cannot have any global effect. Changes to livein already flag
additional iteration.
I had additional changes in this area that didn't pan out. While working
on those changes, I was a little confused about this bit of code. It's
unnecessary, so it's better to delete it.
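The reasoning can be checked against a textbook backward liveness fixed point. The sketch below is not the Mesa code: compute_liveness and its data layout are hypothetical. It flags another iteration only when livein changes, since liveout is derived entirely from the successors' livein.

```python
def compute_liveness(blocks, successors):
    """Iterate backward liveness to a fixed point.

    blocks: {name: (use_set, def_set)}; successors: {name: [names]}.
    """
    livein = {b: set() for b in blocks}
    liveout = {b: set() for b in blocks}
    progress = True
    while progress:
        progress = False
        for b, (use, defs) in blocks.items():
            # liveout is purely a function of the successors' livein sets,
            # so updating it does not need to flag progress by itself.
            out = set()
            for s in successors.get(b, []):
                out |= livein[s]
            liveout[b] = out
            new_in = use | (out - defs)
            if new_in != livein[b]:
                # Only a livein change is visible to other blocks.
                livein[b] = new_in
                progress = True
    return livein, liveout
```

When the loop exits, no livein set changed during the last pass, so every liveout computed in that pass already reflects the final values.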
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This script can:
* validate that genxml files do not duplicate imported items
* add imports to genxml files and optimize the file by dropping
  duplicate items
* reverse the import operation by flattening genxml files
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20593>
For example, gen11.xml will import the HEVC_ARBITRATION_PRIORITY
struct from gen9.xml.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20593>
This function drops duplicated items from a genxml file when they are
equivalent to the same item imported from another genxml file.
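A minimal sketch of the idea, using only the Python standard library. It is not the function added by this commit: drop_imported_duplicates and canonical are hypothetical names, items are matched by tag and name attribute, and "equivalent" is assumed to mean equal canonical XML.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Canonical text of an element, ignoring insignificant whitespace."""
    text = ET.tostring(elem, encoding="unicode")
    return ET.canonicalize(text, strip_text=True)


def drop_imported_duplicates(child_xml, parent_xml):
    """Remove top-level items of child_xml that match an item in parent_xml.

    Assumption: an item is a duplicate when the parent has an item with the
    same tag and name whose canonical XML is identical.
    """
    child = ET.fromstring(child_xml)
    parent = ET.fromstring(parent_xml)
    imported = {(e.tag, e.get("name")): canonical(e) for e in parent}
    for item in list(child):
        if imported.get((item.tag, item.get("name"))) == canonical(item):
            child.remove(item)
    return child
```

Comparing canonical forms means attribute order and indentation differences do not prevent two items from matching, while any real difference in content keeps the local copy.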
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20593>
Since the output can now depend on other imported xml files, we need
to add them all as dependencies to ensure that if any xml file is
changed, then all pack files are rebuilt.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20593>
With the previous commit we are now able to build Anv without
including i915_drm.h from common code.
This is important as it prevents i915-specific code from being included
in common code by mistake.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25044>
Importing a bo that was already imported needs special handling in i915.
That handling was moved to
anv_i915_gem_import_bo_alloc_flags_to_bo_flags() as the number of
imported bos is low.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25044>
The bo_flags are i915-specific and should not be handled in common
code, so add this to the backend, as it is in the hot path.
There is still i915 bo_flags handling in anv_device_import_bo() that
will be handled in the next patch.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25044>
Sync xe_drm.h with commit e51e857ffad4 ("drm/xe/uapi: Remove useless max_page_size").
Most relevant changes are the removal of max_page_size from
drm_xe_query_mem_region and the typo fix in XE_QUERY_CONFIG_MIN_ALIGNMENT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25162>
There's a bunch of noise over time in the anv-tgl-fails.txt from the set
of tests run changing and catching more of the failures. If we have a
nightly full run, we can keep things up to date more easily (as seen here,
where I finish filling out the modifiers crashes and drop a stale xfail).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25155>
GLSL doesn't use that type. SPIR-V used it for a while but later started
relying on its own data structures and stopped using it.
See ca62e849d3 ("nir/spirv: Stop using glsl_type for function types")
If we were ever to add this one again, it would be better to have a way
to grab a key for lookup that does not require allocations. Right now one
is needed to inject the return type as the first element in the params array.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25160>
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.
Fixes all GPL-using tests under dEQP-VK.mesh_shader.ext.smoke.*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
The compaction introduced in a252123363 ("intel/compiler/mesh: compactify MUE layout")
is not suitable for the case where graphics pipeline libraries are fast
linked, as the fragment shader won't receive the mue_map to know where
to locate its inputs.
For that case, keep doing what we did before and lay things down in the
order varyings are defined, which is also how it works for the non-mesh
case.
Fixes dEQP-VK.fragment_shading_rate.*fast_linked_library*.ms
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>