Ian Romanick
c63ea755fe
intel/fs: Use nir_opt_uniform_subgroup
...
shader-db:
All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300435 -> 20300384 (<.01%)
instructions in affected programs: 303 -> 252 (-16.83%)
helped: 2 / HURT: 0
total cycles in shared programs: 842810326 -> 842809750 (<.01%)
cycles in affected programs: 8374 -> 7798 (-6.88%)
helped: 2 / HURT: 0
fossil-db:
All Intel platforms (note below) had similar results. (Ice Lake shown)
Instrs: 165559735 -> 165551893 (-0.00%)
Cycles: 15133083961 -> 15132539132 (-0.00%); split: -0.00%, +0.00%
Spill count: 45262 -> 45258 (-0.01%)
Fill count: 74293 -> 74286 (-0.01%)
Totals from 854 (0.13% of 656120) affected shaders:
Instrs: 3461998 -> 3454156 (-0.23%)
Cycles: 154252729 -> 153707900 (-0.35%); split: -0.36%, +0.01%
Spill count: 2655 -> 2651 (-0.15%)
Fill count: 3881 -> 3874 (-0.18%)
DG2 did not see changes in spills or fills.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044 >
2024-02-27 08:38:45 -08:00
Ian Romanick
b22fff90d5
intel/fs: Enable nir_opt_uniform_atomics in all shader stages
...
The problem seems to have been related to
nir_intrinsic_load_global_block_intel being marked as non-divergent.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044 >
2024-02-27 08:37:05 -08:00
Sagar Ghuge
269d2c4a3f
intel/compiler: Enable packing of offset with LOD or Bias
...
Move intel_nir_lower_texture just before nir_lower_tex since we need to
operate on the offset and those are getting lowerd.
v2: (Ian)
- Rename variable name to intel_tex_options
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447 >
2024-02-27 00:22:46 +00:00
Caio Oliveira
d8f9a05f32
intel/compiler: Rename the passes and files related to intel_nir.h
...
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644 >
2024-02-16 22:35:05 +00:00
Caio Oliveira
dc76cfc781
intel/compiler: Collect NIR-only passes in intel_nir.h
...
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644 >
2024-02-16 22:35:05 +00:00
Caio Oliveira
c5b80de583
intel/compiler: Rename brw_vue_map to intel_vue_map
...
And move to the intel_shader_enums.h file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475 >
2024-02-14 22:31:23 -08:00
Lionel Landwerlin
2437556d83
intel/fs: rerun divergence prior to lowering non-uniform interpolate at sample
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797 >
2024-02-13 00:06:44 +00:00
Sagar Ghuge
98b62434bd
intel/compiler: Lower texture operation to combine LOD and AI
...
We have to push the lowering of texture operations a bit further in
pipeline since nir_lower_tex gets invoked twice and if there is no LOD
source present, nir_lower_tex adds that as a source. Once that's all
done we can easily combine the LOD and array index into a single 32-bit
value.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458 >
2024-02-12 21:25:48 +00:00
Sagar Ghuge
15129c7634
intel/compiler: Use nir_tex_src_backend1 to pack LOD and array index
...
Since this lowering is totally Intel specific, we don't have to
introduce the new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into 32 bit single value.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458 >
2024-02-12 21:25:48 +00:00
Ian Romanick
84de7a88d3
intel/compiler/xe2: Emit texture instructions w/ combined LOD and array index
...
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).
v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305 >
2024-02-02 02:39:10 +00:00
Caio Oliveira
4af079960d
intel/compiler: Enable lower_rotate_to_shuffle in subgroup lowering
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27272 >
2024-01-25 19:07:42 +00:00
Daniel Schürmann
a3ed36da1a
treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop()
...
Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11)
MaxWaves: 18134 -> 18130 (-0.02%)
Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08%
CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12%
VGPRs: 63580 -> 63604 (+0.04%)
SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67%
Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02%
InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02%
VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02%
SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04%
Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94%
Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26%
PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87%
PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06%
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940 >
2024-01-03 20:48:04 +00:00
Dave Airlie
f76f4be301
intel/compiler: move gen5 final pass to actually be final pass
...
This got broken by the register conversion, this pass needs to be
after all the others.
Fixes: ce75c3c3fe ("intel: Switch to intrinsic-based registers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26731 >
2023-12-18 07:24:37 +00:00
Lionel Landwerlin
6dbb5f1e07
intel/fs: rerun divergence analysis prior to convert_from_ssa
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9964
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26235 >
2023-11-17 06:40:49 +00:00
Rhys Perry
f695a9fed2
intel/compiler: use nir_lower_fp16_casts
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566 >
2023-11-16 11:02:31 +00:00
Caio Oliveira
d2125dac85
intel/compiler: Take more precise params in brw_nir_optimize()
...
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986 >
2023-11-08 18:10:31 +00:00
Caio Oliveira
c4be90b4ba
intel/compiler: Remove unused parameter from brw_nir_adjust_payload()
...
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986 >
2023-11-08 18:10:31 +00:00
Iván Briano
54498937c5
intel/compiler: round f2f16 correctly for RTNE case
...
v2: bcsel -> b2i32 (Ian)
Fixes upcoming Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281 >
2023-10-09 23:37:52 +00:00
Connor Abbott
4282386311
nir/spirv: Add inverse_ballot intrinsic
...
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated. There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:
- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.
Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123 >
2023-09-20 14:41:18 +00:00
Pavel Ondračka
1c72c71bdf
nir/move_vec_src_uses_to_dest: allow to skip reuse of constant sources
...
And enable this for r300 and intel-vec4
crocus HSW (mostly helps few doplhin ubershaders):
total instructions in shared programs: 1576736 -> 1576589 (<.01%)
instructions in affected programs: 38235 -> 38088 (-0.38%)
helped: 12
HURT: 0
total cycles in shared programs: 111025838 -> 110944796 (-0.07%)
cycles in affected programs: 5646582 -> 5565540 (-1.44%)
helped: 15
HURT: 6
total spills in shared programs: 447 -> 432 (-3.36%)
spills in affected programs: 186 -> 171 (-8.06%)
helped: 12
HURT: 0
total fills in shared programs: 792 -> 774 (-2.27%)
fills in affected programs: 291 -> 273 (-6.19%)
helped: 12
HURT: 0
r300 RV530:
total instructions in shared programs: 96655 -> 96304 (-0.36%)
instructions in affected programs: 15020 -> 14669 (-2.34%)
helped: 79
HURT: 18
total temps in shared programs: 13027 -> 12952 (-0.58%)
temps in affected programs: 677 -> 602 (-11.08%)
helped: 41
HURT: 9
total cycles in shared programs: 147745 -> 147314 (-0.29%)
cycles in affected programs: 21831 -> 21400 (-1.97%)
helped: 84
HURT: 19
r300 RV370:
total instructions in shared programs: 63678 -> 63669 (-0.01%)
instructions in affected programs: 931 -> 922 (-0.97%)
helped: 12
HURT: 6
total temps in shared programs: 10028 -> 10013 (-0.15%)
temps in affected programs: 339 -> 324 (-4.42%)
helped: 33
HURT: 10
total cycles in shared programs: 101118 -> 101087 (-0.03%)
cycles in affected programs: 2659 -> 2628 (-1.17%)
helped: 22
HURT: 6
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24932 >
2023-09-19 18:05:37 +02:00
Alyssa Rosenzweig
d1eb17e92e
treewide: Drop nir_ssa_for_src users
...
Via Coccinelle patch:
@@
expression b, s, n;
@@
-nir_ssa_for_src(b, *s, n)
+s->ssa
@@
expression b, s, n;
@@
-nir_ssa_for_src(b, s, n)
+s.ssa
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247 >
2023-09-18 10:25:17 -04:00
Ian Romanick
5eddf60e56
intel/compiler: Combine control barriers with identical memory semantics
...
This prevents the second barrier generating a spurious, identical fence
message as the first barrier.
fossil-db stats on Alchemist:
Totals:
Instrs: 196513342 -> 196512777 (-0.00%); split: -0.00%, +0.00%
Cycles: 14271426028 -> 14271404569 (-0.00%); split: -0.00%, +0.00%
Send messages: 8021892 -> 8021770 (-0.00%)
Totals from 46 (0.01% of 653252) affected shaders:
Instrs: 76761 -> 76196 (-0.74%); split: -0.75%, +0.01%
Cycles: 2027946 -> 2006487 (-1.06%); split: -1.45%, +0.39%
Send messages: 7589 -> 7467 (-1.61%)
Nothing in shader-db was affected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842 >
2023-09-09 04:41:25 +00:00
Lionel Landwerlin
10e75aae1b
intel/nir: rerun lower_tex if it lowers something
...
nir_lower_tex can lower tg4 coords into tg4 offset which on DG2+ we
also need to lower into constant offsets.
Unfortunately the nir_lower_tex pass is not able to lower the
instructions it itself generates, so the easy fix for when
nir_lower_tex lowers tg4 coords into tg4 offsets is to rerun the pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9735
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25015 >
2023-09-05 13:35:51 +00:00
Lionel Landwerlin
74a40cc4b6
intel/fs: move lower of non-uniform at_sample barycentric to NIR
...
We use a non-uniform lowering loop in the backend which we can do
better in NIR because we can also use divergence analysis there.
This change also limits VGRF usage to a single VGRF to hold the sample
ID in the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716 >
2023-08-29 23:19:13 +00:00
Alyssa Rosenzweig
cda1961835
treewide: Also handle struct nir_builder form
...
Via Coccinelle patch:
@def@
typedef bool;
typedef nir_builder;
typedef nir_instr;
typedef nir_def;
identifier fn, instr, intr, x, builder, data;
@@
static fn(struct nir_builder* builder,
-nir_instr *instr,
+nir_intrinsic_instr *intr,
...)
{
(
- if (instr->type != nir_instr_type_intrinsic)
- return false;
- nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
|
- nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
- if (instr->type != nir_instr_type_intrinsic)
- return false;
)
<...
(
-instr->x
+intr->instr.x
|
-instr
+&intr->instr
)
...>
}
@pass depends on def@
identifier def.fn;
expression shader, progress;
@@
(
-nir_shader_instructions_pass(shader, fn,
+nir_shader_intrinsics_pass(shader, fn,
...)
|
-NIR_PASS_V(shader, nir_shader_instructions_pass, fn,
+NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn,
...)
|
-NIR_PASS(progress, shader, nir_shader_instructions_pass, fn,
+NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn,
...)
)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852 >
2023-08-24 15:48:02 +00:00
Alyssa Rosenzweig
465b138f01
treewide: Use nir_shader_intrinsic_pass sometimes
...
This converts a lot of trivial passes. Nice boilerplate deletion. Via Coccinelle
patch (with a small manual fix-up for panfrost where coccinelle got confused by
genxml + ninja clang-format squashed in, and for Zink because my semantic patch
was slightly buggy).
@def@
typedef bool;
typedef nir_builder;
typedef nir_instr;
typedef nir_def;
identifier fn, instr, intr, x, builder, data;
@@
static fn(nir_builder* builder,
-nir_instr *instr,
+nir_intrinsic_instr *intr,
...)
{
(
- if (instr->type != nir_instr_type_intrinsic)
- return false;
- nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
|
- nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
- if (instr->type != nir_instr_type_intrinsic)
- return false;
)
<...
(
-instr->x
+intr->instr.x
|
-instr
+&intr->instr
)
...>
}
@pass depends on def@
identifier def.fn;
expression shader, progress;
@@
(
-nir_shader_instructions_pass(shader, fn,
+nir_shader_intrinsics_pass(shader, fn,
...)
|
-NIR_PASS_V(shader, nir_shader_instructions_pass, fn,
+NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn,
...)
|
-NIR_PASS(progress, shader, nir_shader_instructions_pass, fn,
+NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn,
...)
)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852 >
2023-08-24 15:48:02 +00:00
Faith Ekstrand
b5d6b7c402
nir: Drop most uses if nir_instr_rewrite_src()
...
Generated by the following semantic patch:
@@
expression I, S, D;
@@
-nir_instr_rewrite_src(I, S, nir_src_for_ssa(D));
+nir_src_rewrite(S, D);
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24729 >
2023-08-18 01:00:15 +00:00
Faith Ekstrand
4695bebc79
nir: Drop nir_dest
...
Instead, we replace every use of it with nir_def. Most of this commit
was generated by sed:
sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
A few manual fixups were required in lima and the nir_legacy code.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674 >
2023-08-14 21:22:53 +00:00
Faith Ekstrand
6c1d32581a
nir: Drop nir_alu_dest
...
Instead, we replace it directly with nir_def. We could replace it with
nir_dest but the next commit gets rid of that so this avoids unnecessary
churn. Most of this commit was generated by sed:
sed -i -e 's/dest.dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
There were a few manual fixups required in the nir_legacy.c and
nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a
similar pattern.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674 >
2023-08-14 21:22:53 +00:00
Faith Ekstrand
ed9affa02f
nir: Drop most instances of nir_ssa_dest_init()
...
Generated using the following two semantic patches:
@@
expression I, J, NC, BS;
@@
-nir_ssa_dest_init(I, &J->dest, NC, BS);
+nir_def_init(I, &J->dest.ssa, NC, BS);
@@
expression I, J, NC, BS;
@@
-nir_ssa_dest_init(I, &J->dest.dest, NC, BS);
+nir_def_init(I, &J->dest.dest.ssa, NC, BS);
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24658 >
2023-08-13 17:12:52 +00:00
Alyssa Rosenzweig
09d31922de
nir: Drop "SSA" from NIR language
...
Everything is SSA now.
sed -e 's/nir_ssa_def/nir_def/g' \
-e 's/nir_ssa_undef/nir_undef/g' \
-e 's/nir_ssa_scalar/nir_scalar/g' \
-e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
-e 's/nir_gather_ssa_types/nir_gather_types/g' \
-i $(git grep -l nir | grep -v relnotes)
git mv src/compiler/nir/nir_gather_ssa_types.c \
src/compiler/nir/nir_gather_types.c
ninja -C build/ clang-format
cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585 >
2023-08-12 16:44:41 -04:00
Lionel Landwerlin
9934613c74
anv/hasvk: track robustness per pipeline stage
...
And split them into UBO and SSBO
v2 (Lionel):
- Get rid of robustness fields in anv_shader_bin
v3 (Lionel):
- Do not pass unused parameters around
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545 >
2023-08-09 09:00:12 +03:00
Alyssa Rosenzweig
5fead24365
treewide: Drop is_ssa asserts
...
We only see SSA now.
Via Coccinelle patch:
@@
expression x;
@@
-assert(x.is_ssa);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432 >
2023-08-03 22:40:28 +00:00
Alyssa Rosenzweig
17d66055ae
nir: Remove reg_intrinsics parameter to convert_from_ssa
...
All users must set it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24450 >
2023-08-02 10:26:45 -04:00
Lionel Landwerlin
fe81d40bff
intel/nir: add lower for sparse images & textures
...
We have to lower images into image load + sampler residency.
There is also a restriction on sampler access with a compare, lower
those as 2 sampler instructions to meet the restriction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882 >
2023-07-27 02:02:59 +03:00
Iván Briano
377c2a045f
intel/compiler: call brw_nir_adjust_payload from brw_postprocess_nir
...
Calling anything after nir_trivialize_registers() risks undoing some of
its work.
In this case, brw_nir_adjust_payload() will do a constant folding pass
if any payload adjusting happened, and that can turn a bunch of
@store_regs into basically noops.
Fixes dEQP-VK.subgroups.*task
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24325 >
2023-07-25 22:48:09 +00:00
Marcin Ślusarz
a252123363
intel/compiler/mesh: compactify MUE layout
...
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.
There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
float[N] requires N 1-dword slots, but
i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
if possible is beneficial
This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.
Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407 >
2023-07-24 07:55:29 +00:00
Alyssa Rosenzweig
1466014184
nir: Rename lower_locals_to_reg_intrinsics back
...
The short name is freed up.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253 >
2023-07-21 11:25:49 +00:00
Faith Ekstrand
ce75c3c3fe
intel: Switch to intrinsic-based registers
...
Results on HSW (vec4 only):
total instructions in shared programs: 2978400 -> 2974135 (-0.14%)
instructions in affected programs: 77870 -> 73605 (-5.48%)
helped: 143
HURT: 48
helped stats (abs) min: 1 max: 100 x̄: 30.22 x̃: 9
helped stats (rel) min: 0.03% max: 30.49% x̄: 8.02% x̃: 6.39%
HURT stats (abs) min: 1 max: 4 x̄: 1.19 x̃: 1
HURT stats (rel) min: 0.08% max: 16.67% x̄: 3.71% x̃: 3.23%
95% mean confidence interval for instructions value: -26.69 -17.97
95% mean confidence interval for instructions %-change: -6.24% -3.90%
Instructions are helped.
total cycles in shared programs: 45345924 -> 44742666 (-1.33%)
cycles in affected programs: 29083466 -> 28480208 (-2.07%)
helped: 4785
HURT: 3879
helped stats (abs) min: 2 max: 8072 x̄: 276.00 x̃: 24
helped stats (rel) min: 0.02% max: 54.43% x̄: 7.78% x̃: 1.95%
HURT stats (abs) min: 2 max: 14736 x̄: 184.95 x̃: 20
HURT stats (rel) min: 0.02% max: 97.00% x̄: 7.69% x̃: 1.53%
95% mean confidence interval for cycles value: -83.49 -55.77
95% mean confidence interval for cycles %-change: -1.16% -0.55%
Cycles are helped.
total spills in shared programs: 1093 -> 539 (-50.69%)
spills in affected programs: 772 -> 218 (-71.76%)
helped: 74
HURT: 0
total fills in shared programs: 760 -> 757 (-0.39%)
fills in affected programs: 66 -> 63 (-4.55%)
helped: 3
HURT: 0
Results on TGL (all stages):
total instructions in shared programs: 21486982 -> 21488266 (<.01%)
instructions in affected programs: 2245938 -> 2247222 (0.06%)
helped: 1288
HURT: 1385
helped stats (abs) min: 1 max: 93 x̄: 4.05 x̃: 2
helped stats (rel) min: 0.02% max: 3.82% x̄: 0.61% x̃: 0.46%
HURT stats (abs) min: 1 max: 134 x̄: 4.69 x̃: 2
HURT stats (rel) min: <.01% max: 5.59% x̄: 0.65% x̃: 0.44%
95% mean confidence interval for instructions value: 0.13 0.83
95% mean confidence interval for instructions %-change: <.01% 0.08%
Instructions are HURT.
total cycles in shared programs: 809326677 -> 809475669 (0.02%)
cycles in affected programs: 447781659 -> 447930651 (0.03%)
helped: 1924
HURT: 1994
helped stats (abs) min: 1 max: 74567 x̄: 1217.49 x̃: 10
helped stats (rel) min: <.01% max: 38.44% x̄: 1.09% x̃: 0.17%
HURT stats (abs) min: 1 max: 76426 x̄: 1249.47 x̃: 8
HURT stats (rel) min: <.01% max: 137.11% x̄: 1.64% x̃: 0.17%
95% mean confidence interval for cycles value: -125.61 201.67
95% mean confidence interval for cycles %-change: 0.12% 0.48%
Inconclusive result (value mean confidence interval includes 0).
LOST: 4
GAINED: 4
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104 >
2023-07-19 02:11:57 +00:00
Marcin Ślusarz
36ff6c0004
intel/compiler: remove NV_mesh_shader support
...
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24071 >
2023-07-14 08:27:14 +00:00
Faith Ekstrand
73e191924c
nir: Add a reg_intrinsics flag to nir_convert_from_ssa
...
It doesn't do anything yet. We leave that to the subsequent patches so we can
keep the tree-wide refactor as simple as possible.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089 >
2023-07-12 01:34:27 +00:00
Lionel Landwerlin
c26c0a36d3
intel/fs: disable coarse pixel shader with interpolater messages at sample
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9292
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23962 >
2023-07-06 12:48:52 +00:00
Marcin Ślusarz
1ac1d5d62e
anv,intel/compiler: enable shortcut in wg id to wg idx lowering on >= gfx12.5
...
This speeds up vk_meshlet_cadscene in "VK mesh ext" renderer by 1.4%
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334 >
2023-07-04 09:15:08 +00:00
Marcin Ślusarz
7ec1ef75d3
intel/compiler: pass num_workgroups from task to mesh shaders
...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334 >
2023-07-04 09:15:08 +00:00
Yonggang Luo
68b8aa788d
intel/compiler: Switch to use nir_foreach_function_impl
...
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23920 >
2023-06-29 11:29:54 +00:00
Alyssa Rosenzweig
815efcdf7e
nir: Use nir_builder_create
...
perl -p0e 's/nir_builder ([^;]*);\s*nir_builder_init\(&\1, /nir_builder \1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860 >
2023-06-27 18:13:02 +00:00
Alyssa Rosenzweig
6689c678fe
nir/lower_locals_to_regs: Add bool bitsize knob
...
GLSL booleans (and hence bool derefs) may be translated either as 1-bit or
32-bit NIR registers, depending whether the backend uses nir_lower_bool_to_int32
or not. Add a knob for this and choose the right type for different backends.
Fixes nir_validate failure on
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_bvec3 run under
lavapipe. That test indexes into a bvec3 array, and gallivm first lowers bools
and then lowers derefs to registers, resulting in random 1-bit booleans mixed in
with 32-bit bools.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804 >
2023-06-26 08:22:06 -04:00
Caio Oliveira
59cc77f0fa
compiler: Move from nir_scope to mesa_scope
...
Just moving the enum and performing renames, no behavior change.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23328 >
2023-06-19 23:29:26 +00:00
Ian Romanick
e419eefd34
intel/fs: Use nir_opt_reassociate_bfi
...
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907072 -> 19907054 (<.01%)
instructions in affected programs: 8859 -> 8841 (-0.20%)
helped: 9 / HURT: 0
total cycles in shared programs: 855791238 -> 855779334 (<.01%)
cycles in affected programs: 3308294 -> 3296390 (-0.36%)
helped: 12 / HURT: 13
Broadwell
total instructions in shared programs: 17818231 -> 17817440 (<.01%)
instructions in affected programs: 9887 -> 9096 (-8.00%)
helped: 9 / HURT: 0
total cycles in shared programs: 902970035 -> 902941221 (<.01%)
cycles in affected programs: 2767243 -> 2738429 (-1.04%)
helped: 14 / HURT: 5
total spills in shared programs: 17784 -> 17718 (-0.37%)
spills in affected programs: 318 -> 252 (-20.75%)
helped: 1 / HURT: 0
total fills in shared programs: 25458 -> 24949 (-2.00%)
fills in affected programs: 1346 -> 837 (-37.82%)
helped: 1 / HURT: 0
Haswell
total instructions in shared programs: 16707799 -> 16707586 (<.01%)
instructions in affected programs: 24049 -> 23836 (-0.89%)
helped: 41 / HURT: 0
total cycles in shared programs: 882730648 -> 882723174 (<.01%)
cycles in affected programs: 5096737 -> 5089263 (-0.15%)
helped: 25 / HURT: 12
total spills in shared programs: 14937 -> 14909 (-0.19%)
spills in affected programs: 436 -> 408 (-6.42%)
helped: 4 / HURT: 0
total fills in shared programs: 17569 -> 17529 (-0.23%)
fills in affected programs: 444 -> 404 (-9.01%)
helped: 4 / HURT: 0
No shader-db changes on any older Intel platforms.
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 153118594 -> 153117340 (-0.00%); split: -0.00%, +0.00%
Cycles: 15011967556 -> 15011904351 (-0.00%); split: -0.00%, +0.00%
Fill count: 203692 -> 203684 (-0.00%)
Totals from 703 (0.11% of 662496) affected shaders:
Instrs: 192826 -> 191572 (-0.65%); split: -0.65%, +0.00%
Cycles: 29937640 -> 29874435 (-0.21%); split: -0.25%, +0.04%
Fill count: 4146 -> 4138 (-0.19%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968 >
2023-06-14 18:49:53 +00:00
Lionel Landwerlin
6b9f838d62
intel/fs: handle load_global_constant_uniform_block_intel
...
Again, load the data just once in GRF, share it across lanes.
Shader-db on dg2:
total instructions in shared programs: 23214555 -> 23215400 (<.01%)
instructions in affected programs: 199977 -> 200822 (0.42%)
helped: 3
HURT: 38
helped stats (abs) min: 5 max: 670 x̄: 283.67 x̃: 176
helped stats (rel) min: 1.34% max: 49.41% x̄: 22.15% x̃: 15.70%
HURT stats (abs) min: 1 max: 185 x̄: 44.63 x̃: 32
HURT stats (rel) min: 0.13% max: 42.86% x̄: 10.25% x̃: 9.30%
95% mean confidence interval for instructions value: -18.65 59.87
95% mean confidence interval for instructions %-change: 3.29% 12.47%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 5928 -> 5928 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 851137495 -> 851152449 (<.01%)
cycles in affected programs: 16406137 -> 16421091 (0.09%)
helped: 9
HURT: 32
helped stats (abs) min: 10 max: 13498 x̄: 6443.22 x̃: 5581
helped stats (rel) min: 0.11% max: 4.75% x̄: 1.45% x̃: 0.34%
HURT stats (abs) min: 3 max: 15056 x̄: 2279.47 x̃: 735
HURT stats (rel) min: 0.10% max: 23.71% x̄: 4.58% x̃: 4.65%
95% mean confidence interval for cycles value: -1315.40 2044.87
95% mean confidence interval for cycles %-change: 1.71% 4.80%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 11856 -> 11825 (-0.26%)
spills in affected programs: 2368 -> 2337 (-1.31%)
helped: 4
HURT: 0
total fills in shared programs: 16258 -> 16207 (-0.31%)
fills in affected programs: 2930 -> 2879 (-1.74%)
helped: 4
HURT: 0
total sends in shared programs: 1038194 -> 1038185 (<.01%)
sends in affected programs: 40 -> 31 (-22.50%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.25 x̃: 2
helped stats (rel) min: 10.00% max: 33.33% x̄: 21.46% x̃: 21.25%
95% mean confidence interval for sends value: -4.64 0.14
95% mean confidence interval for sends %-change: -40.41% -2.51%
Inconclusive result (value mean confidence interval includes 0).
LOST: 0
GAINED: 0
Some VK/DX titles result (on DG2 only), it's mostly additional
instruction counts except for the unity spaceship demo where a CS
shader gets additional SIMDness. The reason for additional
instructions is that since we're doing block loads, we need to find
the live channels in control flow to select a single lane value that
is valid.
aztec_ruins_high:
Totals from 3 (1.12% of 269) affected shaders:
Instrs: 17732 -> 17896 (+0.92%)
Cycles: 796518 -> 819302 (+2.86%)
cyberpunk_2077:
Totals from 17 (0.17% of 10301) affected shaders:
Instrs: 10848 -> 11658 (+7.47%)
Cycles: 248243 -> 259168 (+4.40%); split: -0.57%, +4.97%
fallout_4_dxvk_g2:
Totals from 2 (0.12% of 1638) affected shaders:
Instrs: 3157 -> 3368 (+6.68%)
Cycles: 487807 -> 490426 (+0.54%); split: -0.26%, +0.79%
Max live registers: 139 -> 141 (+1.44%)
red_dead_redemption2:
Totals from 68 (1.14% of 5970) affected shaders:
Instrs: 34871 -> 36486 (+4.63%)
Cycles: 551430 -> 565211 (+2.50%)
Send messages: 2074 -> 2072 (-0.10%)
Max live registers: 5078 -> 5077 (-0.02%)
total_war_warhammer2:
Totals from 5 (1.05% of 478) affected shaders:
Instrs: 6905 -> 6971 (+0.96%); split: -0.16%, +1.12%
Cycles: 97035 -> 97989 (+0.98%); split: -0.07%, +1.05%
unity spaceship demo (instruction count going up due to a CS shader
bump from SIMD8->16):
Totals from 53 (9.71% of 546) affected shaders:
Instrs: 223748 -> 233223 (+4.23%); split: -0.01%, +4.25%
Cycles: 23134697 -> 25207080 (+8.96%); split: -0.17%, +9.13%
Subgroup size: 480 -> 488 (+1.67%)
Spill count: 2156 -> 2242 (+3.99%); split: -0.19%, +4.17%
Fill count: 4617 -> 4845 (+4.94%); split: -0.09%, +5.02%
Max live registers: 5991 -> 6050 (+0.98%); split: -0.40%, +1.39%
Max dispatch width: 480 -> 488 (+1.67%)
witcher_3_dxvk_g2:
Totals from 27 (2.51% of 1074) affected shaders:
Instrs: 57067 -> 57677 (+1.07%); split: -0.03%, +1.10%
Cycles: 1397871 -> 1436704 (+2.78%); split: -0.35%, +3.13%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477 >
2023-06-14 12:04:05 +00:00