Commit graph

8839 commits

Author SHA1 Message Date
Pavel Ondračka
a93bc6afc4 nir: check for x - ffract(x) patterns when lowering f2i32
We already skip emitting ftrunc in nir_lower_int_to_float when there is
ffloor, fround or any other integer-making opcode preceding f2i32. However
if lower_ffloor is set for driver that doesn't support integers, the lowered
x - ffract(x) patterns would not be recognized and extra ftruct would be
emitted, doing unnecessary rounding.

This optimization only works if there is no non-trivial swizzling used for
the fadd, fneg and ffract involved, which seems to be 99% of the cases according
to my testing.

This is needed to enable nir ffloor lowering on r300 driver without regressions.

I'm not sure if this helps anybody else, the only hardware which sets
lower_ffloor and converts ints to floats (and can't do trunc) are some old
etnaviv cards, so maybe it will help there a bit.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
2023-01-05 12:01:32 +00:00
Qiang Yu
cf2ea3fce9 nir/xfb: save high_16bits output info
It is combined with slot location to identify a varying
when using VARYING_SLOT_VARx_16BIT.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20157>
2023-01-05 01:12:06 +00:00
Ian Romanick
043508d8f8 glsl: Remove bit_count lowering
As far as I can tell, every driver that supports GLSL 1.30 or
GL_EXT_gpu_shader4 (and therefore also enables support for
GL_MESA_shader_integer_functions) also sets the NIR lower_bit_count
flag.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Ian Romanick
abe5acf7fd glsl: Remove bitfield_reverse lowering
As far as I can tell, every driver that supports GLSL 1.30 or
GL_EXT_gpu_shader4 (and therefore also enables support for
GL_MESA_shader_integer_functions) also sets the NIR
lower_bitfield_reverse flag.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Ian Romanick
f5722c4973 glsl: Remove bitfield_extract and bitfield_insert lowering
As far as I can tell, every driver that supports GLSL 1.30 or
GL_EXT_gpu_shader4 (and therefore also enables support for
GL_MESA_shader_integer_functions) also sets some subset of the various
NIR lower_bitfield_extract and lower_bitfield_insert flags.

v2: Declaration of 'result' still needs to be added to the IR. Noticed
by marge.

v3: Fix 'git rebase --autosquash' putting the v2 fix in the wrong
place. I've never seen that happen before. :(

Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Ian Romanick
db241fbd70 nir: Don't allow conflicting bitfield lowering passes
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Pavel Ondračka
53d9b696e4 nir: basic tests for nir_opt_shrink_vectors
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Pavel Ondračka
3305c9602d nir: fix shrinking of load_const for large vectors
Specifically when shrinking load_const with number of components
> 5, if the final number of components is not allowed (for example 8->6)
it would report false for progress even if we actually did some
reshuffling and also it would skip on the rewrite of the readers.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Pavel Ondračka
cb7f201288 nir: remove duplicate alu channels in nir_opt_shrink_vectors
This will clean code like:
   vec3 32 ssa_8 = frcp ssa_7.www
   vec3 32 ssa_9 = fmul ssa_7.xyz, ssa_8
into
   vec1 32 ssa_8 = frcp ssa_7.w
   vec3 32 ssa_9 = fmul ssa_7.xyz, ssa_8.xxx

This helps r300 driver because we can only do single channel for math
ops at a time, so the first version would result in three frcp
instructions. The nir_opt_shrink_vectors comments even claim the pass
should be doing this, however it actually does it only for nir_op_vecx
instructions, so extend this for generic alu instructions.

RV530 shader-db:
total instructions in shared programs: 135032 -> 133707 (-0.98%)
instructions in affected programs: 46121 -> 44796 (-2.87%)
helped: 452
HURT: 26
total temps in shared programs: 17051 -> 17033 (-0.11%)
temps in affected programs: 1509 -> 1491 (-1.19%)
helped: 91
HURT: 30

12.02->12.08 (+0.5%) fps gain in Unigine Sanctuary (n=5) with RV530

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7051
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reiewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20213>
2023-01-03 12:32:33 +01:00
Christian Gmeiner
9e56f69edf isaspec: encode: handle special fieldname properties
Without this change a fieldname like '{DST::align=12}' was not used for
encoding. Change the regex to include such fieldnames and remove the fieldname
property in a later step.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20462>
2022-12-31 13:43:15 +00:00
Danylo Piliaiev
1c9ee30838 nir/fold_16bit_tex_image: Add type granularity for dst folding
Some HW may be able to fold only some of dst types, e.g.
for Adreno folding i32 -> i16 could cause a different result since
folded variant clamps the result instead of masking it.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20396>
2022-12-23 15:48:18 +01:00
Lionel Landwerlin
3af08b9c30 nir/divergence: handle shader_record_ptr intrinsic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes 6b8fd65e84 ("spirv: Implement the new ray-tracing storage classes")

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>
2022-12-23 09:22:13 +00:00
Danylo Piliaiev
8482ad0110 nir/nir_lower_is_helper_invocation: Lower helper invocation if required
nir_lower_is_helper_invocation lowers intrinsic_is_helper_invocation
and uses load_helper_invocation (which is lowered by nir_lower_system_values).
While nir_lower_system_values may lower SYSTEM_VALUE_HELPER_INVOCATION
into intrinsic_is_helper_invocation.

So they depend on each other. Break the dependency by making
nir_lower_is_helper_invocation aware of lower_helper_invocation option
and emitting lowered load_helper_invocation when required.

Happens with SPIR-V 1.6 for which gl_HelperInvocation is translated into
"BuiltIn HelperInvocation" + "Volatile", which nir_lower_system_values
translates into is_helper_invocation.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19677>
2022-12-20 11:06:52 +00:00
Qiang Yu
e85c5d8779 nir/divergence_analysis: add missing intrinsics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:24 +08:00
Qiang Yu
194add2c23 nir: lower image add lower_to_fragment_mask_load_amd option
Like lower_to_fragment_fetch_amd option in lower tex,
this is for radeonsi to lower MS image ops.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:16 +08:00
Qiang Yu
1461b5f61b nir: add image fragment mask load intrinsic
Like nir_texop_fragment_mask_fetch_amd, this is used to load multi
sample image fmask data for AMD GPU.

We will lower multi sample image load and samples_identical intrinsics
to use it latter for radeonsi. RADV does not need this because it
always expand fmask images before dispatch compute shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
2022-12-19 09:22:11 +08:00
Alyssa Rosenzweig
71e2028ce3 nir: Add store_zs_agx intrinsic
Will be used for frag depth/stencil export with multisampling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
2022-12-17 18:10:28 +00:00
Yonggang Luo
36ba2e31f6 glsl: fixes -Werror,-Wunused-but-set-variable for clang-15 in glcpp-parse.y and glsl_parser.yy
error messages:
src/compiler/glsl/glcpp/glcpp-parse.c:1691:9: error: variable 'glcpp_parser_nerrs' set but not used [-Werror,-Wunused-but-set-variable]
    int yynerrs = 0;
        ^

src/compiler/glsl/glsl_parser.cpp:2370:9: error: variable '_mesa_glsl_nerrs' set but not used [-Werror,-Wunused-but-set-variable]
    int yynerrs = 0;
        ^

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19875>
2022-12-16 19:02:17 +00:00
Yonggang Luo
113def3bbd glsl: Fixes indent issue after replace tab with 3 space by tools in glcpp-parse.y
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19875>
2022-12-16 19:02:17 +00:00
Yonggang Luo
3261a54c79 glsl: replace tab with 3 space in glcpp-parse.y
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19875>
2022-12-16 19:02:17 +00:00
Yonggang Luo
c5a4520b3c glsl: Fixes ident issue in glsl_parser.yy and update editorconfig for it
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19875>
2022-12-16 19:02:17 +00:00
Karol Herbst
6d6c6caff1 nir_lower_io_to_scalar: handle load/store_global
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20106>
2022-12-16 08:02:32 +00:00
Karol Herbst
3cd641bebd nir_lower_io_to_scalar: make use of nir_get_io_offset_src
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20106>
2022-12-16 08:02:32 +00:00
Kenneth Graunke
0521027182 nir: Allow more than just ALU instructions in 'weak' GVN
This removes the ALU-only restriction on the "weak" GVN introduced by
the previous commit.  This makes it slightly more aggressive, allowing
it to coalesce things like UBO loads (still within sister then/else
blocks).  This also can have surprisingly large cascading effects.

I was concerned that this might increase register pressure, but
shader-db and fossil-db show effectively no change in spills/fills,
so it seems to be fine.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>
2022-12-14 20:56:55 +00:00
Kenneth Graunke
d5d03a7273 nir: Perform 'weak' global value numbering in all GCM passes
Full global value numbering (GVN) can be pretty aggressive, moving
values far away from their original locations, even out of loops,
and can extend their live ranges a lot.  So we've left it disabled.

This patch introduces a weaker form of GVN: we only allow coalescing
identical values when they appear on either side of the same if/else
construct.  For now, we also only allow ALU instructions.

This allows nir_opt_gcm to clean up identical instructions appearing
on both sides of if/then/else control flow.  But it avoids aggressively
combining every other occurrence of a value in the program.

This can still have surprisingly large cascading effects, as simple
constructs are cleaned up, leading to more opportunities to do the
same clean up, up a chain of nested ifs.  It also enables greater use
of the select peephole as ifs are cleaned up.

shader-db and fossil-db results show a reduction in spills/fills on
Icelake, so it doesn't seem to be hurting register pressure.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>
2022-12-14 20:56:55 +00:00
Samuel Pitoiset
877c10efd1 spirv: add support for AMD_shader_early_and_late_fragment_tests
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19738>
2022-12-14 08:16:27 +00:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
8b37046765 nir/builder: Handle i2b conversions specially in nir_type_convert
The shaders affected here are ones that were previously affected when
i2b was unconditionally lowered in opt_algebraic. There are a few places
where some transformations happen in a different order, so some
algebraic patterns are missed.

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914369 -> 19914566 (<.01%)
instructions in affected programs: 92375 -> 92572 (0.21%)
helped: 0 / HURT: 90

total cycles in shared programs: 853851470 -> 853867215 (<.01%)
cycles in affected programs: 12400663 -> 12416408 (0.13%)
helped: 28 / HURT: 69

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16710721 -> 16710700 (<.01%)
instructions in affected programs: 108010 -> 107989 (-0.02%)
helped: 57 / HURT: 103

total cycles in shared programs: 884299412 -> 884306546 (<.01%)
cycles in affected programs: 12986423 -> 12993557 (0.05%)
helped: 87 / HURT: 102

total spills in shared programs: 14937 -> 14925 (-0.08%)
spills in affected programs: 12 -> 0
helped: 9 / HURT: 0

total fills in shared programs: 17569 -> 17557 (-0.07%)
fills in affected programs: 12 -> 0
helped: 9 / HURT: 0

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902347 (<.01%)
instructions in affected programs: 7311 -> 7317 (0.08%)
helped: 3 / HURT: 8

total cycles in shared programs: 741795500 -> 741792266 (<.01%)
cycles in affected programs: 273308 -> 270074 (-1.18%)
helped: 9 / HURT: 2

No shader-db changes on any other Intel platform.

No fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
58164794f4 spirv: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

v2: Use the actual bit size of the source to determine the conversion
op.  With mediump, the "planned" bit size and the actual bit size might
be different.  Fixes many, many Vulkan CTS assertion failures on any
platform that sets mediump_16bit_alu (e.g., Freedreno).

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
ded3572947 nir: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
1197030727 glsl: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

In gl_nir_lower_packed_varyings, all of the type conversions are between
int32 and uint32 types.  In NIR, those are just moves, so elide them.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
9f86d18b2d nir/builder: Add rounding mode parameter to nir_type_convert
Later changes will use this.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
43da822312 glsl_to_nir: Fix NIR bit-size of ir_triop_bitfield_extract and ir_quadop_bitfield_insert
Previously these would return result->bit_size of 32 even though the
type might have been int16_t or uint16_t.  This prevents many assertion
failures in "glsl: Use nir_type_convert instead of
nir_type_conversion_op" on zink.

Fixes: 5e922fbc16 ("glsl_to_nir: fix bitfield_extract with 16-bit operands")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
9342c14eeb nir/builder: Emit x != 0 for nir_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variation that include both ine
and i2b), just emit a != 0 instead of i2b(a).

I think that the changes to the unit tests weaken them slightly, but
perhaps that's okay?

No shader-db changes on any Intel platform.  The GLSL paths use other
means to generate i2b operations, but the SPIR-V paths use nir_i2b.
Presumably since 4676b3d3dd (nir: Use nir_test_mask instead of
i2b(iand)), no fossil-db changes either.

v2: Use nir_ine_imm.  Suggested by Jesse.

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
7a5e9df39d nir: Use nir_i2b wrapper everywhere instead of using nir_i2b1 directly
No shader-db or fossil-db changes on any Intel platform.

v2: Add missed i2b1 in ir3_nir_opt_preamble.c.

v3: Add missed i2b1 in ac_nir_lower_ngg.c.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
b60b2f2add nir/algebraic: Optimize some b2i involved in masking operations
v2: Remove the ineg from the b2i in the ior pattern.  Suggested by
Jason.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914441 -> 19914369 (<.01%)
instructions in affected programs: 63507 -> 63435 (-0.11%)
helped: 24 / HURT: 0

total cycles in shared programs: 853869766 -> 853851470 (<.01%)
cycles in affected programs: 10551542 -> 10533246 (-0.17%)
helped: 24 / HURT: 0

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141163061 -> 141092683 (-0.0%)
Instructions helped: 14103
Instructions hurt: 55

Cycles in all programs: 9132376195 -> 9133183045 (+0.0%)
Cycles helped: 13775
Cycles hurt: 380

Spills in all programs: 18286 -> 18284 (-0.0%)
Spills helped: 1

Fills in all programs: 30647 -> 30643 (-0.0%)
Fills helped: 1

Gained: 133
Lost: 130

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
ba0b248ac2 nir/algebraic: Eliminate unary op on src of integer comparison w/ zero
This helps because it enables cmod propagation to do more.

The removed patterns involving b2i will be handled by other existing
patterns after the unary operations are removed.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914458 -> 19914441 (<.01%)
instructions in affected programs: 5456 -> 5439 (-0.31%)
helped: 17 / HURT: 0

total cycles in shared programs: 855302118 -> 853869766 (-0.17%)
cycles in affected programs: 327354347 -> 325921995 (-0.44%)
helped: 291 / HURT: 81

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141205979 -> 141205961 (-0.0%)
Instructions helped: 4
Instructions hurt: 3

SENDs in all programs: 7466919 -> 7466913 (-0.0%)
SENDs helped: 1

Cycles in all programs: 9133387327 -> 9133384475 (-0.0%)
Cycles helped: 3
Cycles hurt: 12

In the shader that was helped for sends, it appears that a NIR pass that
moves code out of loops was able to move 3 send operations outside a
loop after this change.  I did not investigate further.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
ee15d89322 nir/algebraic: Simplify min and max of b2i
This prevents ~400 shader-db regresssions and a handful of fossil-db
regressions after i2b is always lowered.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total cycles in shared programs: 855301494 -> 855302118 (<.01%)
cycles in affected programs: 52787 -> 53411 (1.18%)
helped: 4 / HURT: 5

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141206055 -> 141205979 (-0.0%)
Instructions helped: 14

Cycles in all programs: 9133376616 -> 9133387327 (+0.0%)
Cycles helped: 13
Cycles hurt: 3

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
19222867e4 nir/algebraic: Reassociate some iand to eliminate an operation
No shader-db changes on any Intel platform.

All of the helped shaders were presumably regressed by 4676b3d3dd (nir:
Use nir_test_mask instead of i2b(iand)).

v2: Add some comments explaining why specific replacements are used.  In
the umin pattern, only markup the first usage of 'b' in the source
pattern.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141384970 -> 141200966 (-0.1%)
Instructions helped: 45842

Cycles in all programs: 9133648977 -> 9133282672 (-0.0%)
Cycles helped: 26812
Cycles hurt: 6025

Gained: 23
Lost: 135

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
d48ce1f47d nir/algebraic: Remove redundant i2b(b2i(x)) patterns
A loop below already adds all the permutations... including the 1-bit
version that isn't included in this group.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
14a9bb04e4 nir/algebraic: Remove redundant i2b(-x) pattern
The exact same pattern appears later (around line 1323).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Ian Romanick
8d90b13954 nir/algebraic: Catch some kinds of copy-and-paste bugs in algebraic patterns
A later commit adds a pattern

   (('umin', ('iand', a, '#b(is_pos_power_of_two)'),
             ('iand', c, '#b(is_pos_power_of_two)')),
    ('iand', ('iand', a, b), ('iand', c, b))),

When I originally made that pattern, I copied and pasted the search to
the replacement as

  (('umin', ('iand', a, '#b(is_pos_power_of_two)'),
            ('iand', c, '#b(is_pos_power_of_two)')),
   ('iand', ('iand', a, '#b(is_pos_power_of_two)'),
            ('iand', c, '#b(is_pos_power_of_two)'))),

The caused the variables in the replacement to be marked is_constant,
and that resulted in an assertion failure deep inside nir_search.

    src/compiler/nir/nir_search.c:530: construct_value: Assertion `!var->is_constant' failed.

These extra validation rules catch this kind of error at compile time
rather than at run time.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:20 +00:00
Marek Olšák
a3aea98a2a nir: validate that store_buffer_amd doesn't use a non-trivial writemask
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
150c2cec63 nir: add ACCESS_USES_FORMAT_AMD for typed buffer opcodes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
716ac4a55d nir: replace IS_SWIZZLED flag with ACCESS_IS_SWIZZLED_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
7998c3bdd3 nir: remove redundant SLC_AMD in favor of ACCESS_STREAM_CACHE_POLICY
ACCESS_STREAM_CACHE_POLICY was added to map to SLC for AMD.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
c0d69b40bc nir: add nir_texop_sampler_descriptor_amd
We'll use it to query the min/mag filter in the shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Qiang Yu
9a6416b374 nir,ac/llvm,radv: add stream id index to nir_load_ring_gsvs_amd
For used by legacy GS to store output to different ring according
to stream id.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20158>
2022-12-13 11:43:45 +08:00
Qiang Yu
796a150196 nir: add nir_load_ring_gs2vs_offset_amd
Used by legacy GS output lowering.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20158>
2022-12-13 11:42:33 +08:00
Qiang Yu
fd240f759f nir,radv,radeonsi: add nir_atomic_add_gs_invocation_count_amd
For shader query emulation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20156>
2022-12-13 01:26:42 +00:00