Commit graph

3511 commits

Author SHA1 Message Date
Marcin Ślusarz
24fef8f33d intel/compiler: Use Task/Mesh InlineData for the first few push constants
Replace load_mesh_global_arg_addr_intel with a more general intrinsic
load_mesh_inline_data_intel, since inline data now hold both
a pointer descriptor information and the first few push constants.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
2022-01-29 06:32:19 +00:00
Christian Gmeiner
e5f9cdac1f nir/nir_lower_tex_shadow: support tex_instr without deref src
Use texture_index if there is no deref src.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
2022-01-28 18:40:53 +00:00
Christian Gmeiner
e67bca3fe7 nir: make lower_sample_tex_compare a common pass
This pass was originally written for d3d12, but is useful for hardware
that lacks sample compare support like some etnaviv GPU models.

Also rename the lowering pass and some surrounding code to
nir_lower_tex_shadow as suggested by Emma.

I'd like to use the pass that's already in tree.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
2022-01-28 18:40:53 +00:00
Emma Anholt
61400f8a2d nir/lower_locals_to_regs: Do an ad-hoc copy propagate on our generated MOV.
I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600.  While this makes reading CTS
shaders easier, the effect in the real world is pretty limited.  From
softpipe shader-db:

total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
2022-01-25 06:01:13 +00:00
Bas Nieuwenhuizen
d1530a3f3b Revert "nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants"
This reverts commit a1af902531.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5423
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14532>
2022-01-21 16:58:11 +00:00
Rhys Perry
af51efe195 nir/builder: assume scalar alignment if not provided
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>
2022-01-21 13:45:33 +00:00
Rhys Perry
e9e1a44872 nir/builder: set write mask if not provided
Zero isn't really a valid write mask. If it's provided, use a full write
mask.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>
2022-01-21 13:45:33 +00:00
Rhys Perry
495debebad nir/algebraic: optimize expressions using fmulz/ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
14b8227083 nir: add some missing nir_alu_type_get_base_type
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
f2fbba7920 nir/algebraic: optimize open-coded fmulz/ffmaz
This pattern will be found in future versions of D3D9 DXVK.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
312a284980 nir/algebraic: add ignore_exact() wrapper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
7f05ea3793 nir: add nir_op_fmulz and nir_op_ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Emma Anholt
cac6f633b2 nir/opt_offsets: Use nir_ssa_scalar to chase offset additions.
For nir_to_tgsi, I want to be able to fold into the base from a vector
load_const, which the ad-hoc scalar chasing couldn't handle.

r300:
total instructions in shared programs: 1278731 -> 1256502 (-1.74%)
instructions in affected programs: 457909 -> 435680 (-4.85%)
total flowcontrol in shared programs: 8316 -> 8313 (-0.04%)
flowcontrol in affected programs: 5 -> 2 (-60.00%)
total temps in shared programs: 213687 -> 213774 (0.04%)
temps in affected programs: 13140 -> 13227 (0.66%)
total consts in shared programs: 952850 -> 949929 (-0.31%)
consts in affected programs: 386352 -> 383431 (-0.76%)

Fixes: #5781
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
645ca56425 nir/opt_offsets: Also apply the max offset to top-level constant folding.
nir_to_tgsi wants this for disabling folding into shared var accesses at
all.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
ec4b9909f0 nir/opt_offsets: Disable unsigned wrap checks on non-native-integers HW.
Since we don't have 32-bit ints, these checks for 32-bit unsigned wrapping
don't help and just reduce optimization opportunities (particularly for
DX9 addressing math).

Doesn't affect any current consumers.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
700d2fbd0a nir: Add a .base field to nir_load_ubo_vec4.
This lets nir-to-tgsi fold the constant offset of addressing calculations
into the CONST[] reference, which is important for D3D9-era compatibility:
HW of that age has limited uniform space, and if we do the addressing math
as math in the shader for dynamic indexing, the nir_load_consts end up
taking up uniforms we don't have available.

r300:
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total temps in shared programs: 213912 -> 213736 (-0.08%)
temps in affected programs: 2166 -> 1990 (-8.13%)
total consts in shared programs: 953237 -> 952973 (-0.03%)
consts in affected programs: 45980 -> 45716 (-0.57%)

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Dave Airlie
ccbf700d6c nir: remove gl.h include from nir headers.
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Dave Airlie
1352e0ba0c mesa/*: add a shader primitive type to get away from GL types.
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.

don't store the enum in shader_info as it changes size, and confuses
other things.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Connor Abbott
9c9e8c3349 nir: Reorder ffma and fsub combining
It's relatively common to do something like "a * b - c", which on most
GPUs can be implemented in a single instruction. Before
opt_algebraic_late this will be something like
"fadd(fmul(a, b), fneg(c))", and we want to turn it info
"ffma(a, b, fneg(c))". But because the fsub pattern was first we
instead turned it into "fsub(fmul(a, b), c)". Fix this by reordering
them.

Selected shader-db results on freedreno:

total instructions in shared programs: 1561330 -> 1551619 (-0.62%)
instructions in affected programs: 780272 -> 770561 (-1.24%)
helped: 1941
HURT: 491
helped stats (abs) min: 1 max: 147 x̄: 7.98 x̃: 4
helped stats (rel) min: 0.07% max: 30.77% x̄: 4.36% x̃: 3.17%
HURT stats (abs)   min: 1 max: 307 x̄: 11.76 x̃: 5
HURT stats (rel)   min: 0.09% max: 18.71% x̄: 2.26% x̃: 1.38%
95% mean confidence interval for instructions value: -4.57 -3.41
95% mean confidence interval for instructions %-change: -3.21% -2.84%
Instructions are helped.

total nops in shared programs: 358926 -> 356263 (-0.74%)
nops in affected programs: 167116 -> 164453 (-1.59%)
helped: 1395
HURT: 859
helped stats (abs) min: 1 max: 108 x̄: 6.80 x̃: 3
helped stats (rel) min: 0.17% max: 100.00% x̄: 19.15% x̃: 10.57%
HURT stats (abs)   min: 1 max: 307 x̄: 7.95 x̃: 3
HURT stats (rel)   min: 0.00% max: 381.82% x̄: 20.04% x̃: 10.00%
95% mean confidence interval for nops value: -1.77 -0.59
95% mean confidence interval for nops %-change: -5.55% -2.87%
Nops are helped.

total non-nops in shared programs: 1202404 -> 1195356 (-0.59%)
non-nops in affected programs: 496682 -> 489634 (-1.42%)
helped: 1951
HURT: 265
helped stats (abs) min: 1 max: 39 x̄: 4.02 x̃: 3
helped stats (rel) min: 0.07% max: 15.38% x̄: 2.97% x̃: 1.96%
HURT stats (abs)   min: 1 max: 22 x̄: 2.97 x̃: 2
HURT stats (rel)   min: 0.05% max: 10.00% x̄: 1.14% x̃: 0.75%
95% mean confidence interval for non-nops value: -3.38 -2.99
95% mean confidence interval for non-nops %-change: -2.60% -2.36%
Non-nops are helped.

total systall in shared programs: 288317 -> 292975 (1.62%)
systall in affected programs: 87876 -> 92534 (5.30%)
helped: 388
HURT: 431
helped stats (abs) min: 1 max: 214 x̄: 14.39 x̃: 8
helped stats (rel) min: 0.25% max: 100.00% x̄: 22.12% x̃: 11.96%
HURT stats (abs)   min: 1 max: 232 x̄: 23.77 x̃: 7
HURT stats (rel)   min: 0.00% max: 1300.00% x̄: 51.71% x̃: 17.30%
95% mean confidence interval for systall value: 3.07 8.30
95% mean confidence interval for systall %-change: 9.49% 23.97%
Systall are HURT.

(The systall hurt is probably just due to having having fewer
instructions to hide latency with.)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14554>
2022-01-18 17:44:50 +00:00
Qiang Yu
2cee73f0f7 nir: fix nir_tex_instr hash not count is_sparse field
This fixes nir_opt_cse miss replace a non-sparse tex instruction
with a sparse tex instruction and fail the nir_validate_shader().

Fixes: 3a7972f72a ("nir,spirv: add sparse texture fetches")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14362>
2022-01-18 16:10:35 +08:00
Rhys Perry
d95a0b52e4 nir/unsigned_upper_bound: don't follow 64-bit f2u32()
Fixes Doom Eternal crash.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14555>
2022-01-17 10:59:21 +00:00
Emma Anholt
f6ffefba3e nir: Apply nir_opt_offsets to nir_intrinsic_load_uniform as well.
Doing this for ir3 required adding a struct for limits of how much base to
fold in (which NTT wants as well for its case of shared vars), otherwise
the later work to lower to the 1<<9 word limit would emit more
instructions.

The shader-db results are that sometimes the reduction in NIR instruction
count results in the fewer sampler prefetches due to the shader being
estimated to be shorter (dota2, nexuiz):

total instructions in shared programs: 8996651 -> 8996776 (<.01%)
total cat5 in shared programs: 86561 -> 86577 (0.02%)

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
2022-01-16 19:11:29 +00:00
Emma Anholt
b024102d7c freedreno/ir3: Use nir_opt_offset for removing constant adds for shared vars.
Saves some work in carchase and manhattan31:

instructions in affected programs: 2842 -> 2818 (-0.84%)
nops in affected programs: 1131 -> 1105 (-2.30%)
non-nops in affected programs: 1236 -> 1238 (0.16%)
mov in affected programs: 57 -> 61 (7.02%)
dwords in affected programs: 2144 -> 2150 (0.28%)
cat0 in affected programs: 1195 -> 1169 (-2.18%)
cat1 in affected programs: 151 -> 155 (2.65%)
cat2 in affected programs: 142 -> 140 (-1.41%)
sstall in affected programs: 190 -> 178 (-6.32%)
(ss) in affected programs: 63 -> 63 (0.00%)
systall in affected programs: 532 -> 511 (-3.95%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
2022-01-16 19:11:29 +00:00
Daniel Schürmann
79a987ad2a nir/opt_if: also merge break statements with ones after the branch
This optimizations turns

     loop {
        ...
        if (cond1) {
           if (cond2) {
              do_work_1();
              break;
           } else {
              do_work_2();
           }
           do_work_3();
           break;
        } else {
           ...
        }
     }

 into:

     loop {
        ...
        if (cond1) {
           if (cond2) {
              do_work_1();
           } else {
              do_work_2();
              do_work_3();
           }
           break;
        } else {
           ...
        }
     }

As this optimizations moves code into the NIF statement,
it re-iterates on the branch legs in case of success.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
2022-01-13 02:30:32 +00:00
Daniel Schürmann
dad609d152 nir/opt_if: merge two break statements from both branch legs
This optimization turns

     loop {
        ...
        if (cond) {
           do_work_1();
           break;
        } else {
           do_work_2();
           break;
        }
     }

 into:

     loop {
        ...
        if (cond) {
           do_work_1();
        } else {
           do_work_2();
        }
        break;
     }

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
2022-01-13 02:30:32 +00:00
Daniel Schürmann
8a78706643 nir: refactor nir_opt_move
This patch is a rewrite of nir_opt_move.
Differently from the previous version, each instruction is checked
if it can be moved downwards and then inserted before the first user
of the definition. The advantage is that less insert operations are
performed, the original order is kept if two movable instructions have
the same first user, and instructions without user in the same block
are moved towards the end.

v2: Only return true if an instruction really changed the position.
    Don't care for discards, this will be handled by another MR.
v3: fix self-referring phis and update according to nir_can_move_instr().
v4: use nir_can_move_instr() and nir_instr_ssa_def()
v5: deduplicate some code

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3657>
2022-01-12 13:41:54 +00:00
Marcin Ślusarz
f286ecf906 nir: handle per-view clip/cull distances
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Marcin Ślusarz
0d6f83cbf1 nir: remove invalid assert affecting per-view variables
per-view variables can have arbitrary (but > 0) number of array levels

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Marcin Ślusarz
4fed440724 nir: add load_mesh_view_count and load_mesh_view_indices intrinsics
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Rhys Perry
67fc7a1763 nir/uniform_atomics: fix is_atomic_already_optimized without workgroups
dims_needed would have been zero, so this would always returned true for
non-compute stages.

Also fix this for variable workgroup sizes.

Improves Shadow of the Tomb Raider RX 6800 performance by 10.6%, 11.5% and
4.5% (day_of_dead, jungle and paititi scenes).

radv_perf before and after:
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '62.913333333333334', 'min_fps': '62.81', 'max_fps': '62.98', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '64.02666666666666', 'min_fps': '63.93', 'max_fps': '64.11', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '74.81666666666666', 'min_fps': '74.72', 'max_fps': '74.88', 'interations': '3'}

{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '69.57', 'min_fps': '69.52', 'max_fps': '69.63', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '71.41000000000001', 'min_fps': '71.31', 'max_fps': '71.5', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '78.16666666666667', 'min_fps': '78.07', 'max_fps': '78.23', 'interations': '3'}

Performance now seems slightly better than AMDVLK 2021.Q4.3:
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '68.02666666666666', 'min_fps': '67.95', 'max_fps': '68.16', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '70.24666666666667', 'min_fps': '69.83', 'max_fps': '70.51', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '77.19', 'min_fps': '77.18', 'max_fps': '77.2', 'interations': '3'}

fossil-db (Sienna Cichlid):
Totals from 40 (0.03% of 134621) affected shaders:
CodeSize: 62676 -> 65996 (+5.30%)
Instrs: 11372 -> 12111 (+6.50%)
Latency: 144122 -> 142848 (-0.88%); split: -1.09%, +0.21%
InvThroughput: 19686 -> 19847 (+0.82%); split: -0.06%, +0.87%
VClause: 304 -> 306 (+0.66%)
SClause: 603 -> 604 (+0.17%); split: -0.83%, +1.00%
Copies: 780 -> 858 (+10.00%)
Branches: 235 -> 329 (+40.00%)
PreSGPRs: 1072 -> 1083 (+1.03%); split: -0.37%, +1.40%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14407>
2022-01-10 19:57:38 +00:00
Rhys Perry
b00138090e nir/lower_shader_calls: fix store_scratch write_mask
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14447>
2022-01-10 19:01:04 +00:00
Danylo Piliaiev
b8d486f298 nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
2022-01-10 13:20:39 +02:00
Gert Wollny
6f348d9c99 nir_lower_io: propagate the "invariant" flag to outputs
Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer
to correctly declare output variables.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
2022-01-07 16:35:43 +00:00
Emma Anholt
558a600629 nir_to_tgsi: Enable fdot_replicates flag.
That's how the TGSI math opcodes work.

This lets lower_vec_to_regs coalesce the DP output into the .yzw channels,
giving an impressive shader-db win on softpipe:

total instructions in shared programs: 2929840 -> 2794036 (-4.64%)
instructions in affected programs: 1651438 -> 1515634 (-8.22%)
total temps in shared programs: 372730 -> 332744 (-10.73%)
temps in affected programs: 118151 -> 78165 (-33.84%)

and a minor one on r300:

total instructions in shared programs: 51238 -> 51149 (-0.17%)
instructions in affected programs: 2621 -> 2532 (-3.40%)
total vinst in shared programs: 15655 -> 15618 (-0.24%)
vinst in affected programs: 468 -> 431 (-7.91%)
total temps in shared programs: 9838 -> 9828 (-0.10%)
temps in affected programs: 59 -> 49 (-16.95%)

and a bigger one on i915g:
total instructions in shared programs: 398064 -> 395901 (-0.54%)
instructions in affected programs: 29271 -> 27108 (-7.39%)
total tex_indirect in shared programs: 12261 -> 12233 (-0.23%)
tex_indirect in affected programs: 98 -> 70 (-28.57%)
LOST:   0
GAINED: 5

The r300 change is less impressive because it does some backend copy-prop,
but also because intermediate storage of DPs now takes a vec4 instead of a
scalar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>
2022-01-07 09:58:24 +00:00
Jesse Natalie
c09c0b351f nir_opt_dead_cf: Remove dead ifs
An if that looks like:
if (x) { } else { }
That has no phis following it is dead. Currently these are only
removed by peephole select, but that means that 'x' is considered
used until that pass is run, which can make it difficult to apply
sane lowering in the case where loading 'x' requires complex or
expensive transformations, but 'x' is *really* unused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>
2022-01-07 05:15:48 +00:00
Alyssa Rosenzweig
24ea7cbb06 nir: Extend store_combined_output_pan
Extend store_combined_output_pan to take a dual source blend input in
addition to colour, depth, and stencil inputs. Use the last source for
this, and represent the type with the DEST_TYPE index. This is a hack
but there is no SRC2_TYPE and NIR doesn't seem to mind as long as we
know what we mean. This allows the backend to emit a combined "blend
render target #0" instruction taking two sources.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
2022-01-02 01:12:05 +00:00
Alyssa Rosenzweig
5c168f09eb nir: Eliminate store_combined_output_pan BASE
It's meaningless for this intrinsic and is just adding noise to the
lowering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
2022-01-02 01:12:05 +00:00
Daniel Schürmann
17ecd0b31a nir/opt_algebraic: lower fneg_hi/lo to fmul
This pattern, found in the FSR upscaling shader,
helps the vectorization efforts by keeping the
chain of vectorized instructions intact.
Radeon can optimize it to per-component fneg modifiers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>
2021-12-21 13:23:37 +01:00
Caio Oliveira
729df14e45 nir: Handle volatile semantics for loading HelperInvocation builtin
SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation
operation to turn an invocation into a helper invocation, but the
value of HelperInvocation (a builtin from Input storage class)
couldn't be modified dynamically without breaking compatibility.

For the extension the operation OpIsHelperInvocation was added to get
the dynamic value.

For SPIR-V 1.6, the demote operation was promoted, but now to get the
dynamic value the shader must issue a load to HelperInvocation with
Volatile memory access semantics.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>
2021-12-17 16:37:14 -08:00
Rhys Perry
a65285f54b nir/opt_access: infer CAN_REORDER for global access
fossil-db (Sienna Cichlid):
Totals from 352 (0.26% of 134621) affected shaders:
VGPRs: 17240 -> 17272 (+0.19%)
CodeSize: 1753640 -> 1755744 (+0.12%); split: -0.04%, +0.16%
Instrs: 323190 -> 323801 (+0.19%); split: -0.03%, +0.22%
Latency: 3241205 -> 3241293 (+0.00%); split: -0.10%, +0.10%
InvThroughput: 568927 -> 568067 (-0.15%); split: -0.16%, +0.00%
SClause: 12109 -> 10444 (-13.75%); split: -13.76%, +0.01%
Copies: 27802 -> 27717 (-0.31%); split: -0.56%, +0.26%
PreSGPRs: 14699 -> 14690 (-0.06%)
PreVGPRs: 15793 -> 15799 (+0.04%)

fossil-db (Polaris10):
Totals from 348 (0.26% of 135668) affected shaders:
SGPRs: 21446 -> 21574 (+0.60%); split: -0.15%, +0.75%
VGPRs: 17004 -> 16996 (-0.05%); split: -0.09%, +0.05%
CodeSize: 1782796 -> 1783060 (+0.01%); split: -0.03%, +0.05%
Instrs: 337828 -> 337921 (+0.03%); split: -0.03%, +0.06%
Latency: 3726328 -> 3726721 (+0.01%); split: -0.09%, +0.10%
InvThroughput: 1307917 -> 1299841 (-0.62%); split: -0.62%, +0.00%
VClause: 4327 -> 4337 (+0.23%); split: -0.09%, +0.32%
SClause: 12178 -> 10529 (-13.54%); split: -13.55%, +0.01%
Copies: 40227 -> 40244 (+0.04%); split: -0.19%, +0.24%
PreSGPRs: 14946 -> 14937 (-0.06%)
PreVGPRs: 15637 -> 15643 (+0.04%)

fossil-db (Pitcairn):
Totals from 351 (0.26% of 135668) affected shaders:
SGPRs: 20382 -> 20619 (+1.16%); split: -0.79%, +1.95%
CodeSize: 1789732 -> 1789836 (+0.01%); split: -0.04%, +0.04%
MaxWaves: 1947 -> 1949 (+0.10%)
Instrs: 352274 -> 352318 (+0.01%); split: -0.04%, +0.06%
Latency: 4057829 -> 4058226 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 1332245 -> 1317578 (-1.10%); split: -1.11%, +0.01%
VClause: 8581 -> 8583 (+0.02%); split: -0.13%, +0.15%
SClause: 12187 -> 10552 (-13.42%); split: -13.43%, +0.02%
Copies: 44906 -> 44915 (+0.02%); split: -0.24%, +0.26%
PreSGPRs: 16571 -> 16562 (-0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Rhys Perry
403ae3b48e nir/algebraic: optimize more 64-bit imul with constant source
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Rhys Perry
c56cf157c5 nir/opt_load_store_vectorize: improve ssbo/global alias analysis
If either the global access or the ssbo access is restrict, they shouldn't
alias.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Jason Ekstrand
deec7a590b anv,nir: Use sample_pos_or_center in lower_wpos_center
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
e8acc5a7ea nir: Add a new sample_pos_or_center system value
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Marcin Ślusarz
504e5cb4e8 nir/print: print const value near each use of const ssa variable
Without/with NIR_DEBUG=print,print_const:

-vec4 32 ssa_60 = fadd ssa_59, ssa_58
+vec4 32 ssa_60 = fadd ssa_59 /*(0xbf800000, 0x3e800000, 0x00000000, 0x3f800000) = (-1.000000, 0.250000, 0.000000, 1.000000)*/, ssa_58

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>
2021-12-17 10:04:50 +00:00
Marcin Ślusarz
23f8f836e0 nir/print: group hex and float vectors together
Vectors are much easier to follow in this format, because developer cares
either about hex or float values, never both.

Before/after:

-vec4 32 ssa_222 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */, 0x3f800000 /* 1.000000 */)
+vec4 32 ssa_222 = load_const (0x00000000, 0x00000000, 0x3f800000, 0x3f800000) = (0.000000, 0.000000, 1.000000, 1.000000)

-vec1 32 ssa_174 = load_const (0xbf800000 /* -1.000000 */)
+vec1 32 ssa_174 = load_const (0xbf800000 = -1.000000)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>
2021-12-17 10:04:50 +00:00
Marcin Ślusarz
d2b4051ea9 nir/print: move print_load_const_instr up
... to avoid forward declarations in future commit

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>
2021-12-17 10:04:50 +00:00
Marcin Ślusarz
f7e63ec5d8 nir/print: compact printing of intrinsic indices
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>
2021-12-16 09:43:13 +00:00
Marcin Ślusarz
d8fa625bb3 nir/print: expand printing of io semantics.gs_streams
gs_streams can be set for at least 2 other intrinsics.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>
2021-12-16 09:43:13 +00:00
Marcin Ślusarz
be25db9f0f nir/print: simplify printing of IO semantics
Some of the tested flags are set for other intrinsics and they are
printed only when set, so there's no point in checking exact intrinsic
name or shader stage.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>
2021-12-16 09:43:13 +00:00