Commit graph

3670 commits

Author SHA1 Message Date
Lionel Landwerlin
e06f9d49bc nir/lower_shader_calls: consider relocated constants as rematerializable
After all they're constants.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
6d9ae6ec1e intel: add a new intrinsic to get the shader stage from bindless shaders
We'll use this to apply ray tracing operations in our trivial return
shader based on the stage we're in.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
b8f087b0e6 nir/builder: add nir_ior_imm() helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
bb40e999d1 intel/nir: use a single intel intrinsic to deal with ray traversal
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.

v2: Comment on barrier after sync trace instruction (Caio)
    Generalize lsc helper (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
4deb8e86df nir: change intel dss_id intrinsic to topology_id
This will allow to reuse the same intrinsic for various topology based
ID.

v2: fix intrinsic comment (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:24 +00:00
Alyssa Rosenzweig
aaf00a1b4d nir,zink: Make lower_discard_if a common pass
This pass (lowering discard_if to control flow and unconditional
discard) was originally written for Zink, but is useful for hardware
that lacks conditional discard instructions like AGX. In theory AGX
could implement a conditional discard with CSEL, but the vendor
compiler uses a lowering like this one. Since I like not writing code,
I'd like to use the pass that's already in tree.

v2: Don't preserve dominance (Jason). Assert we don't see demotes or
terminates (Jason). Add Mike's ack.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14217>
2022-02-04 19:09:23 +00:00
Lionel Landwerlin
e227bb9fd5 nir/builder: add ishl_imm helper
v2: add (y >= x->bit_size) condition (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
2022-02-02 17:09:46 +00:00
Connor Abbott
503a5bae59 nir: Add support for lowering shuffle to a waterfall loop
Qualcomm doesn't natively support shuffle, but it does natively support
relative shuffles where the delta is a constant. Therefore we'll expose
emulated support for both. Add support for this emulation of
subgroupShuffle() to NIR.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
2022-02-01 16:27:45 +00:00
Connor Abbott
913bec10c4 nir/lower_subgroups: Rename lower_shuffle to lower_relative_shuffle
This option only applies to relative shuffles (up/down/xor), and in a
moment we're going to add an option to lower normal shuffles, so rename
it.

While we're here, rename lower_shuffle() to lower_to_shuffle() for
similar reasons.

Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
2022-02-01 16:27:45 +00:00
Marcin Ślusarz
24fef8f33d intel/compiler: Use Task/Mesh InlineData for the first few push constants
Replace load_mesh_global_arg_addr_intel with a more general intrinsic
load_mesh_inline_data_intel, since inline data now hold both
a pointer descriptor information and the first few push constants.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
2022-01-29 06:32:19 +00:00
Christian Gmeiner
e5f9cdac1f nir/nir_lower_tex_shadow: support tex_instr without deref src
Use texture_index if there is no deref src.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
2022-01-28 18:40:53 +00:00
Christian Gmeiner
e67bca3fe7 nir: make lower_sample_tex_compare a common pass
This pass was originally written for d3d12, but is useful for hardware
that lacks sample compare support like some etnaviv GPU models.

Also rename the lowering pass and some surrounding code to
nir_lower_tex_shadow as suggested by Emma.

I'd like to use the pass that's already in tree.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
2022-01-28 18:40:53 +00:00
Emma Anholt
61400f8a2d nir/lower_locals_to_regs: Do an ad-hoc copy propagate on our generated MOV.
I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600.  While this makes reading CTS
shaders easier, the effect in the real world is pretty limited.  From
softpipe shader-db:

total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
2022-01-25 06:01:13 +00:00
Bas Nieuwenhuizen
d1530a3f3b Revert "nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants"
This reverts commit a1af902531.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5423
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14532>
2022-01-21 16:58:11 +00:00
Rhys Perry
af51efe195 nir/builder: assume scalar alignment if not provided
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>
2022-01-21 13:45:33 +00:00
Rhys Perry
e9e1a44872 nir/builder: set write mask if not provided
Zero isn't really a valid write mask. If it's provided, use a full write
mask.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>
2022-01-21 13:45:33 +00:00
Rhys Perry
495debebad nir/algebraic: optimize expressions using fmulz/ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
14b8227083 nir: add some missing nir_alu_type_get_base_type
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
f2fbba7920 nir/algebraic: optimize open-coded fmulz/ffmaz
This pattern will be found in future versions of D3D9 DXVK.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
312a284980 nir/algebraic: add ignore_exact() wrapper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
7f05ea3793 nir: add nir_op_fmulz and nir_op_ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Emma Anholt
cac6f633b2 nir/opt_offsets: Use nir_ssa_scalar to chase offset additions.
For nir_to_tgsi, I want to be able to fold into the base from a vector
load_const, which the ad-hoc scalar chasing couldn't handle.

r300:
total instructions in shared programs: 1278731 -> 1256502 (-1.74%)
instructions in affected programs: 457909 -> 435680 (-4.85%)
total flowcontrol in shared programs: 8316 -> 8313 (-0.04%)
flowcontrol in affected programs: 5 -> 2 (-60.00%)
total temps in shared programs: 213687 -> 213774 (0.04%)
temps in affected programs: 13140 -> 13227 (0.66%)
total consts in shared programs: 952850 -> 949929 (-0.31%)
consts in affected programs: 386352 -> 383431 (-0.76%)

Fixes: #5781
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
645ca56425 nir/opt_offsets: Also apply the max offset to top-level constant folding.
nir_to_tgsi wants this for disabling folding into shared var accesses at
all.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
ec4b9909f0 nir/opt_offsets: Disable unsigned wrap checks on non-native-integers HW.
Since we don't have 32-bit ints, these checks for 32-bit unsigned wrapping
don't help and just reduce optimization opportunities (particularly for
DX9 addressing math).

Doesn't affect any current consumers.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Emma Anholt
700d2fbd0a nir: Add a .base field to nir_load_ubo_vec4.
This lets nir-to-tgsi fold the constant offset of addressing calculations
into the CONST[] reference, which is important for D3D9-era compatibility:
HW of that age has limited uniform space, and if we do the addressing math
as math in the shader for dynamic indexing, the nir_load_consts end up
taking up uniforms we don't have available.

r300:
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total temps in shared programs: 213912 -> 213736 (-0.08%)
temps in affected programs: 2166 -> 1990 (-8.13%)
total consts in shared programs: 953237 -> 952973 (-0.03%)
consts in affected programs: 45980 -> 45716 (-0.57%)

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
2022-01-19 22:28:34 +00:00
Dave Airlie
ccbf700d6c nir: remove gl.h include from nir headers.
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Dave Airlie
1352e0ba0c mesa/*: add a shader primitive type to get away from GL types.
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.

don't store the enum in shader_info as it changes size, and confuses
other things.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Connor Abbott
9c9e8c3349 nir: Reorder ffma and fsub combining
It's relatively common to do something like "a * b - c", which on most
GPUs can be implemented in a single instruction. Before
opt_algebraic_late this will be something like
"fadd(fmul(a, b), fneg(c))", and we want to turn it info
"ffma(a, b, fneg(c))". But because the fsub pattern was first we
instead turned it into "fsub(fmul(a, b), c)". Fix this by reordering
them.

Selected shader-db results on freedreno:

total instructions in shared programs: 1561330 -> 1551619 (-0.62%)
instructions in affected programs: 780272 -> 770561 (-1.24%)
helped: 1941
HURT: 491
helped stats (abs) min: 1 max: 147 x̄: 7.98 x̃: 4
helped stats (rel) min: 0.07% max: 30.77% x̄: 4.36% x̃: 3.17%
HURT stats (abs)   min: 1 max: 307 x̄: 11.76 x̃: 5
HURT stats (rel)   min: 0.09% max: 18.71% x̄: 2.26% x̃: 1.38%
95% mean confidence interval for instructions value: -4.57 -3.41
95% mean confidence interval for instructions %-change: -3.21% -2.84%
Instructions are helped.

total nops in shared programs: 358926 -> 356263 (-0.74%)
nops in affected programs: 167116 -> 164453 (-1.59%)
helped: 1395
HURT: 859
helped stats (abs) min: 1 max: 108 x̄: 6.80 x̃: 3
helped stats (rel) min: 0.17% max: 100.00% x̄: 19.15% x̃: 10.57%
HURT stats (abs)   min: 1 max: 307 x̄: 7.95 x̃: 3
HURT stats (rel)   min: 0.00% max: 381.82% x̄: 20.04% x̃: 10.00%
95% mean confidence interval for nops value: -1.77 -0.59
95% mean confidence interval for nops %-change: -5.55% -2.87%
Nops are helped.

total non-nops in shared programs: 1202404 -> 1195356 (-0.59%)
non-nops in affected programs: 496682 -> 489634 (-1.42%)
helped: 1951
HURT: 265
helped stats (abs) min: 1 max: 39 x̄: 4.02 x̃: 3
helped stats (rel) min: 0.07% max: 15.38% x̄: 2.97% x̃: 1.96%
HURT stats (abs)   min: 1 max: 22 x̄: 2.97 x̃: 2
HURT stats (rel)   min: 0.05% max: 10.00% x̄: 1.14% x̃: 0.75%
95% mean confidence interval for non-nops value: -3.38 -2.99
95% mean confidence interval for non-nops %-change: -2.60% -2.36%
Non-nops are helped.

total systall in shared programs: 288317 -> 292975 (1.62%)
systall in affected programs: 87876 -> 92534 (5.30%)
helped: 388
HURT: 431
helped stats (abs) min: 1 max: 214 x̄: 14.39 x̃: 8
helped stats (rel) min: 0.25% max: 100.00% x̄: 22.12% x̃: 11.96%
HURT stats (abs)   min: 1 max: 232 x̄: 23.77 x̃: 7
HURT stats (rel)   min: 0.00% max: 1300.00% x̄: 51.71% x̃: 17.30%
95% mean confidence interval for systall value: 3.07 8.30
95% mean confidence interval for systall %-change: 9.49% 23.97%
Systall are HURT.

(The systall hurt is probably just due to having having fewer
instructions to hide latency with.)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14554>
2022-01-18 17:44:50 +00:00
Qiang Yu
2cee73f0f7 nir: fix nir_tex_instr hash not count is_sparse field
This fixes nir_opt_cse miss replace a non-sparse tex instruction
with a sparse tex instruction and fail the nir_validate_shader().

Fixes: 3a7972f72a ("nir,spirv: add sparse texture fetches")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14362>
2022-01-18 16:10:35 +08:00
Rhys Perry
d95a0b52e4 nir/unsigned_upper_bound: don't follow 64-bit f2u32()
Fixes Doom Eternal crash.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14555>
2022-01-17 10:59:21 +00:00
Emma Anholt
f6ffefba3e nir: Apply nir_opt_offsets to nir_intrinsic_load_uniform as well.
Doing this for ir3 required adding a struct for limits of how much base to
fold in (which NTT wants as well for its case of shared vars), otherwise
the later work to lower to the 1<<9 word limit would emit more
instructions.

The shader-db results are that sometimes the reduction in NIR instruction
count results in the fewer sampler prefetches due to the shader being
estimated to be shorter (dota2, nexuiz):

total instructions in shared programs: 8996651 -> 8996776 (<.01%)
total cat5 in shared programs: 86561 -> 86577 (0.02%)

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
2022-01-16 19:11:29 +00:00
Emma Anholt
b024102d7c freedreno/ir3: Use nir_opt_offset for removing constant adds for shared vars.
Saves some work in carchase and manhattan31:

instructions in affected programs: 2842 -> 2818 (-0.84%)
nops in affected programs: 1131 -> 1105 (-2.30%)
non-nops in affected programs: 1236 -> 1238 (0.16%)
mov in affected programs: 57 -> 61 (7.02%)
dwords in affected programs: 2144 -> 2150 (0.28%)
cat0 in affected programs: 1195 -> 1169 (-2.18%)
cat1 in affected programs: 151 -> 155 (2.65%)
cat2 in affected programs: 142 -> 140 (-1.41%)
sstall in affected programs: 190 -> 178 (-6.32%)
(ss) in affected programs: 63 -> 63 (0.00%)
systall in affected programs: 532 -> 511 (-3.95%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
2022-01-16 19:11:29 +00:00
Daniel Schürmann
79a987ad2a nir/opt_if: also merge break statements with ones after the branch
This optimizations turns

     loop {
        ...
        if (cond1) {
           if (cond2) {
              do_work_1();
              break;
           } else {
              do_work_2();
           }
           do_work_3();
           break;
        } else {
           ...
        }
     }

 into:

     loop {
        ...
        if (cond1) {
           if (cond2) {
              do_work_1();
           } else {
              do_work_2();
              do_work_3();
           }
           break;
        } else {
           ...
        }
     }

As this optimizations moves code into the NIF statement,
it re-iterates on the branch legs in case of success.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
2022-01-13 02:30:32 +00:00
Daniel Schürmann
dad609d152 nir/opt_if: merge two break statements from both branch legs
This optimization turns

     loop {
        ...
        if (cond) {
           do_work_1();
           break;
        } else {
           do_work_2();
           break;
        }
     }

 into:

     loop {
        ...
        if (cond) {
           do_work_1();
        } else {
           do_work_2();
        }
        break;
     }

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
2022-01-13 02:30:32 +00:00
Daniel Schürmann
8a78706643 nir: refactor nir_opt_move
This patch is a rewrite of nir_opt_move.
Differently from the previous version, each instruction is checked
if it can be moved downwards and then inserted before the first user
of the definition. The advantage is that less insert operations are
performed, the original order is kept if two movable instructions have
the same first user, and instructions without user in the same block
are moved towards the end.

v2: Only return true if an instruction really changed the position.
    Don't care for discards, this will be handled by another MR.
v3: fix self-referring phis and update according to nir_can_move_instr().
v4: use nir_can_move_instr() and nir_instr_ssa_def()
v5: deduplicate some code

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3657>
2022-01-12 13:41:54 +00:00
Marcin Ślusarz
f286ecf906 nir: handle per-view clip/cull distances
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Marcin Ślusarz
0d6f83cbf1 nir: remove invalid assert affecting per-view variables
per-view variables can have arbitrary (but > 0) number of array levels

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Marcin Ślusarz
4fed440724 nir: add load_mesh_view_count and load_mesh_view_indices intrinsics
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>
2022-01-11 22:45:23 +00:00
Rhys Perry
67fc7a1763 nir/uniform_atomics: fix is_atomic_already_optimized without workgroups
dims_needed would have been zero, so this would always returned true for
non-compute stages.

Also fix this for variable workgroup sizes.

Improves Shadow of the Tomb Raider RX 6800 performance by 10.6%, 11.5% and
4.5% (day_of_dead, jungle and paititi scenes).

radv_perf before and after:
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '62.913333333333334', 'min_fps': '62.81', 'max_fps': '62.98', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '64.02666666666666', 'min_fps': '63.93', 'max_fps': '64.11', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '74.81666666666666', 'min_fps': '74.72', 'max_fps': '74.88', 'interations': '3'}

{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '69.57', 'min_fps': '69.52', 'max_fps': '69.63', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '71.41000000000001', 'min_fps': '71.31', 'max_fps': '71.5', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '78.16666666666667', 'min_fps': '78.07', 'max_fps': '78.23', 'interations': '3'}

Performance now seems slightly better than AMDVLK 2021.Q4.3:
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '68.02666666666666', 'min_fps': '67.95', 'max_fps': '68.16', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '70.24666666666667', 'min_fps': '69.83', 'max_fps': '70.51', 'interations': '3'}
{'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '77.19', 'min_fps': '77.18', 'max_fps': '77.2', 'interations': '3'}

fossil-db (Sienna Cichlid):
Totals from 40 (0.03% of 134621) affected shaders:
CodeSize: 62676 -> 65996 (+5.30%)
Instrs: 11372 -> 12111 (+6.50%)
Latency: 144122 -> 142848 (-0.88%); split: -1.09%, +0.21%
InvThroughput: 19686 -> 19847 (+0.82%); split: -0.06%, +0.87%
VClause: 304 -> 306 (+0.66%)
SClause: 603 -> 604 (+0.17%); split: -0.83%, +1.00%
Copies: 780 -> 858 (+10.00%)
Branches: 235 -> 329 (+40.00%)
PreSGPRs: 1072 -> 1083 (+1.03%); split: -0.37%, +1.40%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14407>
2022-01-10 19:57:38 +00:00
Rhys Perry
b00138090e nir/lower_shader_calls: fix store_scratch write_mask
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14447>
2022-01-10 19:01:04 +00:00
Danylo Piliaiev
b8d486f298 nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
2022-01-10 13:20:39 +02:00
Gert Wollny
6f348d9c99 nir_lower_io: propagate the "invariant" flag to outputs
Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer
to correctly declare output variables.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
2022-01-07 16:35:43 +00:00
Emma Anholt
558a600629 nir_to_tgsi: Enable fdot_replicates flag.
That's how the TGSI math opcodes work.

This lets lower_vec_to_regs coalesce the DP output into the .yzw channels,
giving an impressive shader-db win on softpipe:

total instructions in shared programs: 2929840 -> 2794036 (-4.64%)
instructions in affected programs: 1651438 -> 1515634 (-8.22%)
total temps in shared programs: 372730 -> 332744 (-10.73%)
temps in affected programs: 118151 -> 78165 (-33.84%)

and a minor one on r300:

total instructions in shared programs: 51238 -> 51149 (-0.17%)
instructions in affected programs: 2621 -> 2532 (-3.40%)
total vinst in shared programs: 15655 -> 15618 (-0.24%)
vinst in affected programs: 468 -> 431 (-7.91%)
total temps in shared programs: 9838 -> 9828 (-0.10%)
temps in affected programs: 59 -> 49 (-16.95%)

and a bigger one on i915g:
total instructions in shared programs: 398064 -> 395901 (-0.54%)
instructions in affected programs: 29271 -> 27108 (-7.39%)
total tex_indirect in shared programs: 12261 -> 12233 (-0.23%)
tex_indirect in affected programs: 98 -> 70 (-28.57%)
LOST:   0
GAINED: 5

The r300 change is less impressive because it does some backend copy-prop,
but also because intermediate storage of DPs now takes a vec4 instead of a
scalar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>
2022-01-07 09:58:24 +00:00
Jesse Natalie
c09c0b351f nir_opt_dead_cf: Remove dead ifs
An if that looks like:
if (x) { } else { }
That has no phis following it is dead. Currently these are only
removed by peephole select, but that means that 'x' is considered
used until that pass is run, which can make it difficult to apply
sane lowering in the case where loading 'x' requires complex or
expensive transformations, but 'x' is *really* unused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>
2022-01-07 05:15:48 +00:00
Alyssa Rosenzweig
24ea7cbb06 nir: Extend store_combined_output_pan
Extend store_combined_output_pan to take a dual source blend input in
addition to colour, depth, and stencil inputs. Use the last source for
this, and represent the type with the DEST_TYPE index. This is a hack
but there is no SRC2_TYPE and NIR doesn't seem to mind as long as we
know what we mean. This allows the backend to emit a combined "blend
render target #0" instruction taking two sources.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
2022-01-02 01:12:05 +00:00
Alyssa Rosenzweig
5c168f09eb nir: Eliminate store_combined_output_pan BASE
It's meaningless for this intrinsic and is just adding noise to the
lowering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
2022-01-02 01:12:05 +00:00
Daniel Schürmann
17ecd0b31a nir/opt_algebraic: lower fneg_hi/lo to fmul
This pattern, found in the FSR upscaling shader,
helps the vectorization efforts by keeping the
chain of vectorized instructions intact.
Radeon can optimize it to per-component fneg modifiers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>
2021-12-21 13:23:37 +01:00
Caio Oliveira
729df14e45 nir: Handle volatile semantics for loading HelperInvocation builtin
SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation
operation to turn an invocation into a helper invocation, but the
value of HelperInvocation (a builtin from Input storage class)
couldn't be modified dynamically without breaking compatibility.

For the extension the operation OpIsHelperInvocation was added to get
the dynamic value.

For SPIR-V 1.6, the demote operation was promoted, but now to get the
dynamic value the shader must issue a load to HelperInvocation with
Volatile memory access semantics.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>
2021-12-17 16:37:14 -08:00
Rhys Perry
a65285f54b nir/opt_access: infer CAN_REORDER for global access
fossil-db (Sienna Cichlid):
Totals from 352 (0.26% of 134621) affected shaders:
VGPRs: 17240 -> 17272 (+0.19%)
CodeSize: 1753640 -> 1755744 (+0.12%); split: -0.04%, +0.16%
Instrs: 323190 -> 323801 (+0.19%); split: -0.03%, +0.22%
Latency: 3241205 -> 3241293 (+0.00%); split: -0.10%, +0.10%
InvThroughput: 568927 -> 568067 (-0.15%); split: -0.16%, +0.00%
SClause: 12109 -> 10444 (-13.75%); split: -13.76%, +0.01%
Copies: 27802 -> 27717 (-0.31%); split: -0.56%, +0.26%
PreSGPRs: 14699 -> 14690 (-0.06%)
PreVGPRs: 15793 -> 15799 (+0.04%)

fossil-db (Polaris10):
Totals from 348 (0.26% of 135668) affected shaders:
SGPRs: 21446 -> 21574 (+0.60%); split: -0.15%, +0.75%
VGPRs: 17004 -> 16996 (-0.05%); split: -0.09%, +0.05%
CodeSize: 1782796 -> 1783060 (+0.01%); split: -0.03%, +0.05%
Instrs: 337828 -> 337921 (+0.03%); split: -0.03%, +0.06%
Latency: 3726328 -> 3726721 (+0.01%); split: -0.09%, +0.10%
InvThroughput: 1307917 -> 1299841 (-0.62%); split: -0.62%, +0.00%
VClause: 4327 -> 4337 (+0.23%); split: -0.09%, +0.32%
SClause: 12178 -> 10529 (-13.54%); split: -13.55%, +0.01%
Copies: 40227 -> 40244 (+0.04%); split: -0.19%, +0.24%
PreSGPRs: 14946 -> 14937 (-0.06%)
PreVGPRs: 15637 -> 15643 (+0.04%)

fossil-db (Pitcairn):
Totals from 351 (0.26% of 135668) affected shaders:
SGPRs: 20382 -> 20619 (+1.16%); split: -0.79%, +1.95%
CodeSize: 1789732 -> 1789836 (+0.01%); split: -0.04%, +0.04%
MaxWaves: 1947 -> 1949 (+0.10%)
Instrs: 352274 -> 352318 (+0.01%); split: -0.04%, +0.06%
Latency: 4057829 -> 4058226 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 1332245 -> 1317578 (-1.10%); split: -1.11%, +0.01%
VClause: 8581 -> 8583 (+0.02%); split: -0.13%, +0.15%
SClause: 12187 -> 10552 (-13.42%); split: -13.43%, +0.02%
Copies: 44906 -> 44915 (+0.02%); split: -0.24%, +0.26%
PreSGPRs: 16571 -> 16562 (-0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Rhys Perry
403ae3b48e nir/algebraic: optimize more 64-bit imul with constant source
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00