Commit graph

3113 commits

Author SHA1 Message Date
Timothy Arceri
6dade5d534 nir: avoid uninitialized variable warning
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231
2019-01-07 10:57:00 +11:00
Eric Anholt
f217a94542 nir: Add nir_lower_tex options to lower sampler return formats.
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.

v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
    the enum.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-04 15:59:57 -08:00
Eric Anholt
a74f2aeb4f nir: Allow nir_format_unpack_int/sint to unpack larger values.
For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit
integer samplers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-04 15:59:30 -08:00
Caio Marcelo de Oliveira Filho
bbf9ee9b18 nir: remove dead code from copy_prop_vars
When copy_prop_vars also took care of dead write handling, intrin was
used as part of store_to_entry.  Now it isn't, so this assignment
isn't used really used.  Add a comment clarifying what happens to
intrin.

Fixes: 4dfa7adc10 "nir: Remove handling of dead writes from copy_prop_vars"
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-04 15:18:41 -08:00
Andres Gomez
f0312cfa93 glsl/linker: complete documentation for assign_attribute_or_color_locations
Commit 27f1298b9d ("glsl/linker: validate attribute aliasing before optimizations")
forgot to complete the documentation.

Cc: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-01-04 09:04:31 +02:00
Timothy Arceri
4d3f6cb973 nir: merge some basic consecutive ifs
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.

So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.

Shader-db results i965 (SKL):

total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0

total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4

The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.

Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.

Shader-db results radeonsi (VEGA):

Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-01-03 15:17:16 +11:00
Timothy Arceri
19cafe8084 nir: add rewrite_phi_predecessor_blocks() helper
This will also be used by the if merge pass in the following commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-01-03 15:17:16 +11:00
Timothy Arceri
5122fbc4ba nir: simplify does_varying_match()
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Timothy Arceri
8d05ee2005 nir: make use of does_varying_match() helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Timothy Arceri
0016166d19 nir: make nir_opt_remove_phis_impl() static
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Caio Marcelo de Oliveira Filho
7d6babf995 nir: add a way to print the deref chain
Makes debugging easier when we care about the deref chain and not the
deref instruction itself.  To make it take a const pointer, constify
some of the static functions in nir_print.c.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 10:09:04 -08:00
Iago Toral Quiroga
ec79069856 compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
The 16-bit polynomial execution doesn't meet Khronos precision requirements.
Also, the half-float denorm range starts at 2^(-14) and with asin taking input
values in the range [0, 1], polynomial approximations can lead to flushing
relatively easy.

An alternative is to use the atan2 formula to compute asin, which is the
reference taken by Khronos to determine precision requirements, but that
ends up generating too many additional instructions when compared to the
polynomial approximation. Specifically, for the Intel case, doing this
adds +41 instructions to the program for each asin/acos call, which looks
like an undesirable trade off.

So for now we take the easy way out and fallback to using the 32-bit
polynomial approximation, which is better (faster) than the 16-bit atan2
implementation and gives us better precision that matches Khronos
requirements.

v2:
 - Fallback to 32-bit using recursion (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:39 +01:00
Iago Toral Quiroga
fda3f6d424 compiler/spirv: implement 16-bit frexp
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:35 +01:00
Iago Toral Quiroga
7d3c34197a compiler/spirv: implement 16-bit hyperbolic trigonometric functions
v2:
 - use nir_fadd_imm and nir_fmul_imm helpers (Jason)

v3:
 - since we need to define one for fsub use it for fdiv too (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
88663ba67c compiler/spirv: implement 16-bit exp and log
v2
 - use nir_fmul_imm helper (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
f18554e2ce compiler/spirv: implement 16-bit atan2
v2:
 - fix huge_val for 16-bit, it was mean't to be 2^14 not 10^14.

v3:
 - rebase on top of new bool sized opcodes
 - use nir_b2f helper
 - use nir_fmul_imm helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
1c8de08ec9 compiler/spirv: implement 16-bit atan
v2:
 - use nir_fadd_imm and nir_fmul_imm helpers (Jason)
 - rebased on top of new sized boolean opcodes
 - use nir_b2f helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
df118535ca compiler/spirv: implement 16-bit acos
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
dbbbe24d76 compiler/spirv: implement 16-bit asin
v2:
  - use nir_fmul_imm and nir_fadd_imm helpers (Jason)

v3:
 - missed one case where we need to replace nir_imm_float
   with nir_imm_floatN_t (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
95b7c29c2c compiler/spirv: handle 16-bit float in radians() and degrees()
v2:
 - use nir_imm_fmul helper (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
aeee683780 compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga
5fc9ad1cb0 compiler/nir: add a nir_b2f() helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Timothy Arceri
70be9afccb nir: link time opt duplicate varyings
If we are outputting the same value to more than one output
component rewrite the inputs to read from a single component.

This will allow the duplicate varying components to be optimised
away by the existing opts.

shader-db results i965 (SKL):

total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
instructions in affected programs: 322601 -> 314257 (-2.59%)
helped: 3080
HURT: 8

total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
cycles in affected programs: 2584925 -> 2522944 (-2.40%)
helped: 2975
HURT: 477

shader-db results radeonsi (VEGA):

SGPRS: 31576 -> 31664 (0.28 %)
VGPRS: 17484 -> 17064 (-2.40 %)
Spilled SGPRs: 184 -> 167 (-9.24 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 583340 -> 569368 (-2.40 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 6162 -> 6270 (1.75 %)
Wait states: 0 -> 0 (0.00 %)

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 14880 -> 15080 (1.34 %)
VGPRS: 10872 -> 10888 (0.15 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 674016 -> 668396 (-0.83 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 2708 -> 2704 (-0.15 %)
Wait states: 0 -> 0 (0.00 %

V2: bunch of tidy ups suggested by Jason

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri
d828694b80 nir: rework nir_link_opt_varyings()
This just cleans things up a little and make things more safe for
derefs.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri
c0aba8b0dc nir: add can_replace_varying() helper
This will be reused by the following patch.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri
50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Samuel Pitoiset
f45e43e156 spirv: add support for SpvCapabilityStorageImageMultisample
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-20 18:01:09 +01:00
Iago Toral Quiroga
d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
The former expects to see SSA-only things, but the latter injects registers.

The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.

Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-20 08:02:44 +01:00
Caio Marcelo de Oliveira Filho
947f7b452a nir: properly find the entry to keep in copy_prop_vars
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.

The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers.  The bug was saving
such pointer to use later.  Change the code to first perform the
removals and then find the remaining right entry.

This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-19 09:33:36 -08:00
Caio Marcelo de Oliveira Filho
0ddc911f4d nir: properly clear the entry sources in copy_prop_vars
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask.  Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.

Change code to clean up the array before use to avoid invalid data
around.

Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-19 08:35:48 -08:00
Ian Romanick
96c4b135e3 nir/algebraic: Don't put quotes around floating point literals
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value.  The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.

v2: Remove misleading comment about sized literals (suggested by
Timothy).  Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
2018-12-18 23:28:31 -08:00
Sagar Ghuge
933c44bcc4 nir: Add a new lowering option to lower 3D surfaces from txd to txl.
Tested on gen9.

v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-18 13:44:09 -08:00
Jason Ekstrand
5dad1abfdc nir/dead_write_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
fa40a58fd9 nir/copy_prop_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
cf7fb39805 nir/lower_wpos_center: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
867fe35a16 nir/lower_io_to_scalar: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
3fe0363dda nir/lower_io_arrays_to_elements: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
8cc0f92492 nir/linking_helpers: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand
8410cf66d7 nir/propagate_invariant: Skip unknown vars
If we can't find the variable from the deref, just assume it isn't
invariant and continue on.  This can happen if, for instance, we're
writing to a deref that points into an SSBO.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Ian Romanick
29e4b949b4 Revert "nir/lower_indirect: Bail early if modes == 0"
"There's no point in walking the program if we're never going to
    actually lower anything."

Except we might lower compacted local arrays.  In that case, modes will
be 0, but there is still lowering to be done.

This reverts commit 7f75cf2a94.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-12-18 10:47:54 -08:00
Ian Romanick
378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Eric Anholt
708d8f4d0a nir: Fix clamping of uints for image store lowering.
I botched some copy-and-paste and clamped to signed int max instead of
uint max.  Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.

Fixes: d3e046e76c ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-17 20:02:22 +00:00
Ian Romanick
9dc135efa1 nir: Release per-block metadata in nir_sweep
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.

mean soft fp64 using uint64:   1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11:    63,555,571 =>    61,889,811
deus ex mankind divided 148:      62,845,304 =>    62,829,640
deus ex mankind divided 2890:     71,922,686 =>    71,922,686
dirt showdown 676:                69,238,607 =>    69,238,607
dolphin ubershaders 210:          77,822,072 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00
Ian Romanick
7adafd6e1c nir: Fix holes in nir_instr
Found using pahole.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,555,571
deus ex mankind divided 148:      62,887,728 =>    62,845,304
deus ex mankind divided 2890:     72,399,750 =>    71,922,686
dirt showdown 676:                69,464,023 =>    69,238,607
dolphin ubershaders 210:          78,359,728 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00
Ian Romanick
8161a87b24 nir/phi_builder: Use per-value hash table to store [block] -> def mapping
Replace the old array in each value with a hash table in each value.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
deus ex mankind divided 148:      62,887,728 =>    62,887,728
deus ex mankind divided 2890:     72,402,222 =>    72,399,750
dirt showdown 676:                74,466,431 =>    69,464,023
dolphin ubershaders 210:         109,630,376 =>    78,359,728

Run-time change for a full run on shader-db on my Haswell desktop (with
-march=native) is 1.22245% +/- 0.463879% (n=11).  This is about +2.9
seconds on a 237 second run.  The first time I sent this version of this
patch out, the run-time data was quite different.  I had misconfigured
the script that ran the test, and none of the tests from higher GLSL
versions were run.  These are generally more complex shaders, and they
are more affected by this change.

The previous version of this patch used a single hash table for the
whole phi builder.  The mapping was from [value, block] -> def, so a
separate allocation was needed for each [value, block] tuple.  There was
quite a bit of per-allocation overhead (due to ralloc), so the patch was
followed by a patch that added the use of the slab allocator.  The
results of those two patches was not quite as good:

mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
deus ex mankind divided 148:      62,887,728 =>    62,887,728
deus ex mankind divided 2890:     72,402,222 =>    72,402,222 *
dirt showdown 676:                74,466,431 =>    72,443,591 *
dolphin ubershaders 210:         109,630,376 =>    81,034,320 *

The * denote tests that are better now.  In the tests that are the same
in both patches, the "after" peak memory usage was at a different
location.  I did not check the local peaks.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00
Jason Ekstrand
6bcd2af086 nir/algebraic: Add some optimizations for D3D-style Booleans
D3D Booleans use a 32-bit 0/-1 representation.  Because this previously
matched NIR exactly, we didn't have to really optimize for it.  Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
    instructions in affected programs: 2457021 -> 2288154 (-6.87%)
    helped: 8318
    HURT: 10

    total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
    cycles in affected programs: 151029683 -> 137186984 (-9.17%)
    helped: 7749
    HURT: 682

    total loops in shared programs: 4431 -> 4399 (-0.72%)
    loops in affected programs: 32 -> 0
    helped: 21
    HURT: 0

    total spills in shared programs: 10290 -> 10051 (-2.32%)
    spills in affected programs: 2532 -> 2293 (-9.44%)
    helped: 18
    HURT: 18

    total fills in shared programs: 22203 -> 21732 (-2.12%)
    fills in affected programs: 3319 -> 2848 (-14.19%)
    helped: 18
    HURT: 18

Note that a large chunk of the improvement fixing regressions caused by
switching to 1-bit Booleans.  Previously, our ability to optimize D3D
booleans was improved by using the D3D representation directly in NIR.
Now that NIR does 1-bit bools, we need a few more optimizations.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
3b30814791 nir/algebraic: Optimize 1-bit Booleans
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
44227453ec nir: Switch to using 1-bit Booleans for almost everything
This is a squash of a few distinct changes:

    glsl,spirv: Generate 1-bit Booleans

    Revert "Use 32-bit opcodes in the NIR producers and optimizations"

    Revert "nir/builder: Generate 32-bit bool opcodes transparently"

    nir/builder: Generate 1-bit Booleans in nir_build_imm_bool

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00