Commit graph

110 commits

Author SHA1 Message Date
Julian Winkler
b273611bb1 nir: Add a structurizer
v2 (Karol):
  renamed pathes to paths
  use more bool
  use _mesa_set_intersects
  deduplicated some code
  fixed some typos
v3 (Karol):
  don't enable structurizer as we do this in vtn now
v4 (Jason):
  A few clean-ups due to unstructured NIR changes
v5 (Jason):
  Misc whitespace and style cleanups

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
2020-08-14 20:35:36 +00:00
Eric Anholt
ee2f21b10d nir: Remove the old nir_opt_shrink_load.
The old pass only handled intrinsic load_constant, while the new
nir_opt_shrink_vectors handles ALU ops, nir load_consts, along with all
the load intrinsics.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6050>
2020-08-03 21:26:45 +00:00
Eric Anholt
1c9906d5ff nir: Add a pass to cut the trailing ends of vectors.
Ideally we'd also handle unused middles of vectors and reswizzle ALU-only
uses of it so we could write fewer channels, but that's future work/

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6050>
2020-08-03 21:26:45 +00:00
Rhys Perry
2adb337256 nir,radv/aco: add and use pass to lower make available/visible barriers
Lower them to ACCESS_COHERENT to simplify the backend and
probably give better performance than invalidating or writing back the
entire L0/L1 cache.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4905>
2020-07-28 16:56:34 +00:00
Neil Roberts
7665398e6c nir/scheduler: Move nir_scheduler to its own header
nir_schedule already has a struct for options which makes it more than
just a function declaration. Later patches intend to add more structs to
complement these options. In order to make the code easier to manage,
this moves the nir_scheduler-related parts out of nir.h to their own
header.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
2020-07-24 09:21:11 +02:00
Danylo Piliaiev
348e8b5618 nir/tests: Add tests for opt_if_simplification
Test cases:

opt_if_simplification - the most trivial test case.

opt_if_simplification_single_source_phi_after_if - tests that
opt_if_simplification correctly handles single-source phis after
the if, found in https://gitlab.freedesktop.org/mesa/mesa/-/issues/3282

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5945>
2020-07-22 14:20:21 +00:00
Mike Blumenkrantz
1fd3563025 nir: add lowering pass for fragcolor -> fragdata
this is needed for zink and other drivers which can support fragcolor but
not fragdata and want to correctly handle EXT_multiview_draw_buffers

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5687>
2020-07-08 14:51:34 +00:00
Mike Blumenkrantz
fb2fe802f6 nir: add lowering pass for clip plane enabling
a pass which rewrites gl_ClipDistance[n] to an undef if the corresponding
clip plane is disabled in the rasterizer state

this pass is needed for zink to handle api disables of clip planes

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>
2020-07-03 08:56:30 +00:00
Dylan Baker
a8e2d79e02 meson: use gnu_symbol_visibility argument
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
2020-06-01 18:59:18 +00:00
Rob Clark
42d38ad028 nir: add pass to lower disjoint wrmask's
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2020-05-13 20:24:49 -07:00
Alyssa Rosenzweig
42c9bbaeed nir: Move nir_lower_mediump_outputs from ir3
(Original code from ir3)

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4716>
2020-04-27 16:32:24 +00:00
Jonathan Marek
f8558fb1ce nir: add common convert_ycbcr for vulkan csc
Copied from anv, replaced state with passing model/range directly.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
2020-04-20 22:01:43 +00:00
Eric Engestrom
79af30768d meson: inline inc_common
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
2020-03-28 21:36:54 +01:00
Iago Toral Quiroga
467c9a0faa nir: add a bool bitsize lowering pass
The pass lowers 1-bit booleans produced by NIR to the native bitsize
of the operations that produce them.

v2: change on lower_load_const_instr after upstream changes. Added
    TODO2 to explain it, as it was not properly tested yet (see
    already existing TODO) (Neil)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Caio Marcelo de Oliveira Filho
bf432cd831 nir: Add pass to combine adjacent scoped memory barriers
SPIR-V generates very granular barriers, however HW and backends might
not necessarily take advantage of those.  This pass provides a general
mechanism to combine such barriers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
2020-03-12 19:21:36 +00:00
Daniel Schürmann
ce87da71e9 nir: add pass to lower discard() to demote()
This pass is intended to work around game bugs, only!
It also lowers nir_intrinsic_load_helper_invocation to
nir_intrinsic_is_helper_invocation for consistency.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
2020-03-09 12:29:32 +00:00
Alyssa Rosenzweig
7ab4e4dd96 nir: Add SSBO->global lowering pass
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
2020-02-21 13:06:22 +00:00
Arcady Goldmints-Orlov
e9f83185a2 Rename nir_lower_constant_initializers to nir_lower_variable_initalizers
This is naming is more clear as nir_variables can be initializes not
just with a nir_constant but with a pointer to another nir_variable.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
2020-02-12 15:41:49 +00:00
Michel Dänzer
65610ec774 gitlab-ci: Add ppc64el and s390x cross-build jobs
Using LLVM 8 for ppc64el and 7 for s390x (which hits some coroutine
related issues with LLVM 8).

There are some test failures we need to ignore for now. Also, the
timeout needs to be bumped from the default 30s for some tests, because
they can take longer under emulation.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3643>
2020-02-05 10:52:31 +00:00
Erik Faye-Lund
d9ff5f0414 nir/zink: move clip_halfz-lowering to common code
Etnaviv also does the same thing, so let's try to avoid repetition here,
and use the same for it code as well.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
2020-01-03 22:48:19 +00:00
Karol Herbst
11f736a6f9 nir/tests: add serializer tests
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-12-11 13:00:44 +01:00
Eric Anholt
8afab607ac nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable.  A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.

Results for v3d:

total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)

v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
    patch, include SSA defs in live values so we can switch to bottom-up
    if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
    successfully register allocate on GLES3.1 dEQP.  Make sure that
    discards don't move after store_output.  Comment spelling fix.
2019-11-25 21:12:21 +00:00
Rhys Perry
0a759c3be6 nir: add load/store vectorizer tests
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Rhys Perry
ce9205c03b nir: add a load/store vectorization pass
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.

v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
    nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column acceses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it's was
    from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Marek Olšák
ff71fae440 nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Jason Ekstrand
b8d45d9307 nir: Add tests for nir_extract_bits 2019-11-11 17:17:02 +00:00
Rob Clark
5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Erik Faye-Lund
878c94288a nir: add lowering-pass for point-size mov
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Dave Airlie
dc91a02a72 nir: add a pass to lower flat shading.
This takes any color or backcolor that has unspecified
shading and converts it to flat shading.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Marek Olšák
3340c066a1 nir: move gl_nir_opt_access from glsl directory
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Eric Engestrom
7a1dc6ab44 meson: rename libnir to _libnir to make it clear it's not meant to be used anywhere else
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-10-07 21:49:40 +01:00
Timur Kristóf
610cc3089c nir: Carve out nir_lower_samplers from GLSL code.
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.

This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-06 12:20:20 +03:00
Daniel Schürmann
df86c5ffb3 nir: add divergence analysis pass.
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-20 17:40:13 +02:00
Eric Engestrom
a3d6024199 meson: add nir tests to the compiler/nir test suite
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-14 22:17:06 +01:00
Iago Toral Quiroga
48f5c34301 nir: add a pass to clamp gl_PointSize to a range
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-depedent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.

v2:
 - Turn into a generic NIR pass (Eric).
 - Make the pass work before lower I/O so we can use the deref variable
   to inspect if we are writing to gl_PointSize (Eric).
 - Make the pass take the range to clamp as parameter and allow it
   to clamp to both sides of the range or just one side.
 - Make the pass report progress.

v3:
 - Fix copyright header (Eric)
 - use fmin/fmax instead of bcsel to clamp (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:12 +02:00
Rhys Perry
7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry
da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Ian Romanick
405de7ccb6 nir/range-analysis: Rudimentary value range analysis pass
Most integer operations are omitted because dealing with integer
overflow is hard.  There are a few things that could be smarter if there
was a small amount more tracking of ranges of integer types (i.e.,
operands are Boolean, operand values fit in 16 bits, etc.).

The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.

v2: Memoize range analysis results.  Without this, some shaders appear
to get stuck in infinite loops.

v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.

v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".

v5: Use nir_alu_srcs_equal for detecting (a*a).  Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).

v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions.  All suggested by
Caio.  Rework the handling of several opcodes to use a table for mapping
source ranges to a result range.  This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero.  Add similar handling for -x*x.

v7: Use _______ in the tables as an alias for unknown.  Suggested by
Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Jonathan Marek
bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Ian Romanick
b08d704051 nir: Add unit tests for nir_opt_comparison_pre
Each tests has a comment with the expected before and after NIR.  The
tests don't actually check this.  The tests only check whether or not
the optimization pass reported progress.  I couldn't think of a robust,
future-proof way to check the before and after code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Daniel Schürmann
c31f470066 anv,nir: Move lower_input_attachments pass from ANV to NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:02:50 +02:00
Rob Clark
5787a2dfe3 nir: add pass to lower load_interpolated_input
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Connor Abbott
47e7c6961a nir: add a vectorization pass
This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model as CSE, where we do a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:

1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.

The pass isn't comprehensive by far; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.

[Alyssa: Rebase on latest master, updating with respect to typeless
moves]

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-18 06:43:30 -07:00
Ian Romanick
3ee2e84c60 nir: Rematerialize compare instructions
On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Vasily Khoruzhick
e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Ian Romanick
158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Vasily Khoruzhick
443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) use the same
code paths as modern hardware.

It uses newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Jason Ekstrand
91899495a1 nir: Add a SSA type gathering pass
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl.  It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float.  Some values will be labled as
both int and float and some will be labled as neither and it is up to
the caller to decide what to do with this information.  However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-04 03:52:05 +00:00
Rob Clark
a99c360a46 nir: add pass to lower fb reads
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00