From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Just to make it easier to run a nir tests together.
Fixes: a0ae12ca91
("nir/algebraic: Add unit tests for bitsize validation")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
This is different from the GL_ARB_spirv pass because it generates a much
simpler data structure that isn't tied to OpenGL and mtypes.h.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of doing this as part of the existing copy_prop_vars pass.
Separation makes easier to expand the scope of both passes to be more
than per-block. For copy propagation, the information about valid
copies comes from previous instructions; while the dead write removal
depends on information from later instructions ("have any instruction
used this deref before overwrite it?").
Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.
v2: Remove entries from dynarray instead of marking items as
deleted. Use foreach_reverse. (Caio)
(all from Jason)
Do not cache nir_deref_path. Not worthy for this patch.
Clear unused writes when hitting a call instruction.
Clean up enumeration of modes for barriers.
Move metadata calls to the inner function.
v3: For copies, use the vector length to calculate the mask.
(all from Jason)
Use nir_component_mask_t when applicable.
Rename functions for clarity.
Consider local vars used by a call to be conservative (SPIR-V has
such cases).
Comment and assert the assumption that stores and copies are
always to a deref that ends with a vector or scalar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add basic helpers for doing tests on the vars related optimization
passes. The main goal is to lower the barrier to create tests during
development and debugging of the passes. Full coverage is not a
requirement.
v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
Move nir_imm_ivec2() to nir_builder.h. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for a OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
also move some of the GLSL builtins over we will need for implementing
some OpenCL builtins
v2: replace NIR_IMM_FP by nir_imm_floatN_t in ported code
fix up changes caused by swizzle rework
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data. This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.
v2 (Jason Ekstrand):
- Use a size/align function to ensure we get the right alignments
- Use the newly added deref offset helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only used by the ir3 stand-alone compiler and Rob said we could
delete it.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a concept of "members" to a variable with an interface type.
It allows you to specify the full variable data for each member of the
interface instead of once for the variable. We also add a lowering pass
to lower those variables to a sequence of variables and rewrite all the
derefs accordingly.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit introduces a new nir_deref.h header for helpers that are
less common and really only needed by a few heavy-duty passes. In this
header is a new struct for representing a full deref path which can be
walked in either direction.
v2 (Jason Ekstrand):
- Assert that deref != NULL (Caio)
- Fill _short_path with 0xdeadbeef in debug builds when not used (Caio)
- Make nir_deref_path a typedef (Rob)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a pass for lowering deref instructions to deref chains
as well as some smaller helpers to ease the transition.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Run this pass late (after opt loop) to move load_const instructions back
into the basic blocks which use the result, in cases where a load_const
is only consumed in a single block.
This helps reduce register usage in cases where the backend driver
cannot lower the load_const to a uniform.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is based on the glsl/lower_instructions.cpp implementation, but
should be much more readable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Not all bit-sizes may be supported natively in hardware for all operations.
This pass allows drivers to lower such operations to a bit-size that is
actually supported and then converts the result back to the original
bit-size.
Compiler backends control which operations and wich bit-sizes require
the lowering through a callback function.
v2: generalize this pass and make it available in NIR core (Rob, Jason)
v3: remove some temporaries and reduce nesting in instruction loop using
a continue statement (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nir_builder_opcodes.h also depends on nir_intrinsics.py for generating
the system-value builders.
Reported-by: Christoph Haag <haagch@frickel.club>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this we should have no passes in src/compiler/nir with any
dependencies on headers from core GL Mesa.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
ARB_gl_spirv adds the ability to use SPIR-V binaries, and a new
method, glSpecializeShader. Here we add a new function to do the
validation for this function:
From OpenGL 4.6 spec, section 7.2.1"
"Shader Specialization", error table:
INVALID_VALUE is generated if <pEntryPoint> does not name a valid
entry point for <shader>.
INVALID_VALUE is generated if any element of <pConstantIndex>
refers to a specialization constant that does not exist in the
shader module contained in <shader>.""
v2: rebase update (spirv_to_nir options added, changes on the warning
logging, and others)
v3: include passing options on common initialization, doesn't call
setjmp on common_initialization
v4: (after Jason comments):
* Rename common_initialization to vtn_builder_create
* Move validation method and their helpers to own source file.
* Create own handle_constant_decoration_cb instead of reuse existing one
v5: put vtn_build_create refactoring to their own patch (Jason)
v6: update after vtn_builder_create method renamed, add explanatory
comment, tweak existing comment and commit message (Timothy)
Future changes will add generated files used only from
src/compiler/glsl. These can't be built from Makefile.nir.am, and we
can't move all the rules from Makefile.nir.am to Makefile.spirv.am (and
it would be silly anyway).
v2: Do it for meson too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (the meson bits)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (the automake bits)
I threatened to do this a long time ago.. I probably *should* have done
it a long time ago when there where many fewer intrinsics. But the
system of macro/#include magic for dealing with intrinsics is a bit
annoying, and python has the nice property of optional fxn params,
making it possible to define new intrinsics while ignoring parameters
that are not applicable (and naming optional params). And not having to
specify various array lengths explicitly is nice too.
I think the end result makes it easier to add new intrinsics.
v2: couple small fixes found with a test program to compare the old and
new tables
v3: misc comments, don't rely on capture=true for meson.build, get rid
of system_values table to avoid return value of intrinsic() and
*mostly* remove side-effects, add autotools build support
v4: scons build
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This will only ever be used by gallium drivers so it probably doesn't
belong in the nir toolkit. Also we want to pass it some non NIR
things in the following patch.
To avoid regressions we wrap the lowering calls that have been moved
to st_glsl_to_nir with a quick hack so that they are only called for
radeonsi, we will replace the hack with a check for uniform packing
in a following patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This pass moves load UBO operations just before their first use,
loosely based on nir_opt_move_comparisons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a very simple pass that just shrinks load_push_constant
intrinsics when some components are unused. For now, it can just
shrink vec4 to vec3, vec3 to vec2 and so on.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Don't use intermediate variables, use consistent whitespace.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
This autogenerated pass will automatically find and set the type field
on all vtn_values. This way we always have the type and can use it for
validation and other checks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V2:
- fix matrix support, non-array matrices were being skipped in v1
v3:
- handle lowering of tcs output loads correctly
- correctly mark indirect locations for either in or out not both
when processing a stage.
- use nir_src_copy() when lowering stores.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This commit pulls nir_lower_read_invocations_to_scalar along with most
of the guts of nir_opt_intrinsics (which mostly does subgroup lowering)
into a new nir_lower_subgroups pass. There are various other bits of
subgroup lowering that we're going to want to do so it makes a bit more
sense to keep it all together in one pass. We also move it in i965 to
happen after nir_lower_system_values to ensure that because we want to
handle the subgroup mask system value intrinsics here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2 (Jason Ekstrand):
- Various whitespace cleanups
- Add helpers for reading/writing objects
- Rework derefs
- [de]serialize nir_shader::num_*
- Fix uses of blob_reserve_bytes
- Use a bitfield struct for packing tex_instr data
v3:
- Zero nir_variable struct on deserialization. (Jordan)
- Allow nir_serialize.h to be included in C++. (Jordan)
- Handle NULL info.name. (Jason)
- Set info.name to NULL when name is NULL. (Jordan)
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Also needed in freedreno/ir3.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
I've been doing this inside of vc4, but vc5 wants it as well and it may be
useful for other drivers (Intel has a related path for pre-gen6 with MRT,
and freedreno had a TGSI path for it at one point).
This required defining a common enum for the standard comparison
functions, but other lowering passes are likely to also want that enum.
v2: Add to meson.build as well.
Acked-by: Rob Clark <robdclark@gmail.com>
This was missed in a rebase, and doesn't affect radv or anv, only i965.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In truth gtest is an external dependency that upstream expects you to
"vendor" into your own tree. As such, it makes sense to treat it more
like a dependency than an internal library, and collect it's
requirements together in a dependency object.
v2: - include with -isystem instead of setting compiler args (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows building and installing the Intel "anv" Vulkan driver using
meson and ninja, the driver has been tested against the CTS and has
seems to pass the same series of tests (they both segfault when the CTS
tries to run wayland wsi tests).
There are still a mess of TODO, XXX, and FIXME comments in here. Those
are mostly for meson bugs I'm trying to fix, or for additional things to
implement for other drivers/features.
I have configured all intermediate libraries and optional tools to not
build by default, meaning they will only be built if they're pulled in
as a dependency of a target that will actually be installed) this allows
us to avoid massive if chains, while ensuring that only the bits that
need to be built are.
v2: - enable anv, x11, and wayland by default
- add configure option to disable valgrind
v3: - fix typo in meson_options (Nicholas)
v4: - Remove dead code (Eric)
- Remove change to generator that was from v0 (Eric)
- replace if chain with loop (Eric)
- Fix typos (Eric)
- define HAVE_DLOPEN for both libdl and builtin dl cases (Eric)
v5: - rebase on util string buffer implementation
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v4)